http://www.codeproject.com/KB/web-security/Security_HTML_Injection.aspx
2. PHP: Preventing typical XSS attacks
http://chriscook.me/web-development/php-preventing-typical-xss-attacks/
3. 15 PHP regular expressions for web developers
http://www.catswhocode.com/blog/15-php-regular-expressions-for-web-developers
4. XSS (Cross Site Scripting) Prevention Cheat Sheet
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#Why_Can.27t_I_Just_HTML_Entity_Encode_Untrusted_Data.3F
5. PHP Regular Expression
http://php-regex.blogspot.com/2008/01/introduction-to-regular-expressions-in.html
6. Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
7. Regular Expression Basic Syntax Reference
http://www.regular-expressions.info/reference.html
8. Using a Regular Expression to Match HTML
http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx
9. Ultimate Regular Expression for HTML tag parsing with PHP
http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/
Literal Text:
- The characters that match themselves are called literals
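Metacharacters:
- backslash \ : escapes the next character so that a metacharacter is treated as a literal; it also introduces shorthand classes such as \d and \w
- caret ^ : at the beginning of a regular expression, indicates that the pattern must match at the beginning of the string
- dollar sign $ : at the end of a regular expression, indicates that the pattern must match at the end of the string
- period or dot . : matches any single character except newline (\n), e.g. the pattern h.t matches hat, hot, hit, hut, h7t, etc.
- vertical bar or pipe symbol | : separates alternatives in a regular expression, e.g. cat|dog matches cat or dog
- question mark ? : matches the previous item zero or one time; placed after + or *, it makes them lazy
- asterisk or star * : matches the previous item zero or more times
- plus sign + : matches the previous item one or more times
- square brackets [ ] : define a character class that matches any one of the enclosed characters
- round brackets ( ) : group part of a pattern and capture it for later reference
- braces { } : specify how many times the previous item must repeat, e.g. {n,m}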
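The anchors, the dot, and the pipe from the list above can be tried out directly. The following is a minimal sketch in JavaScript; the variable names and sample strings are made up for illustration.

```javascript
// Anchors, the dot, and alternation, checked with RegExp.prototype.test.
const startsWithHello = /^hello/; // ^ anchors the match to the start of the string
const endsWithWorld = /world$/;   // $ anchors the match to the end of the string
const hDotT = /^h.t$/;            // . stands for exactly one character (not a newline)
const catOrDog = /cat|dog/;       // | matches either alternative

console.log(startsWithHello.test("hello world")); // true
console.log(startsWithHello.test("say hello"));   // false
console.log(endsWithWorld.test("hello world"));   // true
console.log(hDotT.test("h7t"));                   // true
console.log(hDotT.test("heat"));                  // false: two characters between h and t
console.log(hDotT.test("h\nt"));                  // false: . does not match a newline
console.log(catOrDog.test("hotdog"));             // true
```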
If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
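As a rough illustration of escaping, the sketch below compares an escaped and an unescaped dot, and shows that a pattern written as a JavaScript string needs its backslashes doubled. The helper `escapeRegExp` is a hypothetical name, not a built-in; it is one common way to escape arbitrary text before using it as a pattern.

```javascript
const unescapedDot = /1.5/; // . is still a metacharacter: any single character
const escapedDot = /1\.5/;  // \. matches only a literal dot

console.log(unescapedDot.test("1x5")); // true
console.log(escapedDot.test("1x5"));   // false
console.log(escapedDot.test("1.5"));   // true

// In a string passed to new RegExp, the backslash itself must be escaped.
const parenthesizedNumber = new RegExp("^\\(\\d+\\)$");
console.log(parenthesizedNumber.test("(42)")); // true

// Hypothetical helper: puts a backslash in front of every metacharacter
// so the text can be matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = new RegExp("^" + escapeRegExp("$9.99 (sale)") + "$");
console.log(price.test("$9.99 (sale)")); // true
console.log(price.test("$9x99 (sale)")); // false
```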
- [agk] : matches any one a, g, or k
- [a-z] : matches any one character from a to z
- [^z] : matches any character other than z
- [\\(\\)] : matches ( or ) (in javascript, the escape slash must be escaped!)
- . : any character except \n
- \w : any word character, same as [a-zA-Z0-9_]
- \W : any non-word character
- \s : any whitespace character, same as [ \t\n\r\f\v]
- \S : any non-whitespace character
- \d : any digit
- \D : any non-digit
- \/ : literal /
- \\ : literal \
- \. : literal .
- \* : literal *
- \+ : literal +
- \? : literal ?
- \| : literal |
- \( : literal (
- \) : literal )
- \[ : literal [
- \] : literal ]
- \- : the - must be escaped inside brackets: [a-z0-9 _.\-\?!]
- {n,m} : match previous item n to m times
- {n,} : match previous item n or more times
- {n} : match exactly n times
- ? : match zero or once, same as {0,1}, also makes + and * "lazy"
- + : match one or more
- * : match zero or more
- | : or
- (x|y) : match x or y, inclusive (all x and y will be replaced)
- ( ) : grouping and reference
- \1 : reference to first grouping, used in the expression
- $1 : reference to first grouping, used in the replacement string
- $$ : literal $ used in the replacement string
- ^ : anchor to the beginning of the string
- $ : anchor to the end of the string
- \b : match a word boundary (does not include the boundary)
- \B : match a non word boundary (does not include the boundary)
- q(?=u) : match q only before u (does not match the u)
- q(?!u) : match q except before u
- i : case-insensitive search, used like /expression/i
- g : global replacement, used like /expression/g
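A few of the entries above are easier to see in action. The sketch below is only illustrative: the sample strings and variable names are invented, and it covers greedy versus lazy matching, counted repetition, group references, lookahead, and the i and g flags.

```javascript
// Greedy vs lazy: * grabs as much as possible, *? as little as possible.
const html = "<b>one</b><b>two</b>";
const greedy = html.match(/<b>.*<\/b>/)[0];
const lazy = html.match(/<b>.*?<\/b>/)[0];
console.log(greedy); // <b>one</b><b>two</b>
console.log(lazy);   // <b>one</b>

// {n,m}: between two and four digits.
const twoToFourDigits = /^\d{2,4}$/;
console.log(twoToFourDigits.test("123"));   // true
console.log(twoToFourDigits.test("12345")); // false

// $1, $2, $3 refer to the groups in the replacement string.
const reordered = "2008-01-31".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(reordered); // 31/01/2008

// \1 refers to the first group inside the expression: finds a doubled word.
const hasDoubledWord = /\b(\w+) \1\b/.test("it is is here");
console.log(hasDoubledWord); // true

// $$ inserts a literal $ in the replacement string.
const withDollar = "cost: 5".replace(/(\d+)/, "$$$1");
console.log(withDollar); // cost: $5

// Lookahead: q(?!u) matches a q that is not followed by u.
const lookahead = "quit qatar".replace(/q(?!u)/g, "Q");
console.log(lookahead); // quit Qatar

// Flags: without g only the first match is replaced; i ignores case.
const firstOnly = "Cat cat CAT".replace(/cat/, "dog");
const allIgnoreCase = "Cat cat CAT".replace(/cat/gi, "dog");
console.log(firstOnly);     // Cat dog CAT
console.log(allIgnoreCase); // dog dog dog
```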