Friday, September 2, 2011

Prevent Cross Site Scripting

1. HTML and JavaScript
http://www.codeproject.com/KB/web-security/Security_HTML_Injection.aspx

2. PHP: Preventing typical XSS attacks
http://chriscook.me/web-development/php-preventing-typical-xss-attacks/

3. 15 PHP regular expressions for web developers
http://www.catswhocode.com/blog/15-php-regular-expressions-for-web-developers

4. XSS (Cross Site Scripting) Prevention Cheat Sheet
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#Why_Can.27t_I_Just_HTML_Entity_Encode_Untrusted_Data.3F

5. PHP Regular Expression
http://php-regex.blogspot.com/2008/01/introduction-to-regular-expressions-in.html

6. Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php

7. Regular Expression Basic Syntax Reference
http://www.regular-expressions.info/reference.html

8. Using a Regular Expression to Match HTML
http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx

9. Ultimate Regular Expression for HTML tag parsing with PHP
http://kevin.deldycke.com/2007/03/ultimate-regular-expression-for-html-tag-parsing-with-php/


Literal Text:
- Characters that match themselves are called literals, e.g. the pattern cat matches the characters c, a, t in that order.
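A quick check of literal matching, assuming JavaScript (the flavour the /expression/i notation further down refers to) run in Node.js or a browser console. The sample strings are made up for illustration.

```javascript
// A pattern made only of literal characters matches that exact
// character sequence anywhere in the input.
const literal = /cat/;

console.log(literal.test("concatenate"));   // true: "cat" occurs inside the word
console.log(literal.test("Cat"));           // false: matching is case-sensitive by default
console.log("the cat sat".search(literal)); // 4: index where the first match starts
```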

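Metacharacters:
  • backslash  \  : escapes the next character so a metacharacter is matched literally (e.g. \. matches a dot), or starts a special sequence such as \d
  • caret  ^  : at the beginning of a regular expression, indicates that the match must start at the beginning of the string; as the first character inside square brackets, it negates the class
  • dollar sign  $  : at the end of a regular expression, indicates that the match must end at the end of the string
  • period or dot  .  : matches any single character except a newline (\n), e.g. the pattern h.t matches hat, hot, hit, hut, h7t, etc.
  • vertical bar or pipe symbol  |  : separates alternatives, e.g. cat|dog matches cat or dog
  • question mark  ?  : makes the preceding item optional (zero or one occurrence)
  • asterisk or star  *  : matches the preceding item zero or more times
  • plus sign  +  : matches the preceding item one or more times
  • square brackets  [  ]  : enclose a character class that matches any one of the listed characters
  • round brackets  (  )  : group items and capture the matched text for later reference
  • braces  {  }  : set how many times the preceding item may repeat, e.g. {2,4}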
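The short patterns below exercise several of the metacharacters listed above. This is a sketch assuming JavaScript regex literals; the URLs and words are illustrative inputs only.

```javascript
// One pattern per metacharacter; all inputs are illustrative.
const startsWithHttp = /^http/; // ^ : the match must start at the beginning of the string
const endsWithPhp = /\.php$/;   // $ : the match must end at the end of the string
const hDotT = /^h.t$/;          // . : exactly one character between h and t
const catOrDog = /cat|dog/;     // | : either alternative
const colour = /colou?r/;       // ? : the "u" is optional

console.log(startsWithHttp.test("http://example.com"));     // true
console.log(startsWithHttp.test("see http://example.com")); // false: not at the start
console.log(endsWithPhp.test("index.php"));                 // true
console.log(["hat", "hot", "h7t", "heat"].filter((w) => hDotT.test(w))); // [ 'hat', 'hot', 'h7t' ]
console.log(catOrDog.test("hotdog"));                       // true
console.log(colour.test("color"), colour.test("colour"));   // true true
```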

If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
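When the text to search for comes from user input, every metacharacter in it has to be escaped before the string becomes a pattern. A minimal sketch, assuming JavaScript; escapeRegExp is a hypothetical helper name, and the slash is left alone because it only matters as the delimiter of a regex literal, not in new RegExp().

```javascript
// Hypothetical helper: put a backslash in front of every metacharacter
// so the whole string is matched literally.
// "$&" in the replacement string inserts the character that was matched.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escaped = escapeRegExp("$5.00 (approx)");
console.log(escaped);                                // \$5\.00 \(approx\)
const pattern = new RegExp(escaped);
console.log(pattern.test("Total: $5.00 (approx)"));  // true
console.log(pattern.test("Total: $5x00 (approx)"));  // false: the dot is now literal
```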

[agk]    matches any one of a, g, or k
[a-z]    matches any one character from a to z
[^z]     matches any character other than z
[\(\)]   matches ( or ) (in a JavaScript string such as new RegExp("[\\(\\)]"), each backslash must itself be escaped)
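The bracket classes above in runnable form, assuming JavaScript. The second half shows why the backslashes double up when the pattern is written as a string: the string literal eats one level of escaping before the RegExp constructor sees the pattern.

```javascript
// Bracket classes from the table above, written as regex literals.
const agk = /[agk]/g;    // any one of a, g or k
const lower = /[a-z]/g;  // any one lowercase letter
const notZ = /[^z]/g;    // any character other than z

console.log("kangaroo".match(agk).join("")); // kaga
console.log("Hi 5".match(lower).join(""));   // i
console.log("zzaz".match(notZ).join(""));    // a

// The same ( or ) class twice. In a string, "\\" stands for one backslash,
// so the constructor receives [\(\)], exactly what the literal contains.
const parensLiteral = /[\(\)]/g;
const parensFromString = new RegExp("[\\(\\)]", "g");
console.log("f(x)".replace(parensLiteral, ""));    // fx
console.log("f(x)".replace(parensFromString, "")); // fx
```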

.        any character except a line break such as \n
\w       any word character, same as [a-zA-Z0-9_]
\W       any non-word character
\s       any whitespace character, including [ \t\n\r\f\v]
\S       any non-whitespace character
\d       any digit
\D       any non-digit
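The shorthand classes applied to one made-up line of text, assuming JavaScript. Note that \w keeps the underscore inside "Kim_B", while # and - count as non-word characters.

```javascript
const text = "Order #42 shipped 2011-09-02\tto Kim_B";

console.log(text.match(/\d+/g));       // [ '42', '2011', '09', '02' ]
console.log(text.match(/\w+/g));       // letters, digits and _ : "Kim_B" stays one word
console.log(text.split(/\s+/));        // splits on the spaces and on the tab
console.log(text.replace(/\D/g, ""));  // 4220110902: every non-digit removed
console.log(text.match(/\W/g).length); // 8 non-word characters
```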

\/       literal /
\\       literal \
\.       literal .
\*       literal *
\+       literal +
\?       literal ?
\|       literal |
\(       literal (
\)       literal )
\[       literal [
\]       literal ]
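A few of the escaped literals in use, assuming JavaScript regex literals (which is why / needs \/ at all). The date and version strings are illustrative; the unescaped version of the dot pattern shows what goes wrong without the backslash.

```javascript
// \/ : a slash inside a regex literal, here in a dd/mm/yyyy-style date
const slashDate = /^\d{2}\/\d{2}\/\d{4}$/;
// \. : a literal dot; without the backslash, . would match any character
const version = /\d+\.\d+/;

console.log(slashDate.test("02/09/2011"));         // true
console.log(version.exec("PHP 5x3 or 5.3")[0]);    // 5.3
console.log(/\d+.\d+/.exec("PHP 5x3 or 5.3")[0]);  // 5x3: the unescaped dot matched "x"
console.log(/C\+\+/.test("C++ and PHP"));          // true: \+ is a literal plus
console.log(/C:\\Windows/.test("C:\\Windows"));    // true: \\ matches one backslash
```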

\-       inside brackets, a - that is not the first or last character must be escaped: [a-z0-9 _.\-\?!]
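The class from the line above used as a whitelist check, assuming JavaScript. The bracket-range comparison below is the reason the hyphen needs care: left unescaped between two characters, it silently turns into a range.

```javascript
// The class from the line above: letters, digits, space, _ . - ? !
const allowed = /^[a-z0-9 _.\-\?!]+$/;

console.log(allowed.test("hello-world!")); // true
console.log(allowed.test("<script>"));     // false: < and > are not in the class

// Unescaped between two characters, - builds a range instead:
// [.-?] means every character from "." (0x2E) to "?" (0x3F), which includes the digits.
console.log(/^[.-?]$/.test("5"));          // true
console.log(/^[.\-?]$/.test("5"));         // false: now only ".", "-" and "?"
// At the very end of the class the hyphen is literal even without a backslash.
console.log(/^[.?-]$/.test("-"));          // true
```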

{n,m}    match previous item n to m times
{n,}     match previous item n or more times
{n}      match exactly n times
?        match zero times or once, same as {0,1}; also makes +, * and {n,m} "lazy"
+        match one or more
*        match zero or more
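The quantifiers above, plus the greedy/lazy difference, in a JavaScript sketch. HTML tags are used only because they make the difference easy to see; a pattern like this is not a reliable HTML parser or sanitizer.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* grabs as much as possible, so the match runs from the first < to the last >.
console.log(html.match(/<.*>/)[0]);    // <b>bold</b> and <i>italic</i>
// Lazy: .*? stops at the earliest possible >, giving one match per tag.
console.log(html.match(/<.*?>/g));     // [ '<b>', '</b>', '<i>', '</i>' ]

console.log(/^\d{3,5}$/.test("1234")); // true: {3,5} allows 3 to 5 digits
console.log(/^\d{3,}$/.test("12"));    // false: {3,} needs at least 3
console.log(/^\d{4}$/.test("2011"));   // true: {4} means exactly 4
console.log(/^-?\d+$/.test("-42"));    // true: optional minus (?), one or more digits (+)
console.log(/^x*$/.test(""));          // true: * accepts zero occurrences
```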

|        or
(x|y)    match x or y; with the g flag, every x and every y is replaced
( )      grouping and capturing, so the matched text can be referenced later
\1       reference to first grouping, used in the expression
$1       reference to first grouping, used in the replacement string
$$       literal $ used in the replacement string
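Alternation, back-references and the replacement-string tokens in one JavaScript sketch; the sample strings are made up. \1 is used inside the pattern, while $1, $2 and $$ only have meaning in the replacement string passed to replace().

```javascript
// (x|y) with the g flag: every occurrence of either alternative is replaced.
console.log("cat dog cow".replace(/(cat|dog)/g, "pet"));    // pet pet cow

// \1 inside the pattern: find a doubled word such as "the the".
console.log(/\b(\w+) \1\b/.exec("in the the morning")[1]); // the

// $1 and $2 in the replacement: turn "last, first" into "first last".
console.log("Doe, Jane".replace(/(\w+), (\w+)/, "$2 $1"));  // Jane Doe

// $$ in the replacement inserts a literal dollar sign before group 1.
console.log("price: 5".replace(/(\d+)/, "$$$1"));           // price: $5
```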

^        anchor to the beginning of the string
$        anchor to the end of the string
\b       match a word boundary (zero-width: the boundary itself consumes no characters)
\B       match a position that is not a word boundary (also zero-width)
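Anchors and word boundaries on one made-up string, assuming JavaScript. search() returns the index of the first match, which makes it easy to see which "cat" each pattern picked.

```javascript
const text = "cat catalog concat";

console.log(text.search(/\bcat\b/));      // 0: "cat" as a whole word
console.log(text.search(/\Bcat/));        // 15: the "cat" inside "concat"
console.log(text.replace(/^cat/, "dog")); // dog catalog concat
console.log(/cat$/.test(text));           // true: the string ends in "cat"
// \b is zero-width, so the match holds only the word, not the spaces around it.
console.log(/\bcatalog\b/.exec(text)[0]); // catalog
```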

q(?=u)   match q only before u (positive lookahead; the u is not part of the match)
q(?!u)   match q only when it is not followed by u (negative lookahead)
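Both lookaheads in a JavaScript sketch. The last pattern is a hypothetical password rule, included because a lookahead consumes nothing: it can check a condition first and leave the rest of the pattern to match from the same position.

```javascript
const text = "quit Iraq qatar";

console.log(text.replace(/q(?=u)/g, "Q")); // Quit Iraq qatar: only the q before u
console.log(text.replace(/q(?!u)/g, "Q")); // quit IraQ Qatar: every q not before u

// Hypothetical rule: at least one digit somewhere, and at least 8 characters overall.
const hasDigitAndLength = /^(?=.*\d).{8,}$/;
console.log(hasDigitAndLength.test("secret12")); // true
console.log(hasDigitAndLength.test("secretxx")); // false: no digit
```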

i        case-insensitive search, used like /expression/i
g        global search: find or replace every match, not just the first, used like /expression/g
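The two flags in JavaScript, then the g flag put to work on the topic of this post: HTML entity encoding of untrusted text. escapeHtml and the entity table are hypothetical names; the output is only meant for HTML element content and quoted attribute values, and the OWASP cheat sheet linked above explains why other contexts (JavaScript, CSS, URLs) need different encoding.

```javascript
// i: case-insensitive; g: replace every match instead of only the first.
console.log("<SCRIPT>".replace(/script/i, "x")); // <x>
console.log("a-b-c".replace(/-/, "+"));          // a+b-c (first match only)
console.log("a-b-c".replace(/-/g, "+"));         // a+b+c (all matches)

// Minimal HTML-escaping sketch built on one global replacement.
const HTML_ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };

function escapeHtml(untrusted) {
  // Each special character is looked up in the table and swapped for its entity.
  return String(untrusted).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

Without the g flag only the first special character would be encoded, which is exactly the kind of gap an attacker looks for.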
