Please help me decipher this regular expression.

Please be as detailed as possible.  
rex.Pattern = "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

A developer who has since left the company wrote a page with this in it and didn't comment it.  I'd like to place a comment above it so anyone else can read what it's looking for.

Doug
Doug
Also, is there a reverse regex program anywhere?
Gertone (Geert Bormans)
Hi skipper68,
> "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

^ = line-start
.* = zero or more '.', which is any character, so this is zero or more of any character
| = or sign, alternation
() = grouping parentheses
%3 = content of a variable ?, likely something you picked up before
[Cc] = character class, 'C' or 'c'
[a-zA-Z] = character class, any lower case or upper case alphabetic character
$ = line-end

so this is a single line,
filled with any characters up to a ';' or a sequence (< or content of variable 3) followed by a C or c,
all of that followed by one alphabetic character and again a run of any characters up to the line end

The only thing I am not sure about is the %3; that is a language-dependent thing... is it Perl?

Gertone (Geert Bormans)
> Also, is there a reverse regex program anywhere?

not that I am aware of... what would it reverse to in your mind?
Doug
I've been doing a little research on my own and I think I'm even more confused now...

Before I read your post, I thought it read like this.

Starting with a dot

Matching any criteria between the parentheses, separated by the pipe (|) symbol

No semicolons

No Less than signs

No 3 capital letters in a row (ie. AAA)

ending with a dot

Containing a DollarSign

If any of these holds true, the expression returns the count of the number of violations.

Can someone confirm or clarify?
Gertone (Geert Bormans)
what is the programming language?

in most languages
^ is a positional pattern for the start of the line (or of the whole input)
. is any character
meaning that a literal dot needs to be escaped like this: \.

you have negations everywhere in your explanations
I don't see any negation sign in the regex
so I think you are far off... unless this is a weird prog language I have never done regex in...
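
To make the point about the unescaped dot concrete, here is a minimal sketch. It is written in Python rather than the language of the page in question (an assumption made only so it can be run easily), and the sample strings are made up for illustration. It shows that `.` matches any character while `\.` matches only a literal dot, and that the pattern does not require the input to start or end with a dot.

```python
import re

# An unescaped dot matches any single character (except a newline by default).
any_char = re.compile(r"^.$")
# An escaped dot matches only a literal '.' character.
literal_dot = re.compile(r"^\.$")

print(bool(any_char.match("x")))     # True
print(bool(any_char.match(".")))     # True
print(bool(literal_dot.match("x")))  # False
print(bool(literal_dot.match(".")))  # True

# The pattern from the question: the leading and trailing ".*" accept any text,
# so the input does not have to start or end with a dot.
pattern = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")
print(bool(pattern.match("a;b")))    # True: contains ';'
print(bool(pattern.match(".abc.")))  # False: no ';' and no '<' or '%3C' before a letter
```

Nothing in the pattern negates anything: it succeeds when one of the listed sequences is present, and fails otherwise.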

Doug
It's in an ASP page set up like this.

I've been talking to a few people here and it's supposed to be a request.querystring and request.form check for specific characters.

Set rex = New regexp

rex.Pattern = "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

Set colMatches = rex.Execute(Request.QueryString)

If colMatches.Count > 0 Then
      Response.Write "A potentially dangerous Request.QueryString value was detected from the client."
End If

Set colMatches = rex.Execute(Request.Form)

If colMatches.Count > 0 Then
      Response.Write "A potentially dangerous Request.Form value was detected from the client."
End If
It looks to me like it is checking for strings such as:

    hello ; there


    hello <script>


    hello %3Cscript%3E

which it considers "potentially dangerous"

I guess due to the chance that it could be an SQL or JavaScript injection attempt...
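
A quick way to see which inputs trip the check is to run the pattern against a few strings. The sketch below uses Python's `re` module instead of the VBScript `RegExp` object from the page (an assumption made so the snippet can be run anywhere; for single-line inputs like these the two engines should agree). The helper name `looks_dangerous` and the sample strings are hypothetical.

```python
import re

PATTERN = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")

def looks_dangerous(value: str) -> bool:
    """Return True when the pattern from the ASP page matches the value."""
    return PATTERN.match(value) is not None

samples = [
    "hello ; there",     # matches: a ';' anywhere is enough
    "hello <b> there",   # matches: '<' directly followed by a letter
    "hello %3Cb there",  # matches: URL-encoded '<' followed by a letter
    "hello < there",     # no match: '<' is followed by a space
    "hello 3 < 5",       # no match: '<' is followed by a space
    "hello there",       # no match: nothing suspicious
]
for s in samples:
    print(f"{s!r}: {looks_dangerous(s)}")
```

Note that a bare `<` followed by a space does not trigger the check; the pattern only reacts when a letter comes right after it, which is what the start of an HTML tag looks like.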


I second what TimYates said :)

Doug
Accepted prematurely....

rex.Pattern = "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

1. Starts with a dot
2. Cannot contain semi-colon, less than sign, or %3

Does the [a-zA-Z] mean that it can only contain lowercase and uppercase letters?  Numbers work

What does the dot at the end mean?

What does the *$ mean?
Gertone (Geert Bormans)
.* means any character zero or more times
$ means end of line or end of the input stream
[a-zA-Z] means a or b or c or d.... or z or A or B or C or... or Z (it is a character class)
So it means

  <any chars>  

followed by either

  ;

or

  < or %3C or %3c followed by any letter ( eg:  <A or %3cA or %3CB )

followed by

  <any chars>  
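
For anyone who wants to document the pattern in place, the breakdown can be written out piece by piece. This is only a sketch in Python (not the VBScript of the original page, which as far as I know has no verbose mode), using `re.VERBOSE` so each part carries its own comment. It also shows that `%3C` and `%3c` are simply the URL-encoded form of `<`. The test strings are made up for illustration.

```python
import re
from urllib.parse import unquote

# "%3C" and "%3c" are the URL-encoded forms of "<".
print(unquote("%3C"), unquote("%3c"))  # < <

# The same pattern written in verbose mode, one piece per line.
pattern = re.compile(r"""
    ^ .*                # any characters from the start of the line
    (
        ;               # either a semicolon
      |                 # or
        (< | %3[Cc])    # '<' (literal, or URL-encoded as %3C / %3c)
        [a-zA-Z]        # immediately followed by one ASCII letter
    )
    .* $                # any characters up to the end of the line
""", re.VERBOSE)

for text in [";", "<A", "%3cA", "%3CB", "<1", "%3C", "abc123"]:
    print(f"{text!r}: {bool(pattern.match(text))}")
```

So `[a-zA-Z]` does not restrict the whole input to letters; it only says that the character right after `<` (or `%3C`) must be a letter. Input made of digits or other characters passes the check as long as it has no `;` and no `<` followed by a letter.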

Doug
Wonderful.  Thank you to all.
