Please help decipher a regular expression.

Please be as detailed as possible.  
rex.Pattern = "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

A developer who left the company wrote a page with this in it and didn't comment it. I'd like to place a comment above it so anyone else can tell what it's looking for.


Thanks.
skipper68 (Application Development Manager) asked:
 
TimYates commented:
It looks to me like it is checking for strings such as:

    hello ; there

or

    hello <there

or

    hello %3Cthere

which it considers "potentially dangerous"

I guess that's because it could be an SQL or JavaScript injection attempt...

Tim
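To check examples like these, here is a small Python sketch (Python's re module stands in for VBScript's RegExp; I'm assuming the two behave the same for this simple pattern). Note that the '<' or %3C has to be directly followed by a letter, so versions with a space in between are not flagged:

```python
import re

# The pattern exactly as it appears in rex.Pattern.
PATTERN = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")

samples = [
    "hello ; there",    # semicolon anywhere: flagged
    "hello <there",     # '<' directly followed by a letter: flagged
    "hello %3Cthere",   # URL-encoded '<' followed by a letter: flagged
    "hello < there",    # '<' followed by a space: NOT flagged
    "hello %3C there",  # same with the encoded form: NOT flagged
]

for s in samples:
    # re.match anchors at the start of the string, like the ^ in the pattern.
    print(f"{s!r:20} -> {'flagged' if PATTERN.match(s) else 'ok'}")
```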
 
skipper68 (Application Development Manager, author) commented:
Also, is there a reverse regex program anywhere?
 
Geert Bormans (Information Architect) commented:
Hi skipper68,
> "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

^ = line-start
.* = zero or more '.', which is any character, so this is zero or more of any character
| = or sign, alternation
() = grouping parentheses
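; = a literal semicolon
< = a literal less-than sign
%3 = the literal characters '%' and '3'; followed by [Cc] this matches %3C or %3c, the URL-encoded form of '<'
[Cc] = character class, 'C' or 'c'
[a-zA-Z] = character class, any single lower case or upper case alphabetic character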
$ = line-end
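The same token-by-token breakdown can be kept right next to the pattern. A sketch in Python using re.VERBOSE, which ignores whitespace and allows # comments inside the pattern (as far as I know VBScript's RegExp has no such flag, so this is for documentation and testing only):

```python
import re

# The same pattern, rewritten with re.VERBOSE so each token carries a comment.
# Whitespace and '#' comments inside the string are ignored by the regex engine.
VERBOSE = re.compile(r"""
    ^               # start of the input
    .*              # any characters except newline, zero or more times
    (               # group 1: one of two alternatives
        ;           #   a literal semicolon
      |             #   or
        (<|%3[Cc])  #   group 2: '<' or its URL-encoded form %3C / %3c
        [a-zA-Z]    #   followed immediately by one ASCII letter
    )
    .*              # any characters, zero or more times
    $               # end of the input
""", re.VERBOSE)

COMPACT = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")

for s in ["a;b", "a<b", "x=%3cdiv", "5 < 6", "plain text"]:
    m = VERBOSE.match(s)
    # Both spellings of the pattern must agree on every input.
    assert bool(m) == bool(COMPACT.match(s))
    print(s, "->", m.group(1) if m else None)
```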


so this matches a single line that contains either a ';' anywhere,
or a '<' (or %3C / %3c, the URL-encoded form of '<') immediately followed by one alphabetic character;
the .* on either side allows any characters before and after, up to the line end
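The alternation splits the group into two independent branches: the letter requirement belongs only to the '<' / %3C branch, not to the ';'. A quick Python check of that grouping (flagged is a hypothetical helper name):

```python
import re

PATTERN = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")


def flagged(s: str) -> bool:
    """True if the pattern matches s, i.e. the page would reject it."""
    return PATTERN.match(s) is not None


# The ';' branch stands alone: anything (or nothing) may follow it.
assert flagged("a;1") and flagged(";")
# The '<' / %3C branch needs a letter immediately after it.
assert flagged("a<b") and flagged("%3CB") and flagged("%3cb")
assert not flagged("a<1") and not flagged("a<") and not flagged("%3C")
```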

Regex details can differ a little between languages, though... which one is this? Perl?

Cheers!
 
Geert Bormans (Information Architect) commented:
skipper68,
> Also, is there a reverse regex program anywhere?

not that I am aware of... what would it reverse to in your mind?
 
skipper68 (Application Development Manager, author) commented:
I've been doing a little research on my own and I think I'm even more confused now...

Before I read your post, I thought it worked like this:

^.
Starting with a dot

*(.....)
Matching any criteria between the parentheses, separated by the pipe (|) symbol

;
No semicolons

<
No less-than signs

%3[Cc])[a-zA-Z])
No 3 capital letters in a row (i.e. AAA)

.
ending with a dot

*$
Containing a dollar sign

If any of these holds true, the expression returns the count of the number of violations.

Can someone confirm or clarify?
 
Geert Bormans (Information Architect) commented:
what is the programming language?

in most languages
^ is a positional anchor for the start of the line (or of the input)
. is any character
meaning that a literal dot needs to be escaped, like this: \.

you have negations everywhere in your explanations,
but I don't see any negation sign in the regex (a negated character class would look like [^;])
so I think you are far off... unless this is a weird programming language I have never done regex in...

cheers
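A few Python assertions illustrate the points above against the actual pattern (Python's re is used as a stand-in; the rules for '.', '\.', '$' and [^...] are the same in most regex flavors):

```python
import re

PATTERN = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")

# An unescaped '.' matches any single character (except a newline) ...
assert re.fullmatch(r".", "x") and re.fullmatch(r".", ".")
# ... while an escaped '\.' matches only a literal dot.
assert re.fullmatch(r"\.", ".") and not re.fullmatch(r"\.", "x")

# So the input does not need to start with a dot, or contain a '$', to match.
assert PATTERN.match("hello;world")

# There is no negation: a ';' makes the pattern MATCH. "No semicolons"
# would be written with a negated character class such as [^;].
assert re.fullmatch(r"[^;]*", "no semicolons here")
assert not re.fullmatch(r"[^;]*", "one;here")
```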
 
skipper68 (Application Development Manager, author) commented:
It's in an ASP page, set up like this:

I've been talking to a few people here, and it's supposed to check Request.QueryString and Request.Form for specific characters.


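' Reject the request if it contains either:
'   - a semicolon (;) anywhere, or
'   - a "<" or its URL-encoded form %3C / %3c, immediately followed by a
'     letter (e.g. the start of an HTML tag such as <script).
' The ^.* and .*$ around the group just allow any other characters before
' and after, so the whole value is flagged if either case appears.
Set rex = New RegExp

rex.Pattern = "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

' Check the whole query string.
Set colMatches = rex.Execute(Request.QueryString)

If colMatches.Count > 0 Then
      Response.Write "A potentially dangerous Request.QueryString value was detected from the client."
      Response.End
End If

' Check the posted form data the same way.
Set colMatches = rex.Execute(Request.Form)

If colMatches.Count > 0 Then
      Response.Write "A potentially dangerous Request.Form value was detected from the client."
      Response.End
End If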
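For testing the pattern outside of ASP, here is a rough Python equivalent of the block above. check_request is a hypothetical name, and the sketch assumes it receives the same raw, still URL-encoded text that the page passes to rex.Execute:

```python
import re
from typing import Optional

# Same pattern as rex.Pattern in the ASP page.
DANGEROUS = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")


def check_request(raw_query: str, raw_form: str) -> Optional[str]:
    """Hypothetical stand-in for the ASP block: returns the message the page
    would write before Response.End, or None if both strings pass."""
    # colMatches.Count > 0 is just a "does it match at all?" test.
    if DANGEROUS.search(raw_query):
        return "A potentially dangerous Request.QueryString value was detected from the client."
    if DANGEROUS.search(raw_form):
        return "A potentially dangerous Request.Form value was detected from the client."
    return None


print(check_request("id=42&name=bob", ""))          # None
print(check_request("q=%3Cscript%3Ealert(1)", ""))  # the Request.QueryString message
print(check_request("", "comment=a%3Bb"))           # None: ';' sent as %3B is not seen
```

One consequence of matching the raw text: a semicolon that arrives encoded as %3B is not caught, while a '<' sent as %3C is, which is presumably why the %3[Cc] alternative is there.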
 
ysre commented:
I second what TimYates said :)

Ys
 
skipper68 (Application Development Manager, author) commented:
Accepted prematurely....

rex.Pattern = "^.*(;|(<|%3[Cc])[a-zA-Z]).*$"

1. Starts with a dot
2. Cannot contain a semicolon, less-than sign, or %3

Does the [a-zA-Z] mean that it can only contain lowercase and uppercase letters? Numbers work.

What does the dot at the end mean?

What does the *$ mean?
 
Geert Bormans (Information Architect) commented:
.* means any character zero or more times
$ means end of line or end of the input string
[a-zA-Z] means a or b or c or d.... or z or A or B or C or... or Z (it is a character class)
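To make the character class point concrete (and to answer the question about numbers), a short Python sketch:

```python
import re

# A character class matches exactly ONE character from the set.
assert re.fullmatch(r"[a-zA-Z]", "Q")
assert not re.fullmatch(r"[a-zA-Z]", "QQ")  # two characters: too many
assert not re.fullmatch(r"[a-zA-Z]", "7")   # a digit is not in the set

# '.*' may match nothing at all, and '$' only marks the end of the input.
assert re.fullmatch(r"a.*$", "a")

PATTERN = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")
# The class only constrains the single character right after '<' / %3C,
# so digits elsewhere in the input are allowed.
assert not PATTERN.match("id=12345")
assert PATTERN.match("id=12345<b")
```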
 
TimYates commented:
So it means

  <any chars>  

followed by

  ;

OR

  < or %3C or %3c followed by any letter ( eg:  <A or %3cA or %3CB )

followed by

  <any chars>  

Tim
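A side note on the design: since the code only checks whether there is a match at all, the ^.* and .*$ wrapper adds little. A Python sketch comparing the full pattern with just its core (an observation, not something the original developer wrote):

```python
import re

FULL = re.compile(r"^.*(;|(<|%3[Cc])[a-zA-Z]).*$")
# Core of the pattern without the ^.* ... .*$ wrapper.
CORE = re.compile(r";|(<|%3[Cc])[a-zA-Z]")

single_line = ["a;b", "x<y", "%3cA", "hello < there", "id=42", "", "%3C"]
for s in single_line:
    # For single-line input, "does FULL match?" equals "does CORE occur anywhere?"
    assert bool(FULL.match(s)) == bool(CORE.search(s))

# With an embedded newline they differ: '.' cannot cross the line break,
# and '$' (no MULTILINE flag) needs the end of the whole string.
assert FULL.match("a;b\nc") is None
assert CORE.search("a;b\nc") is not None
```

The two agree on single-line input, which URL-encoded query strings and form bodies normally are; they differ only when the input contains a raw line break. I believe VBScript's RegExp (with Multiline left off) behaves like Python here, but that is an assumption I have not verified.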
 
skipper68 (Application Development Manager, author) commented:
Wonderful.  Thank you to all.
Question has a verified solution.
