Regular Expression for stripping out invalid CSS tags

I'm trying to strip out invalid CSS tags using a regular expression. Since newer CSS allows [, ], |, ~, >, * and +, I'm wondering how to strip them out.

Could you please share if you have a better method? Thank you for your help...
Public Shared Function ValidateCSS(ByVal original As String) As string 
 
        Dim result As String = String.Empty
        Dim ctr As Long
        Dim sChar As String
 
        Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"
 
        original = Trim(original)
 
        For ctr = 1 To Len(original)
            sChar = Mid(original, ctr, 1)
            If Regex.IsMatch(sChar, pattern) Then
                result = result & sChar
            End If
        Next
 
        Return result
    End Function


winmyanAsked:

t0t0Commented:
Looking at your code, you could do something like this:

Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "[]|~>*+"
      
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function



NOTE:

The following would also have worked; however, it is more efficient to evaluate the Mid() function once rather than twice per iteration of the loop.

      If Instr(pattern, Mid(original, ctr, 1)) = 0 Then
         result = result & Mid(original, ctr, 1)
      End If
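
For comparison, the same exclude-list idea can probably be written as a single regular expression, which is what the question asked for. This is only a minimal sketch: the name StripCssSelectorChars is made up for illustration, and it assumes the seven characters listed in the question are the only ones to remove.

```vbnet
Imports System.Text.RegularExpressions

Public Module CssBlacklistSketch
    ' Hypothetical helper: removes the characters [ ] | ~ > * + from the input.
    Public Function StripCssSelectorChars(ByVal original As String) As String
        ' Inside a character class only [ and ] need escaping here;
        ' | ~ > * + are treated as literals within [...].
        Return Regex.Replace(original.Trim(), "[\[\]|~>*+]", String.Empty)
    End Function
End Module
```

Regex.Replace scans the string once and removes every match, so no explicit loop over Mid() is needed.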


t0t0Commented:
winmyan

Do you require further assistance?
winmyanAuthor Commented:
Hi t0t0,

Thank you for your help. According to your comment, I will end up writing all possible characters in pattern string.

I'm really looking for regular expression.  In addition, I am wondering how others have done that kind of text stripping to prevent SQL injection (I could not find any sample code in google).

Thank you again for your comment!!!

t0t0Commented:
I think I may have misled you with the variable-name 'pattern'.

The variable 'pattern' contains characters which you DO NOT want to appear in your result string - this is opposite to the way you previously used 'pattern'.

The lines of code:

      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End if

check each character in your original string and, if it does not appear in pattern (the excluded characters), append it to the result string.

You can add whatever characters you like to the 'pattern' string.

So, rather than testing for valid characters to include, you're simply doing the opposite - testing for non-valid characters to exclude.

Did you not understand that?
winmyanAuthor Commented:
Hi t0t0,

Thank you for your quick response.

Do I have to include all possible characters to be stripped out in the pattern string? If that is the case, the pattern string will have a few hundred characters (at least).

I only want the characters I need rather than finding out all possible characters. Please correct me, if I am wrong.
t0t0Commented:
As far as I am aware, ASCII proper defines 128 characters: 95 printable and 33 non-printable (control) characters. A single byte can represent 256 values, and what the upper 128 mean depends on the code page.

62 of those characters are a-z, A-Z and 0-9.

There are only about 95 printable characters available on an ordinary QWERTY keyboard.

So, you really have to decide how many of the printable and non-printable characters you can safely assume are NOT going to appear in your CSS tag lines before deciding on what to exclude.

However, you have decided to narrow down your INCLUDE options, so I'll work to that then.

Please try the following reverse code then:


Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"
     
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) > 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function


NOTE: I have no idea what a lot of this is ("^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$") but I have included it anyway. All I know is that it represents the characters to include in your result string.
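
One caveat with the code above: InStr compares against the literal text of the string, so "a-zA-Z0-9" only allows the characters a, -, z, A, Z, 0 and 9, not the full ranges. A regular-expression character class is one way to express the ranges. The sketch below rests on an assumption about what is wanted: it keeps letters, digits, the space and . # - _ : @, and drops everything else. The name KeepAllowedCssChars is hypothetical. The hyphen is placed last in the class because, in the original pattern, " -_" is read as a range from space to underscore, which also admits characters such as * + > [ and ].

```vbnet
Imports System.Text.RegularExpressions

Public Module CssWhitelistSketch
    ' Hypothetical helper: keeps only characters from an assumed allow-list.
    Public Function KeepAllowedCssChars(ByVal original As String) As String
        ' [^...] matches any character NOT in the list; each match is removed.
        ' The hyphen comes last so it is a literal, not a range operator.
        Return Regex.Replace(original.Trim(), "[^a-zA-Z0-9.#_:@ -]", String.Empty)
    End Function
End Module
```

If braces or other characters are legitimately needed, they would have to be added to the class explicitly.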

ahoffmannCommented:
> Do I have to include all possible characters to be stripped out in the pattern string?
That would be the blacklisting approach. You are always better off with a whitelisting solution, where your regex defines all allowed characters, because it avoids missing some unwanted characters.
Use whitelisting.
t0t0Commented:
This IS whitelisting.

Pattern contains only characters that SHOULD appear in the output string.

It's up to winmyan to edit the pattern string so that it includes only those characters which may occur in his output string.

You'll notice that I'm not using Regex here; there's no need to. The trouble is, we become so reliant on high-level functions that we forget how to program ourselves. Just looking at the complex example strings and functions of Regex makes me wonder why someone would want to memorise so much small detail when standard functions such as InStr(), Mid(), Len(), Left() etc. provide all the tools to accomplish the same thing, albeit programmatically. The most obvious advantage of using Regex is probably reduced code and, arguably, fewer errors; however, Regex should not be a replacement for good design and coding skills.

winmyan - could you be specific about the characters that MAY appear in the output string so that this can be coded up for you. So far I recognise the following:

   abcdefghijklmnopqrstuvwxyz
   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   0123456789
   . # - _ : @
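
With the list spelled out like this, the loop approach needs no regular expression. A rough sketch, assuming these really are the only permitted characters (the space character or braces would have to be added if they are needed); the names AllowedChars and KeepListedChars are made up for illustration:

```vbnet
Imports System.Text

Public Module CssAllowListSketch
    ' Assumed allow-list, taken from the characters recognised so far.
    Private Const AllowedChars As String =
        "abcdefghijklmnopqrstuvwxyz" &
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" &
        "0123456789.#-_:@"

    ' Hypothetical helper: copies only allow-listed characters to the result.
    Public Function KeepListedChars(ByVal original As String) As String
        Dim result As New StringBuilder()
        For Each c As Char In original.Trim()
            ' IndexOf(Char) is an ordinal, case-sensitive lookup.
            If AllowedChars.IndexOf(c) >= 0 Then
                result.Append(c)
            End If
        Next
        Return result.ToString()
    End Function
End Module
```

A StringBuilder is used instead of repeated & concatenation so that long inputs do not create a new string on every iteration.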




winmyanAuthor Commented:
Thank you so much for your help!!!
t0t0Commented:
Oh, that was surprisingly unexpected. Anyway, I hope you have arrived at a solution. Thank you.