Solved

Regular Expression for stripping out invalid CSS tags

Posted on 2009-04-09
Last Modified: 2012-05-06
I'm trying to strip invalid characters out of CSS with a regular expression. Since newer CSS syntax uses [, ], |, ~, >, * and +, I'm wondering how to strip them out.

Could you please share if you have a better method? Thank you for your help...
Public Shared Function ValidateCSS(ByVal original As String) As String

    Dim result As String = String.Empty
    Dim ctr As Long
    Dim sChar As String

    Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"

    original = Trim(original)

    For ctr = 1 To Len(original)
        sChar = Mid(original, ctr, 1)
        If Regex.IsMatch(sChar, pattern) Then
            result = result & sChar
        End If
    Next

    Return result
End Function

Question by:winmyan
 
LVL 16

Expert Comment

by:t0t0
ID: 24120067
Looking at your code, you could do something like this:

Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "[]|~>*+"
      
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function



NOTE:

The following would also have worked; however, it is more efficient to evaluate the Mid() function once rather than twice per iteration of the loop.

      If Instr(pattern, Mid(original, ctr, 1)) = 0 Then
         result = result & Mid(original, ctr, 1)
      End If
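
For comparison, the same exclusion list can be written as a regular expression, which is what the question asked for. This is only a minimal sketch: the module and function names are hypothetical, and it assumes the seven characters named in the question are the only ones to remove.

```vb
Imports System.Text.RegularExpressions

Public Module CssBlacklistSketch
    ' Hypothetical helper: removes the characters [ ] | ~ > * + from the input.
    Public Function StripCssSpecials(ByVal original As String) As String
        ' Inside a character class, [ and ] are escaped with a backslash;
        ' | ~ > * + are literal there and need no escaping.
        Return Regex.Replace(original.Trim(), "[\[\]|~>*+]", String.Empty)
    End Function
End Module
```

With this sketch, StripCssSpecials("a[b]*c") returns "abc". Regex.Replace handles the whole string in one call, so no character-by-character loop is needed.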


 
LVL 16

Expert Comment

by:t0t0
ID: 24136195
winmyan

Do you require further assistance?
 

Author Comment

by:winmyan
ID: 24138140
Hi t0t0,

Thank you for your help. If I follow your suggestion, I will end up listing every possible unwanted character in the pattern string.

I'm really looking for a regular expression. In addition, I am wondering how others have done this kind of text stripping to prevent SQL injection (I could not find any sample code on Google).

Thank you again for your comment!!!
 
LVL 16

Expert Comment

by:t0t0
ID: 24138268
I think I may have misled you with the variable-name 'pattern'.

The variable 'pattern' contains the characters which you DO NOT want to appear in your result string - this is the opposite of the way you previously used 'pattern'.

The lines of code:

      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End if

check each character in your original string and, if it does not appear in pattern (the excluded characters), append it to the result string.

You can add whatever characters you like to the 'pattern' string.

So, rather than testing for valid characters to include, you're simply doing the opposite - testing for non-valid characters to exclude.

Did you not understand that?
 

Author Comment

by:winmyan
ID: 24138742
Hi t0t0,

Thank you for your quick response.

Do I have to include in the pattern string every possible character that should be stripped out? If that is the case, the pattern string will hold a few hundred characters (at least).

I only want to list the characters I need rather than track down every possible unwanted character. Please correct me if I am wrong.
LVL 16

Expert Comment

by:t0t0
ID: 24139368
As far as I am aware, a single byte can represent 256 values, but standard ASCII defines only 128 of them: 95 printable characters (including the space) and 33 control characters. The upper 128 values depend on the code page in use.

62 of those characters are a-z, A-Z and 0-9.

There are only about 95 printable characters available on an ordinary QWERTY keyboard.

So, you really have to decide which printable and non-printable characters you can safely assume are NOT going to appear in your CSS lines before deciding what to exclude.

However, you have decided to narrow down your INCLUDE options, so I'll work to that.

Please try the following reversed code:


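Public Shared Function ValidateCSS(ByVal original As String) As String

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   ' Literal list of characters to keep (InStr does not understand regex syntax).
   ' The space before the closing quote is deliberate; remove it if spaces should be stripped.
   Dim pattern As String = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.#-_:@ "

   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      ' Keep the character only if it appears in the list
      If Instr(pattern, sChar) > 0 Then
         result = result & sChar
      End If
   Next

   Return result
End Function


NOTE: InStr() compares literal characters and does not interpret regular-expression syntax, so passing the expression "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$" unchanged would keep only the characters that literally appear in it. I have no idea what a lot of that expression is, so I have written out the characters I recognise from it as a plain list. All that matters is that 'pattern' holds the characters to include in your result string.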
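
One detail of the character class `[a-zA-Z0-9\.# -_:@]` from the original question is worth checking. Inside square brackets, a hyphen between two characters denotes a range, so ` -_` covers everything from space (U+0020) to underscore (U+005F), which includes *, +, >, [ and ]. The sketch below, with hypothetical names, compares that class with a variant where the hyphen is moved to the end so that it is literal.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HyphenRangeDemo
    Sub Main()
        ' Class copied from the question: " -_" is read as a range (space to underscore).
        Dim asWritten As String = "[a-zA-Z0-9\.# -_:@]"
        ' Same characters with the hyphen last, where it is a literal hyphen.
        Dim hyphenLast As String = "[a-zA-Z0-9.# _:@-]"

        For Each c As String In New String() {"*", "+", ">", "[", "]", "|", "~"}
            Console.WriteLine("{0}  as written: {1}  hyphen last: {2}",
                              c, Regex.IsMatch(c, asWritten), Regex.IsMatch(c, hyphenLast))
        Next
    End Sub
End Module
```

The class as written matches *, +, >, [ and ] but not | or ~, while the variant with the hyphen last matches none of the seven.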

 
LVL 51

Expert Comment

by:ahoffmann
ID: 24145994
> Do I have to include all possible characters in pattern string which to be stripped out?
This would be the blacklisting approach. You are always better off with a whitelisting solution, where your regex defines all allowed characters, because it avoids missing some unwanted characters.
Use whitelisting.
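
A whitelisting regex could look roughly like the sketch below. The allowed set (letters, digits, space and . # _ : @ -) is an assumption based on the character class in the question, and the module and function names are hypothetical.

```vb
Imports System.Text.RegularExpressions

Public Module CssWhitelistSketch
    ' Negated class: matches any character that is NOT in the assumed allowed set.
    ' The hyphen is placed last so that it is a literal hyphen, not a range.
    Private Const DisallowedPattern As String = "[^a-zA-Z0-9.# _:@-]"

    ' Hypothetical helper: keeps only the allowed characters.
    Public Function KeepAllowedCss(ByVal original As String) As String
        Return Regex.Replace(original.Trim(), DisallowedPattern, String.Empty)
    End Function
End Module
```

With this sketch, KeepAllowedCss("a.b>c") returns "a.bc". Anything not named in the class is dropped, so newly introduced CSS syntax characters are removed without having to be listed.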
 
LVL 16

Accepted Solution

by:
t0t0 earned 500 total points
ID: 24146665
This IS whitelisting.

Pattern contains only characters that SHOULD appear in the output string.

It's up to winmyan to edit the pattern string so that it includes only those characters which may occur in his output string.

You'll notice here that I'm not using Regex - there's no need to. The trouble is, we become so reliant on high-level functions that we forget how to program ourselves. Just looking at the complex example strings and functions of Regex makes me wonder why someone would want to memorise so much small detail when standard functions such as InStr(), Mid(), Len(), Left() etc. provide all the tools to accomplish the same thing, albeit programmatically. The most obvious advantage of using Regex is probably reduced code and, arguably, fewer errors; however, Regex should not be a replacement for good design and coding skills.

winmyan - could you be specific about the characters that MAY appear in the output string so that this can be coded up for you. So far I recognise the following:

   abcdefghijklmnopqrstuvwxyz
   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   0123456789
   . # - _ : @
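
As a rough sketch of how that list could be coded up without Regex: the version below uses a StringBuilder instead of repeated string concatenation. The module, function and constant names are hypothetical, and including the space character in the list is an assumption.

```vb
Imports System.Text

Public Module CssAllowListSketch
    ' Assumed allow-list: the characters listed above, plus a space.
    Private Const Allowed As String = "abcdefghijklmnopqrstuvwxyz" &
                                      "ABCDEFGHIJKLMNOPQRSTUVWXYZ" &
                                      "0123456789" &
                                      ".#-_:@ "

    ' Hypothetical helper: keeps only characters found in the allow-list.
    Public Function KeepAllowed(ByVal original As String) As String
        Dim trimmed As String = original.Trim()
        Dim result As New StringBuilder(trimmed.Length)

        For Each ch As Char In trimmed
            ' IndexOf(Char) is an ordinal, case-sensitive search.
            If Allowed.IndexOf(ch) >= 0 Then result.Append(ch)
        Next

        Return result.ToString()
    End Function
End Module
```

StringBuilder avoids allocating a new string on every appended character, which matters for long inputs.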




 

Author Closing Comment

by:winmyan
ID: 31570447
Thank you so much for your help!!!
 
LVL 16

Expert Comment

by:t0t0
ID: 24148249
Oh, that was surprisingly unexpected. Anyway, I hope you have arrived at a solution. Thank you.