Solved

Regular Expression for stripping out invalid CSS tags

Posted on 2009-04-09
Last Modified: 2012-05-06
I'm trying to strip out invalid CSS characters with a regular expression. Since newer CSS allows [, ], |, ~, >, * and +, I'm wondering how to strip them out.

Could you please share if you have a better method? Thank you for your help...
Public Shared Function ValidateCSS(ByVal original As String) As String

        Dim result As String = String.Empty
        Dim ctr As Long
        Dim sChar As String

        Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"

        original = Trim(original)

        For ctr = 1 To Len(original)
            sChar = Mid(original, ctr, 1)
            If Regex.IsMatch(sChar, pattern) Then
                result = result & sChar
            End If
        Next

        Return result
    End Function

Question by:winmyan
10 Comments
 
LVL 16

Expert Comment

by:t0t0
ID: 24120067
Looking at your code, you could do something like this:

Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "[]|~>*+"
      
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function



NOTE:

This would have worked too; however, it's more efficient to evaluate the Mid() function once rather than twice per iteration of the loop.

      If Instr(pattern, Mid(original, ctr, 1)) = 0 Then
         result = result & Mid(original, ctr, 1)
      End If
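
For comparison, the same exclusion can be written as a single regular expression. This is only a minimal sketch, assuming the input is not Nothing; the module and function names are made up for illustration and are not part of the original code.

```vb
Imports System.Text.RegularExpressions

Module CssBlacklistSketch
    ' Hypothetical helper: removes the characters [ ] | ~ > * + from the input.
    Public Function StripSelectorCharacters(ByVal original As String) As String
        ' Inside a character class, [ and ] are escaped with a backslash;
        ' the other characters (| ~ > * +) are literal there.
        Return Regex.Replace(original.Trim(), "[\[\]|~>*+]", String.Empty)
    End Function
End Module
```

With this sketch, StripSelectorCharacters("div > p[title]") returns "div  ptitle" (the two spaces around the removed ">" are kept).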


0
 
LVL 16

Expert Comment

by:t0t0
ID: 24136195
winmyan

Do you require further assistance?
0
 

Author Comment

by:winmyan
ID: 24138140
Hi t0t0,

Thank you for your help. If I follow your comment, I will end up writing every possible unwanted character into the pattern string.

I'm really looking for a regular expression. In addition, I am wondering how others have done this kind of text stripping to prevent SQL injection (I could not find any sample code on Google).

Thank you again for your comment!!!
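
On the SQL injection point raised above: the usual defence is to pass user text as a query parameter rather than to strip characters from it. The following is a rough sketch only, assuming SQL Server and System.Data.SqlClient; the table name Styles, the column CssText and the function name are hypothetical and do not come from this thread.

```vb
Imports System.Data.SqlClient

Module ParameterizedQuerySketch
    ' Hypothetical example: stores CSS text without concatenating it into the SQL string.
    Public Function SaveStyle(ByVal connectionString As String, ByVal css As String) As Integer
        Using conn As New SqlConnection(connectionString)
            Using cmd As New SqlCommand("INSERT INTO Styles (CssText) VALUES (@css)", conn)
                ' The value travels as a parameter, so it is never parsed as SQL.
                cmd.Parameters.AddWithValue("@css", css)
                conn.Open()
                ' Returns the number of rows affected.
                Return cmd.ExecuteNonQuery()
            End Using
        End Using
    End Function
End Module
```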
0
 
LVL 16

Expert Comment

by:t0t0
ID: 24138268
I think I may have misled you with the variable-name 'pattern'.

The variable 'pattern' contains characters which you DO NOT want to appear in your result string - this is opposite to the way you previously used 'pattern'.

The lines of code:

      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End if

checks each character in your original string and if it does not appear in pattern (the exclude characters) then it appends the character to the result string.

You can add whatever characters to 'pattern' string.

So, rather than testing for valid characters to include, you're simply doing the opposite - testing for non-valid characters to exclude.

Did you not understand that?
0
 

Author Comment

by:winmyan
ID: 24138742
Hi t0t0,

Thank you for your quick response.

Do I have to include every character that should be stripped out in the pattern string? If that is the case, the pattern string will contain a few hundred characters (at least).

I only want to list the characters I need, rather than finding out all the characters I don't. Please correct me if I am wrong.
0
 
LVL 16

Expert Comment

by:t0t0
ID: 24139368
As far as I am aware, a single byte can represent 256 values. Standard ASCII defines 128 of them: 95 printable characters and 33 control (non-printable) characters. What the remaining 128 values mean depends on the code page in use.

62 of those characters are a-z, A-Z and 0-9.

There are only about 95 printable characters available on an ordinary QWERTY keyboard.

So, you really have to decide how many of the printable and non-printable characters you can safely assume are NOT going to appear in your CSS tag lines before deciding on what to exclude.

However, you have decided to narrow down your INCLUDE options, so I'll work to that then.

Please try the following reverse code then:


Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"
     
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) > 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function


NOTE: I have no idea what a lot of this is ("^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$") but I have included it anyway. All I know is that it represents the characters to include in your result string.

0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 24145994
> Do I have to include all possible characters in pattern string which to be stripped out?
This would be the blacklisting approach. You are always better off with a whitelisting solution, where your regex defines all allowed characters, because that avoids missing some unwanted characters.
Use whitelisting.
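
A whitelisting regex along these lines might look like the sketch below. It assumes the allowed set is letters, digits, space and . # - _ : @ (taken from the character class in the question) and that the input is not Nothing; the names are illustrative. Note that in the question's class the sequence " -_" is an unescaped range from space to underscore, which also admits [, ], >, * and +, so the hyphen is escaped here.

```vb
Imports System.Text.RegularExpressions

Module CssWhitelistSketch
    ' Hypothetical helper: keeps only the allowed characters and drops everything else.
    Public Function KeepAllowedCharacters(ByVal original As String) As String
        ' [^...] matches any character NOT in the list; \- makes the hyphen literal.
        Return Regex.Replace(original.Trim(), "[^a-zA-Z0-9.# \-_:@]", String.Empty)
    End Function
End Module
```

Because the class is negated, anything not listed is removed without having to enumerate the unwanted characters.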
0
 
LVL 16

Accepted Solution

by:
t0t0 earned 500 total points
ID: 24146665
This IS whitelisting.

Pattern contains only characters that SHOULD appear in the output string.

It's up to winmyan to edit the pattern string so that it includes only those characters which may occur in his output string.

You'll notice here, I'm not using Regex - there's no need to. The trouble is, we become so reliant on high-level functions that we forget how to program ourselves. Just looking at the complex example strings and functions of Regex makes me wonder why someone would want to memorise so much small detail when standard functions such as InStr(), Mid(), Len(), Left() etc. provide all the tools to accomplish the same thing, albeit programmatically. The most obvious advantage of using Regex is probably reduced code and, arguably, fewer errors; however, Regex should not be a replacement for good design and coding skills.

winmyan - could you be specific about the characters that MAY appear in the output string so that this can be coded up for you. So far I recognise the following:

   abcdefghijklmnopqrstuvwxyz
   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   0123456789
   . # - _ : @
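
The list above could be coded up without Regex roughly as follows. This is a sketch under assumptions: InStr() does a literal search, so a range such as "a-z" is not expanded and every allowed character has to be spelled out; the trailing space in the allowed string is an assumption carried over from the original pattern; the module and function names are made up.

```vb
Imports Microsoft.VisualBasic
Imports System.Text

Module InStrWhitelistSketch
    ' Every allowed character is listed explicitly; the final space is an assumption.
    Private Const Allowed As String = "abcdefghijklmnopqrstuvwxyz" & _
                                      "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
                                      "0123456789" & ".#-_:@ "

    Public Function KeepAllowedCharacters(ByVal original As String) As String
        Dim result As New StringBuilder()
        For Each ch As Char In Trim(original)
            ' InStr returns 0 when ch is not found; binary compare is case-sensitive.
            If InStr(1, Allowed, ch, CompareMethod.Binary) > 0 Then
                result.Append(ch)
            End If
        Next
        Return result.ToString()
    End Function
End Module
```

A StringBuilder is used instead of repeated string concatenation, which is a design choice for longer inputs rather than something the thread requires.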




0
 

Author Closing Comment

by:winmyan
ID: 31570447
Thank you so much for your help!!!
0
 
LVL 16

Expert Comment

by:t0t0
ID: 24148249
Oh, that was surprisingly unexpected. Anyway, I hope you have arrived at a solution. Thank you.
0
