Solved

Regular Expression for stripping out invalid CSS tags

Posted on 2009-04-09
Last Modified: 2012-05-06
I'm trying to strip out invalid CSS tags using a regular expression. Since newer CSS can contain [, ], |, ~, >, * and +, I'm wondering how to strip them out.

Could you please share if you have a better method? Thank you for your help...
Public Shared Function ValidateCSS(ByVal original As String) As String

    Dim result As String = String.Empty
    Dim ctr As Long
    Dim sChar As String

    Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"

    original = Trim(original)

    For ctr = 1 To Len(original)
        sChar = Mid(original, ctr, 1)
        If Regex.IsMatch(sChar, pattern) Then
            result = result & sChar
        End If
    Next

    Return result
End Function

Question by:winmyan
10 Comments
 
LVL 16

Expert Comment

by:t0t0
ID: 24120067
Looking at your code, you could do something like this:

Public Shared Function ValidateCSS(ByVal original As String) As String

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "[]|~>*+"

   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If InStr(pattern, sChar) = 0 Then
         result = result & sChar
      End If
   Next

   Return result
End Function



NOTE:

This would have worked too; however, it is more efficient to evaluate the Mid() function once rather than twice per iteration of the loop.

      If InStr(pattern, Mid(original, ctr, 1)) = 0 Then
         result = result & Mid(original, ctr, 1)
      End If
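
Since the question asks for a regular expression, the same exclude-list idea can probably be written as a single Regex.Replace call. The following is only a sketch: the module name and the helper StripSelectorChars are made up for illustration, and it assumes the characters to remove are exactly the ones listed in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BlacklistRegexDemo
    ' Hypothetical helper (not from the thread): removes [ ] | ~ > * +
    ' from the input in one pass using a character class.
    Public Function StripSelectorChars(ByVal original As String) As String
        ' Inside the character class only [ and ] are escaped;
        ' | ~ > * + are literal there.
        Return Regex.Replace(original.Trim(), "[\[\]|~>*+]", String.Empty)
    End Function

    Sub Main()
        Console.WriteLine(StripSelectorChars("a[href]>b*"))   ' prints ahrefb
    End Sub
End Module
```

As with the loop above, this is an exclude list, so any unwanted character that is not in the class passes through unchanged.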


 
LVL 16

Expert Comment

by:t0t0
ID: 24136195
winmyan

Do you require further assistance?
 

Author Comment

by:winmyan
ID: 24138140
Hi t0t0,

Thank you for your help. If I follow your comment, I will end up listing every possible character in the pattern string.

I'm really looking for a regular expression. In addition, I am wondering how others have done this kind of text stripping to prevent SQL injection (I could not find any sample code on Google).

Thank you again for your comment!!!
 
LVL 16

Expert Comment

by:t0t0
ID: 24138268
I think I may have misled you with the variable-name 'pattern'.

The variable 'pattern' contains characters which you DO NOT want to appear in your result string - this is opposite to the way you previously used 'pattern'.

The lines of code:

      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End if

check each character in your original string and, if it does not appear in pattern (the excluded characters), append it to the result string.

You can add whatever characters you like to the 'pattern' string.

So, rather than testing for valid characters to include, you're simply doing the opposite - testing for non-valid characters to exclude.

Did you not understand that?
 

Author Comment

by:winmyan
ID: 24138742
Hi t0t0,

Thank you for your quick response.

Do I have to include in the pattern string every possible character that should be stripped out? If that is the case, the pattern string will contain a few hundred characters (at least).

I only want the characters I need rather than finding out all possible characters. Please correct me, if I am wrong.
 
LVL 16

Expert Comment

by:t0t0
ID: 24139368
As far as I am aware, a single byte can represent 256 characters. Standard ASCII defines only 128 of them: 95 printable characters and 33 non-printable (control) characters. The remaining 128 byte values depend on the code page in use.

62 of those characters are a-z, A-Z and 0-9.

There are only about 95 printable characters available on an ordinary QWERTY keyboard.

So, you really have to decide which of the printable and non-printable characters you can safely assume are NOT going to appear in your CSS tag lines before deciding what to exclude.

However, you have decided to narrow down your INCLUDE options, so I'll work to that.

Please try the following reversed code:


Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"
     
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) > 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function


NOTE: I have no idea what a lot of this is ("^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$") but I have included it anyway. All I know is that it represents the characters to include in your result string.
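
One point may be worth checking here: InStr does a plain substring search, so it does not interpret regex syntax such as the range a-z. The short sketch below (the module name is made up for illustration) suggests that, with this pattern string, only the characters that literally occur in it would be kept.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module InStrLiteralDemo
    Sub Main()
        Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"

        ' InStr looks for the character literally; "a-z" is just a, - and z.
        Console.WriteLine(InStr(pattern, "a") > 0)   ' True: "a" occurs in the string
        Console.WriteLine(InStr(pattern, "b") > 0)   ' False: "b" never occurs in it
        Console.WriteLine(InStr(pattern, "{") > 0)   ' True: "{" occurs in the string
    End Sub
End Module
```

So for the include approach to work with InStr, every allowed character has to be spelled out in the string rather than written as a range.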

 
LVL 51

Expert Comment

by:ahoffmann
ID: 24145994
> Do I have to include all possible characters in pattern string which to be stripped out?
This would be the blacklisting approach. You are always better off with a whitelisting solution, where your regex defines all allowed characters, because it avoids missing some unwanted characters.
Use whitelisting.
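
A whitelisting regex along these lines might look like the sketch below. The module and the helper KeepAllowedChars are hypothetical names, and the allowed set (letters, digits, . # - _ : @ and space) is an assumption read off the character class in the question. Note that in the original class the sequence " -_" is a range from space to underscore, which also admits [, ], >, * and +; escaping the hyphen avoids that.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WhitelistRegexDemo
    ' Hypothetical helper (not from the thread): removes every character
    ' that is NOT in the allowed set. The set itself is an assumption.
    Public Function KeepAllowedChars(ByVal original As String) As String
        ' [^...] negates the class; \- is a literal hyphen, not a range.
        Return Regex.Replace(original.Trim(), "[^a-zA-Z0-9.#\-_:@ ]", String.Empty)
    End Function

    Sub Main()
        Console.WriteLine(KeepAllowedChars("div#main>a[href]"))   ' prints div#mainahref
    End Sub
End Module
```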
 
LVL 16

Accepted Solution

by:
t0t0 earned 500 total points
ID: 24146665
This IS whitelisting.

Pattern contains only characters that SHOULD appear in the output string.

It's up to winmyan to edit the pattern string so that it includes only those characters which may occur in his output string.

You'll notice here that I'm not using Regex - there's no need to. The trouble is, we become so reliant on high-level functions that we forget how to program ourselves. Just looking at the complex example strings and functions of Regex makes me wonder why someone would want to memorise so much small detail when standard functions such as InStr(), Mid(), Len(), Left() etc. provide all the tools to accomplish the same thing, albeit programmatically. The most obvious advantage of Regex is probably reduced code and, arguably, fewer errors; however, Regex should not be a replacement for good design and coding skills.

winmyan - could you be specific about the characters that MAY appear in the output string so that this can be coded up for you. So far I recognise the following:

   abcdefghijklmnopqrstuvwxyz
   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   0123456789
   . # - _ : @
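
For what it's worth, the list above could be coded without Regex roughly as follows. This is a sketch only: the module, the constant Allowed and the helper KeepListedChars are illustrative names, and it assumes the allowed set is exactly the characters listed (no space).

```vbnet
Imports System
Imports System.Text

Module WhitelistListDemo
    ' Allowed characters spelled out one by one, as in the list above.
    Private Const Allowed As String = _
        "abcdefghijklmnopqrstuvwxyz" & _
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
        "0123456789" & _
        ".#-_:@"

    ' Hypothetical helper: keeps only characters found in Allowed.
    Public Function KeepListedChars(ByVal original As String) As String
        Dim result As New StringBuilder()
        For Each c As Char In original.Trim()
            If Allowed.IndexOf(c) >= 0 Then result.Append(c)
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(KeepListedChars("a.btn:hover > span[title]"))   ' prints a.btn:hoverspantitle
    End Sub
End Module
```

A StringBuilder is used instead of repeated string concatenation, which tends to matter only for long inputs.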




 

Author Closing Comment

by:winmyan
ID: 31570447
Thank you so much for your help!!!
 
LVL 16

Expert Comment

by:t0t0
ID: 24148249
Oh, that was surprisingly unexpected. Anyway, I hope you have arrived at a solution. Thank you.