Solved

Regular Expression for stripping out invalid CSS tags

Posted on 2009-04-09
Last Modified: 2012-05-06
I'm trying to strip out invalid CSS tags using a regular expression. Since newer CSS includes [, ], |, ~, >, * and +, I'm wondering how to strip them out.

Could you please share if you have a better method? Thank you for your help...
Public Shared Function ValidateCSS(ByVal original As String) As string 
 
        Dim result As String = String.Empty
        Dim ctr As Long
        Dim sChar As String
 
        Dim pattern As String = "^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$"
 
        original = Trim(original)
 
        For ctr = 1 To Len(original)
            sChar = Mid(original, ctr, 1)
            If Regex.IsMatch(sChar, pattern) Then
                result = result & sChar
            End If
        Next
 
        Return result
    End Function

Question by:winmyan
LVL 16

Expert Comment

by:t0t0
ID: 24120067
Looking at your code, you could do something like this:

Public Shared Function ValidateCSS(ByVal original As String) As string

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   Dim pattern As String = "[]|~>*+"
      
   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End If
   Next

   Return result
End Function



NOTE:

This would also have worked; however, it is more efficient to evaluate the Mid() function once rather than twice per iteration of the loop.

      If Instr(pattern, Mid(original, ctr, 1)) = 0 Then
         result = result & Mid(original, ctr, 1)
      End If


 
LVL 16

Expert Comment

by:t0t0
ID: 24136195
winmyan

Do you require further assistance?
 

Author Comment

by:winmyan
ID: 24138140
Hi t0t0,

Thank you for your help. If I follow your comment, I will end up writing every possible character into the pattern string.

I'm really looking for a regular expression. In addition, I am wondering how others have done this kind of text stripping to prevent SQL injection (I could not find any sample code on Google).

Thank you again for your comment!!!
LVL 16

Expert Comment

by:t0t0
ID: 24138268
I think I may have misled you with the variable-name 'pattern'.

The variable 'pattern' contains characters which you DO NOT want to appear in your result string - this is opposite to the way you previously used 'pattern'.

The lines of code:

      If Instr(pattern, sChar) = 0 then
         result = result & sChar
      End if

check each character in your original string; if the character does not appear in pattern (the excluded characters), it is appended to the result string.

You can add whatever characters you like to the 'pattern' string.

So, rather than testing for valid characters to include, you're simply doing the opposite - testing for non-valid characters to exclude.

Did you not understand that?
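
For comparison, the same exclusion can also be expressed as a regular expression. The following is only a minimal sketch: the module name BlacklistSketch and the function name StripExcluded are made up for illustration, and the character class covers just the seven characters mentioned in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BlacklistSketch
    ' Hypothetical helper: removes [, ], |, ~, >, * and + from the input.
    ' Inside a character class only [ and ] need escaping here;
    ' |, ~, >, * and + are treated as literal characters.
    Public Function StripExcluded(ByVal original As String) As String
        Return Regex.Replace(original.Trim(), "[\[\]|~>*+]", String.Empty)
    End Function

    Sub Main()
        ' Prints "ul  lititle  p" (the spaces around removed characters remain).
        Console.WriteLine(StripExcluded("ul > li[title] + p"))
    End Sub
End Module
```

Regex.Replace does in one call what the loop does character by character, so either form can be used depending on preference.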
 

Author Comment

by:winmyan
ID: 24138742
Hi t0t0,

Thank you for your quick response.

Do I have to include in the pattern string every possible character that should be stripped out? If that is the case, the pattern string will have a few hundred characters (at least).

I only want the characters I need rather than finding out all possible characters. Please correct me, if I am wrong.
 
LVL 16

Expert Comment

by:t0t0
ID: 24139368
As far as I am aware, standard 7-bit ASCII has only 128 characters: 95 printable characters and 33 non-printable control characters. A single byte can represent 256 values, which is what the extended 8-bit character sets use.

62 of those characters are a-z, A-Z and 0-9.

There are only about 95 printable characters available on an ordinary QWERTY keyboard.

So, you really have to decide which printable and non-printable characters you can safely assume are NOT going to appear in your CSS tag lines before deciding on what to exclude.

However, you have decided to narrow down your INCLUDE options, so I'll work to that.

Please try the following reverse code:


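Public Shared Function ValidateCSS(ByVal original As String) As String

   Dim result As String = String.Empty
   Dim ctr As Integer
   Dim sChar As String

   ' Every character that is allowed in the result, spelled out in full.
   ' Add a space or any other character here if it should be kept.
   Dim pattern As String = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.#-_:@"

   original = Trim(original)

   For ctr = 1 To Len(original)
      sChar = Mid(original, ctr, 1)
      ' Keep the character only if it appears in the allowed list.
      If InStr(pattern, sChar) > 0 Then
         result = result & sChar
      End If
   Next

   Return result
End Function


NOTE: Your original string ("^[\t]*[a-zA-Z0-9\.# -_:@]+[\t]*\{.*[\t]*$") is regular-expression syntax. InStr() treats its argument as plain text, so a range such as a-z would match only the literal characters a, - and z. For that reason the allowed characters are spelled out in full in 'pattern', which now simply lists the characters to include in your result string.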

 
LVL 51

Expert Comment

by:ahoffmann
ID: 24145994
> Do I have to include all possible characters in pattern string which to be stripped out?
This would be the blacklisting approach. You are always better off with a whitelisting solution, where your regex defines all allowed characters, because that avoids missing some unwanted characters.
Use whitelisting.
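
A whitelisting regex along these lines might look like the sketch below. It is an illustration only: the names WhitelistSketch and KeepAllowed are hypothetical, and the allowed set (letters, digits, ".", "#", "-", "_", ":", "@" and space) is an assumption read off the character class in the question. Note that in the question's class the sequence " -_" is a range from space to underscore, which also admits characters such as *, +, > and [, so the hyphen is escaped here.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WhitelistSketch
    ' Negated character class: matches every character that is NOT allowed.
    ' The hyphen is escaped so that it is a literal "-" and not a range operator.
    Private Const NotAllowed As String = "[^a-zA-Z0-9.#\-_:@ ]"

    ' Hypothetical helper: deletes everything outside the allowed set.
    Public Function KeepAllowed(ByVal original As String) As String
        Return Regex.Replace(original.Trim(), NotAllowed, String.Empty)
    End Function

    Sub Main()
        ' Prints "div#main  a:hover" (the ">" is removed, both spaces remain).
        Console.WriteLine(KeepAllowed("div#main > a:hover"))
    End Sub
End Module
```

Because the class is negated, any character nobody thought of is removed by default, which is the point of whitelisting.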
 
LVL 16

Accepted Solution

by:
t0t0 earned 1500 total points
ID: 24146665
This IS whitelisting.

Pattern contains only characters that SHOULD appear in the output string.

It's up to winmyan to edit the pattern string so that it includes only those characters which may occur in his output string.

You'll notice that I'm not using Regex here; there's no need to. The trouble is that we become so reliant on high-level functions that we forget how to program ourselves. Looking at the complex example strings and functions of Regex makes me wonder why anyone would want to memorise so much small detail when standard functions such as InStr(), Mid(), Len() and Left() provide all the tools needed to accomplish the same thing programmatically. The most obvious advantages of Regex are probably reduced code and, arguably, fewer errors; however, Regex should not be a replacement for good design and coding skills.

winmyan - could you be specific about the characters that MAY appear in the output string so that this can be coded up for you. So far I recognise the following:

   abcdefghijklmnopqrstuvwxyz
   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   0123456789
   . # - _ : @
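
Coded up without Regex, the list above could be used roughly as follows. This is a sketch under assumptions: the module and function names are hypothetical, a StringBuilder replaces repeated string concatenation, and the space character is left out because it is not clearly part of the list.

```vbnet
Imports System
Imports System.Text

Module ListWhitelistSketch
    ' Allowed characters taken from the list above; adjust as needed.
    Private Const Allowed As String = _
        "abcdefghijklmnopqrstuvwxyz" & _
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
        "0123456789.#-_:@"

    ' Hypothetical helper: keeps only characters found in Allowed.
    Public Function KeepListed(ByVal original As String) As String
        Dim result As New StringBuilder()
        For Each c As Char In original.Trim()
            ' IndexOf returns -1 when the character is not in the list.
            If Allowed.IndexOf(c) >= 0 Then result.Append(c)
        Next
        Return result.ToString()
    End Function

    Sub Main()
        ' Prints "a.bcd" (spaces, ">", "[" and "]" are dropped).
        Console.WriteLine(KeepListed("a.b > c[d]"))
    End Sub
End Module
```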




 

Author Closing Comment

by:winmyan
ID: 31570447
Thank you so much for your help!!!
 
LVL 16

Expert Comment

by:t0t0
ID: 24148249
Oh, that was surprisingly unexpected. Anyway, I hope you have arrived at a solution. Thank you.

Question has a verified solution.
