Solved

vb.net regular expression with $ at beginning of string sought

Posted on 2009-03-28
Last Modified: 2012-05-06
I'm using escaped characters to allow for special characters.

For example this works

Regex.IsMatch("d+ollar", "^\bd\+ollar\b") = true
Regex.IsMatch("d$ollar", "^\bd\$ollar\b") = true
Regex.IsMatch("d{ollar", "^\bd\{ollar\b") = true

But when the special character is at the start or end of the string it does not work. For example:

Regex.IsMatch("+dollar", "^\b\+dollar\b") = false
Regex.IsMatch("$dollar", "^\b\$dollar\b") = false
Regex.IsMatch("{dollar", "^\b\{dollar\b") = false

Regex.IsMatch("dollar+", "^\bdollar\+\b") = false
Regex.IsMatch("dollar$", "^\bdollar\$\b") = false
Regex.IsMatch("dollar{", "^\bdollar\{\b") = false

How can I use special characters at the start and end of the string?

Thanks,
Glenn
Question by:glenn_r
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 24013047
Hi glenn_r;

Regex.IsMatch("+dollar", "^\b\+dollar\b") = false

The \b matches at a word boundary, that is, a position with a word character on one side and a non-word character (or the start or end of the string) on the other. For example, [a-zA-Z_0-9][^a-zA-Z_0-9] is one kind of word boundary and [^a-zA-Z_0-9][a-zA-Z_0-9] is another. The \b before the \+ in the pattern fails because the position between the start of the string and the + is not a word boundary. Removing the \b at the beginning of the regex pattern will correct the issue.

The same thing holds true for the other two items.

The last three cases are the same as the first three, but this time removing the trailing \b will correct the issue, for the same reason.
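
As a rough illustration of the point above, here is a small console sketch. The module name is made up for the example, and it is only meant to show the failing patterns next to the versions with the offending \b removed, not to be the final pattern for the application.

```vb
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module, not part of the original question.
Module WordBoundaryDemo
    Sub Main()
        ' \b needs a word character on exactly one side of the position.
        ' At the start of "+dollar" the neighbours are "start of string" and "+",
        ' neither of which is a word character, so the leading \b cannot match.
        Console.WriteLine(Regex.IsMatch("+dollar", "^\b\+dollar\b"))   ' False
        ' Dropping the leading \b lets the pattern match.
        Console.WriteLine(Regex.IsMatch("+dollar", "^\+dollar\b"))     ' True

        ' Same story at the end: "+" followed by end of string is not a boundary.
        Console.WriteLine(Regex.IsMatch("dollar+", "^\bdollar\+\b"))   ' False
        Console.WriteLine(Regex.IsMatch("dollar+", "^\bdollar\+"))     ' True
    End Sub
End Module
```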

Fernando
 

Author Comment

by:glenn_r
ID: 24025370
Fernando

I tried your solution and you are correct. Before I award and accept the solution I need you to clarify the reasoning behind this logic.

From my understanding wrapping the search string "dollar" in \b means - The "\b" is a special code that means, "match the position at the beginning or end of any word". This expression will only match complete words spelled "dollar" with any combination of lower case or capital letters. Example "\bdollar\b"

I have some instances where I need to find the word dollar with +, $, [, (, (, etc. AKA special characters. From what I read to use the literal value prefix the special character with a backslash. So if I wanted to find the word "+dollar" the regex should be "\b\+dollar\b". Why is this so? Please explain.

Thanks
Glenn


 

Author Comment

by:glenn_r
ID: 24025404
Fernando,

I did more testing. The following does not work

Match the word "dollar+"
?Regex.IsMatch("dollar+", "^\bdollar\+") = TRUE
The issue is that it will also return true for
dollar++
dollar+anycharatersaftertheplus

I want to find the exact literal words +dollar, dollar+, $dollar, dollar$. I want to remove the special meaning of these characters because the user might type in a word containing one, so I prefix the character with a backslash to use its literal value.

Thanks
Glenn
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 24025786
Hi Glenn;

In a regular expression, \b denotes that the match must occur on a word boundary: between a \w and a \W character, or between a \w character and the start or end of the string.

Where \w is any alphanumeric character or the underscore character, [a-zA-Z_0-9]
Where \W is any other character, [^a-zA-Z_0-9].

In your post ID: 24025370 you state the following: "So if I wanted to find the word "+dollar" the regex should be "\b\+dollar\b". Why is this so? Please explain." In fact that is not the case. For \b to match, it must have a word character (alphanumeric or underscore) on one side and a non-word character, or the start or end of the string, on the other; otherwise it fails. For example, with the input string " +dollar" and the regex pattern "\b\+dollar\b", the \b sits between the space character and the + sign. Neither is a word character, so the \b fails. If the input string were "X+dollar", where X is any alphanumeric or underscore character, the \b would sit between X and the + sign and the pattern would match.
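
To make the two inputs above concrete, the following sketch lists the positions at which \b can match in each string. BoundaryIndexes is a hypothetical helper written just for this illustration; it relies only on Regex.Matches and Match.Index.

```vb
Imports System
Imports System.Text.RegularExpressions

Module BoundaryPositions
    ' Hypothetical helper: returns the indexes at which \b matches in input.
    Function BoundaryIndexes(ByVal input As String) As String
        Dim result As String = ""
        For Each m As Match In Regex.Matches(input, "\b")
            If result.Length > 0 Then result &= ","
            result &= m.Index.ToString()
        Next
        Return result
    End Function

    Sub Main()
        ' The + is at index 1 in both strings, so \b\+ needs a boundary at index 1.
        Console.WriteLine(BoundaryIndexes(" +dollar"))  ' 2,8     (no boundary at 1)
        Console.WriteLine(BoundaryIndexes("X+dollar"))  ' 0,1,2,8 (boundary at 1)
    End Sub
End Module
```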

In your post ID: 24025404 you state this: "Regex.IsMatch("dollar+", "^\bdollar\+") = TRUE The issue is that it will also return true for dollar++". That is the correct behaviour. Matching starts at the beginning of the string, and the first pattern element after the ^ is the \b. There is no character to the left of that position, so for the \b to match there the next character must be a word character, which in this case is the letter d, so at this point we have a match. The next 6 characters must then be ollar+ as per the pattern, which they are. At this point the pattern is exhausted, so the regex engine has found one complete match and returns true, as it should. Nothing in the pattern says the string has to end there.
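
A minimal sketch of the difference, assuming the goal is to match the whole input rather than a prefix: adding $ to the end of the pattern forces the match to run to the end of the string. The module name is made up for the example.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    Sub Main()
        ' Without an end anchor, IsMatch succeeds once the pattern is satisfied,
        ' even if more characters follow.
        Console.WriteLine(Regex.IsMatch("dollar++", "^dollar\+"))    ' True
        ' With $ the match has to reach the end of the input.
        Console.WriteLine(Regex.IsMatch("dollar++", "^dollar\+$"))   ' False
        Console.WriteLine(Regex.IsMatch("dollar+", "^dollar\+$"))    ' True
    End Sub
End Module
```

Note that in .NET the $ anchor also matches just before a final newline; \z can be used instead if that matters.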

Two things: 1) Will there be other words around "+dollar, dollar+, $dollar, dollar$", or will it be the only word in the string? 2) What version of Visual Studio are you using? There may be an easy way to have the regex engine escape the user input string.

Fernando
 

Author Comment

by:glenn_r
ID: 24030667
Using VS 2005 (VB.NET).

My application:

I have a list of file names.

list of files
------------------------
myfile.doc
xfile.doc
test1.xls
dollar$.txt
+dollar.xxx
dollar
dollar2

I have a textbox where the user can specify text used to filter the list. For example, if the user wants to see all the files that start with the string dollar, they'd type in dollar*. The * wildcard means "anything from here on", so this would return

dollar$.txt
dollar
dollar2

Note that the user does not type in regex strings. I convert the search string to a regex for matching.

Note that all my logic works when the input has no regex special characters. As some of the special characters can be used in file names, I have to allow for them.
 
LVL 62

Accepted Solution

by:
Fernando Soto earned 50 total points
ID: 24032362
Hi Glenn;

The function Regex.Escape will properly escape the user input for any characters that the Regex engine uses. The following statement will automatically escape the Regex meta-characters and place the result in the variable pattern.

Dim pattern As String = Regex.Escape(TextBox1.Text)

When building the final Regex pattern around the variable pattern, use ^ for the start of the string and $ for the end of the string. Also, I would not use the \b meta-character in this scenario.

To your statement, "Note that all my logic works without regexe special characters. as some of the special characters can be use in file names I have to allow for them.", The Regex.Escape will take care of this.
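
Putting this together with the file list from the earlier post, a minimal sketch might look like the following. WildcardToPattern is a hypothetical helper, and treating * as "any run of characters" is an assumption based on the description of the filter box; the actual wildcard rules in the application may differ.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FileFilterDemo
    ' Hypothetical helper: turns a user filter such as "dollar*" into an anchored regex.
    Function WildcardToPattern(ByVal filter As String) As String
        ' Regex.Escape turns "*" into "\*"; swap that for ".*" (any run of characters).
        ' In VB.NET string literals the backslash is an ordinary character.
        Return "^" & Regex.Escape(filter).Replace("\*", ".*") & "$"
    End Function

    Sub Main()
        Dim files As String() = {"myfile.doc", "xfile.doc", "test1.xls", _
                                 "dollar$.txt", "+dollar.xxx", "dollar", "dollar2"}
        Dim pattern As String = WildcardToPattern("dollar*")   ' ^dollar.*$

        ' Prints dollar$.txt, dollar and dollar2, one per line.
        For Each fileName As String In files
            If Regex.IsMatch(fileName, pattern) Then Console.WriteLine(fileName)
        Next
    End Sub
End Module
```

A filter of "+dollar*" becomes ^\+dollar.*$ and matches only +dollar.xxx. RegexOptions.IgnoreCase can be passed to IsMatch if the filter should ignore case.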

Fernando
 

Author Comment

by:glenn_r
ID: 24040229
thanks for the help
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 24040494
Not a problem, glad I was able to help.  ;=)