glenn_r

vb.net regular expression with $ at beginning of string sought

I'm using escaped characters to allow for special characters.

For example this works

Regex.IsMatch("d+ollar", "^\bd\+ollar\b") = true
Regex.IsMatch("d$ollar", "^\bd\$ollar\b") = true
Regex.IsMatch("d{ollar", "^\bd\{ollar\b") = true

But when the special character is at the start or end of the string it does not work. For example:

Regex.IsMatch("+dollar", "^\b\+dollar\b") = false
Regex.IsMatch("$dollar", "^\b\$dollar\b") = false
Regex.IsMatch("{dollar", "^\b\{dollar\b") = false

Regex.IsMatch("dollar+", "^\bdollar\+\b") = false
Regex.IsMatch("dollar$", "^\bdollar\$\b") = false
Regex.IsMatch("dollar{", "^\bdollar\{\b") = false

How can I use special characters at the start and end of the string?

Thanks,
Glenn
Fernando Soto

Hi glenn_r;

Regex.IsMatch("+dollar", "^\b\+dollar\b") = false

The \b defines a word boundary: a position with a word character [a-zA-Z_0-9] on one side and a non-word character [^a-zA-Z_0-9], or the start or end of the string, on the other. In the input "+dollar" the pattern asks for a \b immediately before the +, but that position lies between the start of the string and the +, neither of which is a word character, so it is not a word boundary and the test fails. Removing the \b at the beginning of the regex pattern will correct the issue.

The same thing holds true for the other two items.

The last three cases are the same as the first three, but this time removing the trailing \b corrects the issue, for the same reason.
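The explanation above can be checked with a short console sketch. The module name is hypothetical, and the patterns are the asker's originals with only the offending \b removed:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LeadingTrailingDemo
    Sub Main()
        ' Original patterns: a \b sits between the string edge and a non-word character.
        Console.WriteLine(Regex.IsMatch("+dollar", "^\b\+dollar\b"))  ' False
        Console.WriteLine(Regex.IsMatch("dollar$", "^\bdollar\$\b"))  ' False

        ' Same patterns with the \b next to the special character removed.
        Console.WriteLine(Regex.IsMatch("+dollar", "^\+dollar\b"))    ' True
        Console.WriteLine(Regex.IsMatch("dollar$", "^\bdollar\$"))    ' True
    End Sub
End Module
```

The \b next to the letters is kept because a letter and the string edge do form a word boundary.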

Fernando
glenn_r

ASKER

Fernando

I tried your solution and you are correct. Before I award and accept the solution I need you to clarify the reasoning behind this logic.

From my understanding wrapping the search string "dollar" in \b means - The "\b" is a special code that means, "match the position at the beginning or end of any word". This expression will only match complete words spelled "dollar" with any combination of lower case or capital letters. Example "\bdollar\b"

I have some instances where I need to find the word dollar with +, $, [, (, (, etc. AKA special characters. From what I read to use the literal value prefix the special character with a backslash. So if I wanted to find the word "+dollar" the regex should be "\b\+dollar\b". Why is this so? Please explain.

Thanks
Glenn


glenn_r

ASKER

Fernando,

I did more testing. The following does not work

Match the word "dollar+"
?Regex.IsMatch("dollar+", "^\bdollar\+") = TRUE
The issue is that it will also return true for
dollar++
dollar+anycharatersaftertheplus

I want to find the exact word literal values +dollar, dollar+, $dollar, dollar$. I want to remove the special meaning of the characters because the user might type in a word with a special character, so I prefix the character with a backslash escape character to use its literal value.

Thanks
Glenn
Hi Glenn;

In a regular expression \b denotes that the match must occur on a word boundary, that is, between a \w and a \W character.

Where \w is any alphanumeric character or the underscore character, [a-zA-Z_0-9]
Where \W is any other, non-alphanumeric character, [^a-zA-Z_0-9].

In your post ID: 24025370 you state the following: "So if I wanted to find the word "+dollar" the regex should be "\b\+dollar\b". Why is this so? Please explain." In fact that is not the case, for this reason: for a \b to match, it must have a word character (alphanumeric or underscore) on one side and a non-word character, or the start or end of the string, on the other; otherwise it fails. For example, if the input string is " +dollar" and the regex pattern is "\b\+dollar\b", the first \b falls between the space character and the + sign. Neither a whitespace character nor the + sign is a word character, so that position fails the definition. If the input string were "X+dollar", where X is any alphanumeric or underscore character, then the pattern would match.
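As an illustration of the boundary rule described above, the following sketch lists where \b matches in "dollar+" and then tries the two inputs from the explanation. The module name is hypothetical. Note that in .NET \w also covers non-ASCII letters and digits unless RegexOptions.ECMAScript is used; the ASCII description is enough for these inputs.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordBoundaryDemo
    Sub Main()
        ' List every position where \b matches in "dollar+".
        For Each m As Match In Regex.Matches("dollar+", "\b")
            Console.WriteLine(m.Index)   ' prints 0, then 6; no boundary at 7, after the +
        Next

        ' Space and + are both non-word characters, so there is no \b between them.
        Console.WriteLine(Regex.IsMatch(" +dollar", "\b\+dollar\b"))  ' False

        ' X is a word character, so there is a \b between X and +.
        Console.WriteLine(Regex.IsMatch("X+dollar", "\b\+dollar\b"))  ' True
    End Sub
End Module
```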

In your post ID: 24025404 you state this: "Regex.IsMatch("dollar+", "^\bdollar\+") = TRUE The issue is that it will also return true for dollar++". That is correct. Matching starts at the beginning of the string, and the first pattern element after ^ is the \b. There is no character to the left of that position, so for the \b to match there the next character must be a word character, which in this case is the letter d, so at this point we have a match. The \b itself consumes nothing, so the next 7 characters must be dollar+ as per the pattern, which they are. At this point the pattern is exhausted, so the regex engine has found one complete match and returns true, as it should, no matter what follows the +.
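A small sketch of that behaviour follows, together with one possible way to stop the trailing characters from being accepted. The end anchor is an assumption that only fits if the word is the entire string, which the thread has not confirmed at this point. The module name is hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnanchoredDemo
    Sub Main()
        ' No end anchor: anything may follow the literal +.
        Console.WriteLine(Regex.IsMatch("dollar++", "^\bdollar\+"))    ' True
        Console.WriteLine(Regex.IsMatch("dollar+abc", "^\bdollar\+"))  ' True

        ' Assumption: the word is the whole string, so anchor the end as well.
        ' ($ also matches just before a final newline; \z is the stricter form.)
        Console.WriteLine(Regex.IsMatch("dollar+", "^dollar\+$"))      ' True
        Console.WriteLine(Regex.IsMatch("dollar++", "^dollar\+$"))     ' False
    End Sub
End Module
```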

Two things: 1- will there be other words around "+dollar, dollar+, $dollar, dollar$", or will it be the only word in the string? And 2- what version of Visual Studio are you using? There may be an easy way to have the regex engine escape the user input string.
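The post does not say which facility is meant. One candidate that exists in the .NET Framework is Regex.Escape, which backslash-escapes regex metacharacters in a string. A minimal sketch, with hypothetical variable and module names:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        Dim userText As String = "+dollar$"

        ' Regex.Escape prefixes metacharacters such as + and $ with a backslash.
        Dim escaped As String = Regex.Escape(userText)
        Console.WriteLine(escaped)   ' \+dollar\$

        ' The escaped text can then be embedded in a larger pattern.
        Console.WriteLine(Regex.IsMatch("+dollar$", "^" & escaped & "$"))  ' True
    End Sub
End Module
```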

Fernando
glenn_r

ASKER

using vs vb.net 2005

My application:

I have a list of file names.

list of files
------------------------
myfile.doc
xfile.doc
test1.xls
dollar$.txt
+dollar.xxx
dollar
dollar2

I have a textbox where the user can specify text used to filter the list. For example, the user wants to see all the files that start with the string dollar. They'd type in dollar*. The * wildcard means anything may follow, so the filter would return

dollar$.txt
dollar
dollar2

Note that the user does not type in regex strings. I convert the search string to a regex for matching.

Note that all my logic works when there are no regex special characters. As some of the special characters can be used in file names, I have to allow for them.
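One possible way to do the conversion described above is sketched here; it is not necessarily the answer that was accepted in this thread. WildcardToRegex is a hypothetical helper, and both the case-insensitive option and the whole-name anchoring are assumptions.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WildcardFilterSketch
    ' Hypothetical helper: turns a filter such as "dollar*" into an anchored regex.
    Function WildcardToRegex(ByVal filter As String) As String
        ' Escape every metacharacter, then turn the escaped * back into "match anything".
        Return "^" & Regex.Escape(filter).Replace("\*", ".*") & "$"
    End Function

    Sub Main()
        Dim files() As String = {"myfile.doc", "xfile.doc", "test1.xls", _
                                 "dollar$.txt", "+dollar.xxx", "dollar", "dollar2"}
        Dim pattern As String = WildcardToRegex("dollar*")   ' ^dollar.*$

        For Each fileName As String In files
            ' Assumption: file names are compared without regard to case.
            If Regex.IsMatch(fileName, pattern, RegexOptions.IgnoreCase) Then
                Console.WriteLine(fileName)   ' dollar$.txt, dollar, dollar2
            End If
        Next
    End Sub
End Module
```

Because the user's text is escaped before the wildcard is expanded, a filter such as +dollar* or dollar$* is treated literally, and no \b is needed around it.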
ASKER CERTIFIED SOLUTION
Fernando Soto

This solution is only available to members.
glenn_r

ASKER

thanks for the help
Not a problem, glad I was able to help.  ;=)