Solved

regular expressions - only part of what I am “looking for”

Posted on 2011-04-27
Last Modified: 2012-05-11
I am trying to write some code using regular expressions that will find what I am “looking for” within some “input”.  The catch is that the “input” may contain only part of what I am “looking for”.  Also, what I am “looking for” can contain special characters; I am suppressing them with a backslash, and that part is working.

What do I need to add to the “pattern” to accomplish this?
 
string input = "This document is the [ABC ";
string lookingfor = "[ABC:Name]";

//suppress the special characters within the value
string pattern = lookingfor.Replace(":", "\\:");
pattern = pattern.Replace("[", "\\[");
pattern = pattern.Replace("]", "\\]");

MatchCollection list = Regex.Matches(input, "^[" + pattern + "]", RegexOptions.IgnoreCase);
if (list.Count > 0) Console.WriteLine("Found It");

Question by:tampsystems
13 Comments
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35475758
1) The colon and closing square bracket don't need to be escaped (http://msdn.microsoft.com/en-us/library/4edbef7e.aspx).

2) Your input contains "Blah blah [ABC" but your pattern is specifically looking for "[ABC:Name]" - that is not a match.  If you want your pattern to match the string "input" in your example, it needs to be shortened - "\[[A-Z]{3}\b" will match any occurrence of an opening square bracket followed by three upper-case letters, followed by a word boundary.

So "This document is the [ABC " matches.

"This document is the [ABCD " does not match (too many letters).
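A minimal sketch of that shortened pattern, for anyone who wants to try the two inputs above. The class and method names (BracketTagDemo, HasTag) are made up for illustration, and RegexOptions.IgnoreCase is deliberately left off, since with it [A-Z] would also accept lower-case letters.

```csharp
using System.Text.RegularExpressions;

static class BracketTagDemo
{
    // "[" followed by exactly three upper-case letters, then a word boundary.
    static readonly Regex Tag = new Regex(@"\[[A-Z]{3}\b");

    // Hypothetical helper: true when the input contains such a tag anywhere.
    public static bool HasTag(string input)
    {
        return Tag.IsMatch(input);
    }
}
```

HasTag("This document is the [ABC ") returns true, while HasTag("This document is the [ABCD ") returns false because there is no word boundary between the C and the D.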

http://msdn.microsoft.com/en-us/library/az24scfc.aspx
 

Author Comment

by:tampsystems
ID: 35475846
The problem is that the user can define what I am “looking for”.  I am trying to take that and add some regular expressions to it so I can find a match.  The input can be a number of things.  The toughest one is that it can contain only part of what I am “looking for” i.e. “This document is the [ABC “

Can this be done?  Maybe regular expression is the wrong choice for this scenario.  
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35475948
I think you will need to more narrowly define your criteria.  "The quick Brown fox jumped over the lazy dog" has a B, which is part of [ABC, so should that match?
 

Author Comment

by:tampsystems
ID: 35475998
The requirement is that I need to be able to match the following scenario.
Input:      “The quick Brown fox jump”
Pattern:         “jumped”

 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 1400 total points
ID: 35476052
I agree with tgerbert: regex looks for an exact match. However, you can try this and see if it suits you.

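// Builds an alternation of every non-empty prefix of the pattern, longest first,
// e.g. "abc" becomes "abc|ab|a".
static string GetModifiedPattern(string pattern)
{
    System.Text.StringBuilder result = new System.Text.StringBuilder(pattern);

    // Append each shorter prefix, using '@' as a temporary delimiter.
    for (int i = pattern.Length - 1; i > 0; i--)
    {
        result.AppendFormat("@{0}", pattern.Substring(0, i));
    }

    // Escape regex metacharacters; '@' is not one, so the delimiters survive.
    pattern = Regex.Escape(result.ToString());

    // Turn the delimiters into alternation bars.
    return pattern.Replace('@', '|');
}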



Usage
string input = "This document is the [ABC ";
string lookingfor = "[ABC:Name]";

//suppress the special characters within the value
string pattern = GetModifiedPattern(lookingfor);

MatchCollection list = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);

if (list.Count > 0)
{
    Console.WriteLine("Found It");
    Console.WriteLine(list[0].Value);
}

Console.ReadKey();

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35476073
Take heed, though: I used @ as a "special" delimiter. If your original pattern contains an @ symbol, then you'll need to choose some other unique character to act as that delimiter. However, you'll want to stay away from any special regex characters, as the call to Regex.Escape() will wreak havoc on the resulting pattern. This is why I didn't just use a vertical bar for the special delimiter.
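One way to sidestep the delimiter question entirely is to escape each prefix on its own and only then join the pieces with a vertical bar. The following is a sketch of that idea, not the method posted above; the names PrefixPattern and Build are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PrefixPattern
{
    // Hypothetical variant: escape every prefix separately, longest first,
    // then join with "|" so no placeholder character is needed.
    public static string Build(string lookingFor)
    {
        var alternatives = new List<string>();
        for (int length = lookingFor.Length; length > 0; length--)
        {
            alternatives.Add(Regex.Escape(lookingFor.Substring(0, length)));
        }
        return string.Join("|", alternatives.ToArray());
    }
}
```

With this version, Build("a@b") gives "a@b|a@|a", so an @ in the search text is treated like any other literal character. Regex.Match("This document is the [ABC ", PrefixPattern.Build("[ABC:Name]"), RegexOptions.IgnoreCase).Value is "[ABC".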
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35476086
Also, the longer your "looking for" string is, the longer the engine may take to run, depending also on the target "to search" string.
 
LVL 33

Expert Comment

by:Todd Gerbert
ID: 35476088
The requirement is that I need to be able to match the following scenario.
Input:      “The quick Brown fox jump”
Pattern:         “jumped”

I understand that - and I assume you're thinking that "jump" should match "jumped" because they're both forms of the same verb. Unfortunately, pattern matching doesn't work according to English grammatical rules. My suggestion is to not let the user make such matches. ;)

The opposite, with input set to "The quick brown fox jumped" and the pattern set to "jump", makes more sense and is much more easily achievable.
 

Author Comment

by:tampsystems
ID: 35476123
Kaufmed, your solution is working, but I have to do some more testing.

The only special characters that are allowed other than space and alphanumeric are the following:
_:;-/{}|[]

Should I just escape them with the backslash prior to calling the GetModifiedPattern method?
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35476171
Should I just escape them with the backslash prior to calling the GetModifiedPattern method
You shouldn't have to. I inserted Regex.Escape() inside of that method which should take care of escaping all regex-related special characters, of which @ is not a member  = )
 
LVL 33

Assisted Solution

by:Todd Gerbert
Todd Gerbert earned 600 total points
ID: 35476178
Note that with input like "This [actual] document might contain: [ABC " your pattern is going to match the "[a" in "[actual]", not "[ABC", hence my suggestion to more narrowly define your criteria stands.
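One possible way to narrow the criteria, offered only as a sketch: assume the partial text can appear only at the very end of the input (as in the "[ABC " and "jump" examples in this thread) and require a minimum number of matching characters. Both of those rules are assumptions, not stated requirements, and the names TrailingPartialMatch, Find and minLength are hypothetical. No regex is needed for this variant.

```csharp
using System;

static class TrailingPartialMatch
{
    // Returns the longest prefix of lookingFor (at least minLength characters)
    // that the input ends with, ignoring trailing whitespace and case,
    // or null when there is no such prefix.
    public static string Find(string input, string lookingFor, int minLength)
    {
        string trimmed = input.TrimEnd();
        for (int length = lookingFor.Length; length >= minLength && length > 0; length--)
        {
            string prefix = lookingFor.Substring(0, length);
            if (trimmed.EndsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                return prefix;
            }
        }
        return null;
    }
}
```

Find("This [actual] document might contain: [ABC ", "[ABC:Name]", 2) returns "[ABC" rather than stopping at "[a", and Find("The quick Brown fox jump", "jumped", 2) returns "jump".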
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35476179
...besides, if you do that, the backslashes themselves will be escaped, and then you'll have extra "possible matches" in your pattern.
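A small sketch of that double-escaping effect, using only Regex.Escape and Regex.IsMatch; the class and method names here are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class DoubleEscapeDemo
{
    // Escapes the text once, as GetModifiedPattern already does internally.
    public static string EscapeOnce(string text)
    {
        return Regex.Escape(text);
    }

    // Simulates escaping by hand first: a backslash is added before "[",
    // and Regex.Escape then escapes that backslash as well.
    public static string EscapeAfterManualBackslash(string text)
    {
        return Regex.Escape(text.Replace("[", "\\["));
    }
}
```

EscapeOnce("[ABC") yields \[ABC, which matches the literal text [ABC. EscapeAfterManualBackslash("[ABC") yields \\\[ABC, which only matches input that contains an actual backslash before the bracket.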
 

Author Closing Comment

by:tampsystems
ID: 35477888
It turned out that you were both right. I was able to use the method provided, but I still ran into trouble with the matching. One day I will have to come back to this problem. Thanks for your help.