RegEx Help

I want to look for matches in a string, and those matches may contain alphanumeric values as well as special characters. Here is the example source:

%*)
bright
bright light
bright future

The expression is

\W(bright|bright light|bright future)\W

It completely ignores the %*) and finds "bright" twice, but no "bright light" or "bright future". Just "bright". Can someone please help?

Thanks
JS
Asked by: jimmysaunders
 
Darrell Porter (Enterprise Business Process Architect) commented:
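// strArray and strArray2 are the two string collections being compared.
foreach (string str in strArray)
{
    // Reset the flag for every outer item; otherwise a single match
    // would suppress every later "no match found" message.
    bool matchFound = false;

    foreach (string str2 in strArray2)
    {
        if (str == str2)
        {
            matchFound = true;
            Console.WriteLine("a match has been found");
            break; // one match per item is enough
        }
    }

    if (!matchFound)
    {
        Console.WriteLine("no match found");
    }
}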

or, in fewer lines

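// If strArray2 is a string[], Contains is the LINQ extension method,
// so the file needs "using System.Linq;".
foreach (string str in strArray)
{
    if (strArray2.Contains(str))
    {
        Console.WriteLine("a match has been found");
    }
    else
    {
        Console.WriteLine("no match found");
    }
}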

A regular expression would take far, far, FAR longer to execute.
 
ozo commented:
(bright light|bright future|bright|%\*\))
 
jimmysaunders (author) commented:
\W(bright light|bright future|bright|%\*\))\W

This only finds bright light.
 
Dan Craciun (IT Consultant) commented:
Try this:
(bright( light| future)?|%\*\))
 
jimmysaunders (author) commented:
@"\W(bright( light| future)?|%\*\))\W"

Finds bright lights and lights. Also, at run time I'm getting the words from a database, so there is a lot more to do if that is the only way to go.
 
ozo commented:
Are you saying that none of them find "bright future"?
What are the characters immediately preceding and following "bright future"?
 
Darrell Porter (Enterprise Business Process Architect) commented:
What do you want it to return - "%*)" ?

If that's the case, try

\W|_
 
jimmysaunders (author) commented:
Thanks for the responses folks.

@WalkaboutTigger: I want it to find all four words.

@ozo: Yes, none of them found "bright future". In the source string, the characters following it are \r\n (it's a file whose contents I am searching hence the line feed)
 
Darrell Porter (Enterprise Business Process Architect) commented:
What language are you using this in?
And do you wish to find only those 4 specific strings, or something else?
 
jimmysaunders (author) commented:
C#. And I am using those words just as an example. The list is huge.
 
jimmysaunders (author) commented:
Maybe the original code will help

var regex = new Regex(string.Format(@"\W({0})\W", strKey), RegexOptions.IgnoreCase);

var matches = regex.Matches(filecontents);
 
Dan Craciun (IT Consultant) commented:
OK.
Ignore for the moment the regular expressions and please state in words what you want to achieve.
 
jimmysaunders (author) commented:
I have a list of words, phrases and symbols in a table in a database. I have a flat file, and I want to find every word, phrase and symbol from the table that occurs in that file. The filecontents variable in the above example is the string that holds the contents of the file I am searching, and strKey is the pipe-delimited word list from the table which, for example, can be something like

(bright|bright future|bright light)


It works fine as long as it's a single word and contains no special characters. But phrases and symbols are not working.
 
jimmysaunders (author) commented:
Interestingly, it finds the phrases that are unique but in the above example it finds "bright" and then "bright " and another "bright "
 
Darrell Porter (Enterprise Business Process Architect) commented:
So you essentially have two arrays (the flat file and the table from the database) and you are attempting to determine their intersection.

Using a regular expression for this will dramatically affect performance.

Likely the fastest method of solving this scenario is to first sort both arrays and then, using the db array as your authoritative source, iterate through the contents of the flat file array and determine whether each entry is stored in the db.
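
A minimal sketch of that intersection idea, under a strong assumption: every line of the flat file holds exactly one entry, as in the example source. It swaps the sort-then-scan described above for a HashSet lookup, which is not what the comment proposes but serves the same purpose. The class and method names (LineIntersection, MatchingLines) are hypothetical, and a phrase that sits in the middle of a longer line would not be found this way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineIntersection
{
    // Returns the lines of the file that also appear in the database key list.
    // Assumption: one entry per line; comparison ignores case.
    public static List<string> MatchingLines(string fileContents, IEnumerable<string> dbKeys)
    {
        // The database list is the authoritative source, so it becomes the lookup set.
        var keySet = new HashSet<string>(dbKeys, StringComparer.OrdinalIgnoreCase);

        return fileContents
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => keySet.Contains(line))
            .ToList();
    }
}
```

Called with the example source and the four keys, this keeps the lines that are in both collections; whether it fits the real file depends on whether entries really are line-delimited.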

Are you trying to keep the matches or the differences?

What language do you want this written in, or is pseudo-code sufficient?
 
Dan Craciun (IT Consultant) commented:
That's normal. 'bright' is the first in the alternation, so the moment it finds 'bright' the match is complete.

You would need to reverse the alternation so the longer forms are tried first and the rest of the forms can be found. See ozo's solution.
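
Since the word list comes from a database, the reordering can be done in code rather than by hand. The sketch below is one possible way to do it, not a solution confirmed in this thread: it sorts the keys longest first, escapes each one with Regex.Escape so that symbols such as %*) are treated literally, and replaces the surrounding \W with lookarounds, so the boundary characters are not consumed and a key at the very start or end of the file can still match. KeyMatcher, BuildRegex and FindAll are hypothetical names, and the keys are assumed to arrive as a list of raw strings rather than as the pre-joined strKey.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyMatcher
{
    // Builds a single regex from raw (unescaped) keys.
    public static Regex BuildRegex(IEnumerable<string> keys)
    {
        List<string> alternatives = keys
            .Where(k => !string.IsNullOrEmpty(k))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(k => k.Length)   // "bright light" before "bright"
            .Select(k => Regex.Escape(k))       // "%*)" becomes a literal
            .ToList();

        // With no keys, return a pattern that can never match.
        if (alternatives.Count == 0)
        {
            return new Regex(@"(?!)");
        }

        // (?<!\w) and (?!\w) check the neighbouring characters without consuming them.
        string pattern = @"(?<!\w)(?:" + string.Join("|", alternatives) + @")(?!\w)";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Returns the text of every match, in the order found.
    public static List<string> FindAll(string fileContents, IEnumerable<string> keys)
    {
        return BuildRegex(keys)
            .Matches(fileContents)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

One design consequence to be aware of: the lookarounds only require that no word character touches the match, so a symbol key that is directly followed by a letter or digit would be skipped. If that matters for the real data, the boundary checks would need to be relaxed for such keys.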
 
jimmysaunders (author) commented:
@WalkaboutTigger: I'm using C# and I am trying to keep the matches.
 
käµfm³d 👽 commented:
Off topic:
"A regular expression would take far, far, FAR longer to execute."
That's an exaggeration, I think. Depending on how the regex is written, a regex could theoretically outperform nested loops.