Exchange email search for specific word and no variant

I need to find a tool that will let me do very specific searches of users' mailboxes in Exchange 2003. I have the mailboxes in PST form and can open them in Outlook to use Advanced Search, but that doesn't do what I need. The problem is, if I have the word "soft" as a search term, the results will include not only "soft" as a single word, but also any word containing that string, like microsoft, software, softball, etc. I tried enclosing it in quotation marks, but that didn't help.

Is there a way to do that in Outlook Advanced Search, or does anyone know a tool I can use to do that type of search? I looked at Lucid8's Digiscope, but their tech support said they can't do that refined a search either without using regular expressions. I'm not familiar with writing regex and my search involves 31 variables, so I don't have time to learn regex well enough to write the proper string.

Or, can someone tell me how to write a regex that will find Bob, or Bob's, or Jones AND test1, or test2, or test3...test28, and preferably not include any word containing test1, or test2, etc. - just the specific word? In other words, it would find an email that contained Bob and test1, or Bob and test10, or Bob's and test15 - but wouldn't return results for Bobby's and test1 or Bob's and test1234.

Terry Woods (IT Guru) commented:
The regex pattern (the lookahead part at the start of the expression in the code below, with singleline mode turned on) should match or not match (*mostly) as you specify. For the code below, I've added a bit more to indicate where the first match is found.

^ means match the start of the string (provided multiline mode is off)
(?=xyz) is a positive lookahead for xyz
\b means match the "boundary" between a word character (a character in the set [a-zA-Z0-9_] ) and a non-word character (not in that set) or no character at all.
. is a wildcard for any character (including newlines when singleline mode is on)
* means match any number (incl zero) of the previous item, so
(?=.*\bBob\b) means lookahead any number of characters and ensure that there exists an occurrence of Bob without another "word" character on either side. You can use this technique for multiple keywords to ensure they all exist, as shown in my pattern.
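
The lookahead technique described above can be tried out with a short sketch. This one is in Python rather than the VB.NET used later in the thread, and the helper names `all_words_pattern` and `matches` are made up for illustration. Python's `re.DOTALL` plays the role of singleline mode here.

```python
import re

def all_words_pattern(words):
    """Build a pattern that requires every word to appear as a whole word.

    Hypothetical helper: one (?=.*\\bword\\b) lookahead per keyword,
    all anchored at the start of the text.
    """
    lookaheads = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in words)
    # DOTALL lets "." cross line breaks; IGNORECASE ignores letter case.
    return re.compile("^" + lookaheads, re.IGNORECASE | re.DOTALL)

pattern = all_words_pattern(["Bob", "test1"])

def matches(text):
    """Return True when every keyword occurs somewhere in text."""
    return pattern.search(text) is not None

print(matches("Bob ran test1 today"))                  # True
print(matches("Bobby ran test1 today"))                # False
print(matches("test1 came first,\nthen Bob replied"))  # True
```

Because each lookahead starts again from the beginning of the text, the order in which the keywords appear does not matter.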

* You have an error in your specification though I think:
When searching for "test1", there should be no difference between returning a result containing test10 (which you specify as desired) and test1234 (which you specify as undesired). If there really is a difference, you'll need to explain what it is, such as "only one extra digit is ok".
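
To make the point concrete, here is a small Python sketch (not from the original thread). It compares a strict whole-word pattern with one possible reading of "only one extra digit is ok". The second pattern, `\btest1\d?\b`, is an assumption about what such a rule might look like, and `check` is a hypothetical helper name.

```python
import re

# Strict whole-word match: nothing may follow "test1" inside the same word.
exact = re.compile(r"\btest1\b", re.IGNORECASE)

# Assumed relaxed rule: allow at most one extra digit after "test1".
one_extra_digit = re.compile(r"\btest1\d?\b", re.IGNORECASE)

def check(word):
    """Return (strict result, relaxed result) for a single word."""
    return (bool(exact.search(word)), bool(one_extra_digit.search(word)))

for sample in ["test1", "test10", "test1234"]:
    print(sample, check(sample))
```

With the strict pattern, test10 and test1234 are both rejected, so a difference between them only appears once a rule like the relaxed one is spelled out.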

Now for some code, generated rather than hand-written since I'm a PHP programmer:

VB.NET Code Example:
Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring As String = "replace with your source string"
    ' Both lookaheads must succeed; the trailing group captures the first keyword found.
    Dim re As Regex = New Regex("^(?=.*\bBob\b)(?=.*\btest1\b).*?(?:\b(Bob|test1)\b)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Print every group of the match: group 0 is the whole match.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx += 1
    Next
  End Sub
End Module

I can't be of much more help putting it into code, I'm afraid, but others in the Regular Expressions or .NET zones might be able to?
Terry Woods (IT Guru) commented:
I'm assuming you can figure out how to run VB.NET code in your system somehow, which I do know is possible with Outlook at least, and presumably also Exchange.

A weakness of using \b to indicate a word boundary is that it treats _ as a word character. You can almost certainly work around this if it's a problem, but unless you want to take this further I won't go into that.

For example, a search for Bob wouldn't find an occurrence of _Bob while you're using (?=.*\bBob\b).
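
One possible workaround, sketched in Python and not taken from the thread, is to replace \b with explicit lookarounds that treat only letters and digits as word characters. The character class `[A-Za-z0-9]` is an assumption about what should count as part of a word.

```python
import re

# Standard word boundary: "_" counts as a word character, so "_Bob" is missed.
with_b = re.compile(r"\bBob\b")

# Assumed alternative: only letters and digits may not touch the keyword.
with_lookarounds = re.compile(r"(?<![A-Za-z0-9])Bob(?![A-Za-z0-9])")

print(bool(with_b.search("_Bob")))             # False
print(bool(with_lookarounds.search("_Bob")))   # True
print(bool(with_lookarounds.search("Bobby")))  # False
```

The same lookarounds can be dropped into a (?=.*...) lookahead in place of the two \b markers.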
si-support (Author) commented:
Thanks Terry.  I think that may be all I need to get started.  I'll play with it and post the results.
si-support (Author) commented:
Regarding the test1 turning up test10 but not test1234, I was just using 'testx' as an example of different words.  What I really need, for example, is to look for the word 'ball' and have it return only if it finds 'ball' specifically - not as part of another word like 'football', or 'ballroom'.

Can you show me how that would look in the regex string?

Terry Woods (IT Guru) commented:
In the part of the pattern:

The \b character after "ball" requires a word boundary for it to match, so provided ball isn't followed by an alphanumeric character or underscore, it will match.

Note that the pattern above won't work by itself; you'll still need to include that as part of a larger pattern, whether it is just:
si-support (Author) commented:
Thanks Terry. I hope I'm not pressing my luck, but could you show me the expression to use in order to search multiple documents for at least Bob, or Bob's, or Jones and at least one of tell, teller, expensive, comp, document, documentation, "good job" (where it turns up only if that phrase is found exactly, not just "good" or "job") and does not pick up any other variations of the words in the list (just 'comp' but not 'complete').

I'm just not catching on to the syntax quickly enough to do it in the time frame I have.

Terry Woods (IT Guru) commented:
Sorry about the slow reply, but here's how it can be done:

^(?=.*\b(bob|jones)\b)(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)
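
As a rough illustration of running that expression over several documents, here is a Python sketch (the thread itself used VB.NET). The document names and contents in `DOCS` and the helper `find_hits` are invented for the example, and the case-insensitive and singleline (`re.DOTALL`) flags are assumed to be wanted, as in the earlier VB.NET code.

```python
import re

# The expression from the post above, unchanged.
PATTERN = re.compile(
    r"^(?=.*\b(bob|jones)\b)"
    r"(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical documents, keyed by file name.
DOCS = {
    "a.txt": "Bob's review said good job on the audit.",
    "b.txt": "Jones will complete the documents tomorrow.",
    "c.txt": "Bobby asked the teller for change.",
    "d.txt": "The teller called\nMr. Jones back.",
}

def find_hits(docs):
    """Return the names of the documents that satisfy both lookaheads."""
    return [name for name, text in docs.items() if PATTERN.search(text)]

print(find_hits(DOCS))  # ['a.txt', 'd.txt']
```

Note that "good job" is matched with one literal space, so the phrase split across a line break or written with two spaces would not count.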

si-support (Author) commented:
Thanks, that's great!  With your expression and a regex cheatsheet, maybe I can better understand how this works.  In the meantime, I have a working solution to my problem.