Exchange email search for specific word and no variant

Posted on 2013-01-23
Medium Priority
Last Modified: 2013-02-06
I need to find a tool that will let me do very specific searches of users' mailboxes in Exchange 2003. I have the mailboxes in PST form and can open them in Outlook to use Advanced Search, but that doesn't do what I need. The problem is that if I use the word soft as a search term, the results include not only soft as a single word but also any word containing that string, like microsoft, software, softball, etc. I tried enclosing it in quotation marks, but that didn't help.

Is there a way to do that in Outlook Advanced Search, or does anyone know a tool I can use to do that type of search? I looked at Lucid8's Digiscope, but their tech support said it can't do a search that refined either without using regular expressions. I'm not familiar with writing regex and my search involves 31 variables, so I don't have time to learn regex well enough to write the proper string.

Or, can someone tell me how to write a regex that will find Bob, or Bob's, or Jones AND test1, or test2, or test3...test28, and preferably not match any word that merely contains test1, or test2, etc. - just the specific word? In other words, it would find an email that contained Bob and test1, or Bob and test10, or Bob's and test15 - but wouldn't return results for Bobby's and test1 or Bob's and test1234.

Question by:si-support
LVL 35

Expert Comment

by:Terry Woods
ID: 38812174
The regex pattern:

(with singleline mode turned on) should match or not match (*mostly) as you specify. For the code below, I've added a bit more to indicate where the first match is found.

^ means match the start of the string (as long as multiline mode is off)
(?=xyz) is a positive lookahead for xyz: it checks that xyz can be matched from this point without consuming any characters
\b means match the "boundary" between a word character (a character in the set [a-zA-Z0-9_]) and a non-word character (not in that set) or no character at all.
. is a wildcard for any character (with singleline mode on, that includes newlines)
* means match any number (incl. zero) of the previous item, so
(?=.*\bBob\b) means look ahead across any number of characters and ensure that there exists an occurrence of Bob without another "word" character on either side. You can use this technique for multiple keywords to ensure they all exist, as shown in my pattern.
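
As a rough illustration of how these pieces stack, here is a minimal sketch in Python rather than .NET (an assumption made only to keep the example short; `re.DOTALL` plays the role of singleline mode and `re.IGNORECASE` of IgnoreCase). The `contains_all_words` helper and the sample strings are hypothetical and not part of the original answer.

```python
import re

def contains_all_words(text, words):
    """Return True only if every word occurs in text as a whole word."""
    # One lookahead per keyword: (?=.*\bWORD\b) scans ahead from the start
    # of the string without consuming anything, so the checks can be chained.
    lookaheads = "".join(r"(?=.*\b" + re.escape(w) + r"\b)" for w in words)
    pattern = re.compile("^" + lookaheads, re.IGNORECASE | re.DOTALL)
    return pattern.search(text) is not None

print(contains_all_words("Bob ran test1 today", ["Bob", "test1"]))         # True
print(contains_all_words("Bobby ran test1 today", ["Bob", "test1"]))       # False
print(contains_all_words("Bob's notes\nmention test1", ["Bob", "test1"]))  # True
print(contains_all_words("Bob ran test1234", ["Bob", "test1"]))            # False
```

Because each lookahead starts from the beginning of the string, the keywords may appear in any order, and adding another required word is just one more `(?=...)` group.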

* You have an error in your specification though I think:
When searching for "test1", there should be no difference between returning a result containing test10 (which you specify as desired) and test1234 (which you specify as undesired). If there really is a difference, you'll need to explain what it is, such as "only one extra digit is ok".

Now for some code, generated from myregextester.com since I'm a PHP programmer:

VB.NET Code Example:
Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring As String = "replace with your source string"
    ' The two lookaheads require both whole words; the trailing group captures the first one found.
    Dim re As Regex = New Regex("^(?=.*\bBob\b)(?=.*\btest1\b).*?(?:\b(Bob|test1)\b)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Print every group of the match: group 0 is the whole match, group 1 the captured keyword.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx += 1
    Next
  End Sub
End Module

Play with it yourself here: http://www.myregextester.com/?r=8df39ce8

I can't be of much more help putting it into code, I'm afraid, but others in the Regular Expressions or .NET zones might be able to?
LVL 35

Expert Comment

by:Terry Woods
ID: 38812181
I'm assuming you can figure out how to run VB.NET code in your system somehow, which I do know is possible with Outlook at least, and presumably also Exchange.

A weakness of using \b to indicate a word boundary is that it treats _ as a word character. You can almost certainly work around this if it's a problem, but unless you want to take this further I won't go into that.

A search for Bob wouldn't find an occurrence of _Bob while you're using (?=.*\bBob\b)
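
One possible workaround, which the thread itself does not spell out, is to replace \b with explicit lookarounds that treat only letters and digits as word characters. The following is a sketch in Python under that assumption; the `whole_word` helper is hypothetical.

```python
import re

def whole_word(word):
    # Require that no ASCII letter or digit sits directly before or after
    # the word. Unlike \b, an underscore is treated as a separator here.
    return re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(word) + r"(?![A-Za-z0-9])",
        re.IGNORECASE,
    )

strict = re.compile(r"\bBob\b", re.IGNORECASE)
relaxed = whole_word("Bob")

print(bool(strict.search("sent by _Bob")))    # False: "_" counts as a word character
print(bool(relaxed.search("sent by _Bob")))   # True: "_" is treated as a separator
print(bool(relaxed.search("sent by Bobby")))  # False: longer words are still rejected
```

The same lookaround pair can be dropped into a `(?=.*...)` group in place of `\bBob\b` if underscores turn out to matter in the mailboxes being searched.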

Author Comment

ID: 38818877
Thanks Terry.  I think that may be all I need to get started.  I'll play with it and post the results.
Author Comment

ID: 38819034
Regarding the test1 turning up test10 but not test1234, I was just using 'testx' as an example of different words.  What I really need, for example, is to look for the word 'ball' and have it return only if it finds 'ball' specifically - not as part of another word like 'football', or 'ballroom'.

Can you show me how that would look in the regex string?

LVL 35

Expert Comment

by:Terry Woods
ID: 38824951
In the part of the pattern:

The \b after "ball" requires a word boundary at that position, so the pattern matches only if ball isn't immediately followed by an alphanumeric character or underscore.

Note that the pattern above won't work by itself; you'll still need to include that as part of a larger pattern, whether it is just:

Author Comment

ID: 38833149
Thanks Terry. I hope I'm not pressing my luck, but could you show me the expression to use in order to search multiple documents for at least one of Bob, Bob's, or Jones and at least one of tell, teller, expensive, comp, document, documentation, "good job" (where it turns up only if that phrase is found exactly, not just "good" or "job"), without picking up any other variations of the words in the list (just 'comp' but not 'complete')?

I'm just not catching on to the syntax quickly enough to do it in the time frame I have.

LVL 35

Accepted Solution

Terry Woods earned 2000 total points
ID: 38853446
Sorry about the slow reply, but here's how it can be done:

^(?=.*\b(bob|jones)\b)(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)
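
To try the expression above quickly, here is a small sketch in Python (not the .NET code used earlier in the thread). It assumes case-insensitive matching and singleline mode (`re.IGNORECASE | re.DOTALL`), mirroring the options in the earlier VB.NET example; the `is_hit` helper and the sample messages are made up for illustration.

```python
import re

PATTERN = re.compile(
    r"^(?=.*\b(bob|jones)\b)"
    r"(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)",
    re.IGNORECASE | re.DOTALL,
)

def is_hit(text):
    """True if text names Bob/Jones and contains one of the listed terms."""
    return PATTERN.search(text) is not None

samples = [
    "Bob's comp time was approved",        # hit: Bob's + comp
    "Jones said: good job on the audit",   # hit: Jones + "good job"
    "Bob will complete the documents",     # miss: complete/documents are variants
    "Bobby sent the teller a document",    # miss: Bobby is not Bob
    "Good work, Bob, on the new job",      # miss: "good" and "job" are apart
]
for s in samples:
    print(is_hit(s), "-", s)
```

Note that "good job" in the pattern contains a single literal space, so the phrase is only found when the two words are separated by exactly one space.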

Author Closing Comment

ID: 38859348
Thanks, that's great!  With your expression and a regex cheatsheet, maybe I can better understand how this works.  In the meantime, I have a working solution to my problem.
