Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win


Exchange email search for specific word and no variant

Posted on 2013-01-23
Medium Priority
Last Modified: 2013-02-06
I need to find a tool that will let me do very specific searches of users mailboxes in Exchange 2003. I have the mailboxes in pst form and can open them in Outlook to use Advanced Search, but that doesn't do what I need.  The problem is, if I have the word soft as a search term, the results will return not only soft as a single word, but also any word containing that string, like microsoft, software, softball, etc.  I tried enclosing it in quotation marks, but that didn't help.

Is there a way to do that in Outlook Advanced Search, or does anyone know a tool I can use to do that type of search?  I looked at Lucid8's Digiscope, but their tech support said they can't do that refined a search either without using regular expressions.  I'm not familiar with writing regex and my search involves 31 variables, so don't have time to learn regex well enough to write the proper string.  

Or, can someone tell me how to write a regex that will find Bob, or Bob's or Jones AND test1, or test2, or test3...test28; and preferably not include any word containing test1, or test 2, etc - just the specific word?  In other words, it would find an email that contained Bob and test1, or Bob and test10, or Bob's and test15 - but wouldn't return results for Bobby's and test1 or Bob's and test1234.

Question by:si-support
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
LVL 35

Expert Comment

by:Terry Woods
ID: 38812174
The regex pattern:

Open in new window

(with singleline mode turned on) should match or not match (*mostly) as you specify. For the code below, I've added a bit more to indicate where the first match is found.

^ means match the start of the string (in singleline mode)
(?=xyz) is a positive lookahead for xyz
\b means match the "boundary" between a word character (a character in the set [a-zA-Z0-9_] ) and a non-word character (not in that set) or no character at all.
. is a wildcard for any character
* means match any number (incl zero) of the previous character, so
(?=.*\bBob\b) means lookahead any number of characters and ensure that there exists an occurrence of Bob without another "word" character on either side. You can use this technique for multiple keywords to ensure they all exist, as shown in my pattern.

* You have an error in your specification though I think:
When searching for "test1", there should be no difference between returning a result containing test10 (which you specify as desired) and test1234 (which you specify as undesired). If there really is a difference, you'll need to explain what it is, such as "only one extra digit is ok".

Now for some code, generated from myregextester.com since I'm a PHP programmer:

VB.NET Code Example:
Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    Dim re As Regex = New Regex("^(?=.*\bBob\b)(?=.*\btest1\b).*?(?:\b(Bob|test1)\b)",RegexOptions.IgnoreCase OR RegexOptions.Singleline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
  End Sub
End Module

Open in new window

Play with it yourself here: http://www.myregextester.com/?r=8df39ce8

I can't be of much more help putting it into code, I'm afraid, but others in the Regular Expressions or .NET zones might be able to?
LVL 35

Expert Comment

by:Terry Woods
ID: 38812181
I'm assuming you can figure out how to run VB.NET code in your system somehow, with I do know is possible with Outlook at least, and presumably also Exchange

A weakness of using \b to indicate a word boundary is that it treats _ as a word character. You can almost certainly work around this if it's a problem, but unless you want to take this further I won't go into that.

A search for Bob wouldn't find an occurrence of _Bob while you're using (?=.*\bBob\b)

Author Comment

ID: 38818877
Thanks Terry.  I think that may be all I need to get started.  I'll play with it and post the results.
Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.


Author Comment

ID: 38819034
Regarding the test1 turning up test10 but not test1234, I was just using 'testx' as an example of different words.  What I really need, for example, is to look for the word 'ball' and have it return only if it finds 'ball' specifically - not as part of another word like 'football', or 'ballroom'.

Can you show me how that would look in the regex string?

LVL 35

Expert Comment

by:Terry Woods
ID: 38824951
In the part of the pattern:

The \b character after "ball" requires a word boundary for it to match, so provided ball isn't followed by an alphanumeric character or underscore, it will match.

Note that the pattern above won't work by itself; you'll still need to include that as part of a larger pattern, whether it is just:

Author Comment

ID: 38833149
Thanks Terry.  I hope I'm not pressing my luck, but could you show me the expression to use in order to search multiple documents for at least Bob, or Bob's, or Jones and at least one of tell, teller, expensive, comp, document, documentation,"good job" (where it turns up only if that phrase is found exactly, not just "good" or "job") and does not pick up any other variations of the words in the list (just 'comp' but not 'complete').

I'm just not catching on to the syntax quick enough to do it in the time frame I have.

LVL 35

Accepted Solution

Terry Woods earned 2000 total points
ID: 38853446
Sorry about the slow reply, but here's how it can be done:

^(?=.*\b(bob|jones)\b)(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)

Author Closing Comment

ID: 38859348
Thanks, that's great!  With your expression and a regex cheatsheet, maybe I can better understand how this works.  In the meantime, I have a working solution to my problem.

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Are you an Exchange administrator employed with an organization? And, have you encountered a corrupt Exchange database due to which you are not able to open its EDB file. This article will explain all the steps to repair corrupt Exchange database.
By default Outlook 2016 displays only one time zone in the Calendar. The following article explains how to display two time zones in one calendar view.
This video shows how to quickly and easily add an email signature for all users on Exchange 2016. The resulting signature is applied on a server level by Exchange Online. The email signature template has been downloaded from: www.mail-signatures…
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…
Suggested Courses

598 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question