Exchange email search for specific word and no variant

Posted on 2013-01-23
Last Modified: 2013-02-06
I need to find a tool that will let me do very specific searches of users' mailboxes in Exchange 2003. I have the mailboxes in PST form and can open them in Outlook to use Advanced Search, but that doesn't do what I need.  The problem is, if I have the word soft as a search term, the results will return not only soft as a single word, but also any word containing that string, like microsoft, software, softball, etc.  I tried enclosing it in quotation marks, but that didn't help.

Is there a way to do that in Outlook Advanced Search, or does anyone know a tool I can use to do that type of search?  I looked at Lucid8's Digiscope, but their tech support said they can't do a search that refined either without using regular expressions.  I'm not familiar with writing regex and my search involves 31 variables, so I don't have time to learn regex well enough to write the proper string.

Or, can someone tell me how to write a regex that will find Bob, or Bob's, or Jones AND test1, or test2, or test3...test28; and preferably not include any word containing test1, or test2, etc. - just the specific word?  In other words, it would find an email that contained Bob and test1, or Bob and test10, or Bob's and test15 - but wouldn't return results for Bobby's and test1 or Bob's and test1234.

Question by:si-support
LVL 35

Expert Comment

by:Terry Woods
ID: 38812174
The regex pattern:

^(?=.*\bBob\b)(?=.*\btest1\b)

(with singleline mode turned on) should match or not match (*mostly) as you specify. For the code below, I've added a bit more to indicate where the first match is found.

^ means match the start of the string (as long as multiline mode is off)
(?=xyz) is a positive lookahead for xyz
\b means match the "boundary" between a word character (a character in the set [a-zA-Z0-9_] ) and a non-word character (not in that set) or no character at all.
. is a wildcard for any character (including line breaks when singleline mode is on)
* means match any number (incl. zero) of the previous element, so
(?=.*\bBob\b) means look ahead over any number of characters and ensure that there exists an occurrence of Bob without another "word" character on either side. You can use this technique for multiple keywords to ensure they all exist, as shown in my pattern.
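
To see the lookahead technique in action, here is a minimal sketch in Python (chosen only because it is easy to run; for these plain ASCII samples the constructs should behave the same way in .NET). The sample strings are invented for illustration, and re.DOTALL is Python's name for singleline mode.

```python
import re

# Both lookaheads start from the beginning of the text, so each keyword may
# appear anywhere and in any order. DOTALL lets "." cross line breaks;
# IGNORECASE mirrors the options used in the VB.NET example in this thread.
PATTERN = re.compile(r"^(?=.*\bBob\b)(?=.*\btest1\b)", re.IGNORECASE | re.DOTALL)

# Hypothetical sample messages, not taken from any real mailbox.
samples = [
    "Bob sent the test1 results.",
    "test1 is done.\nRegards, Bob",
    "Bob's notes mention test1.",
    "Bobby sent the test1 results.",
    "Bob sent the test10 results.",
    "Bob sent the test1234 results.",
]

# True where both whole words are present, False otherwise.
results = [bool(PATTERN.search(text)) for text in samples]

for text, hit in zip(samples, results):
    print(hit, repr(text))
```

The last three samples are rejected: Bobby fails the Bob check, and both test10 and test1234 fail the test1 check because a digit directly follows test1.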

* I think you have an error in your specification, though:
When searching for "test1", there should be no difference between returning a result containing test10 (which you specify as desired) and test1234 (which you specify as undesired). If there really is a difference, you'll need to explain what it is, such as "only one extra digit is ok".

Now for some code, generated with an online tool rather than written by hand, since I'm a PHP programmer:

VB.NET Code Example:
Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring As String = "replace with your source string"
    ' Both lookaheads must succeed; the trailing group captures the first keyword found.
    Dim re As Regex = New Regex("^(?=.*\bBob\b)(?=.*\btest1\b).*?(?:\b(Bob|test1)\b)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Print every group of this match (group 0 is the whole match).
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx += 1
    Next
  End Sub
End Module

Play with it yourself here:

I can't be of much more help putting it into code, I'm afraid, but others in the Regular Expressions or .NET zones might be able to?
LVL 35

Expert Comment

by:Terry Woods
ID: 38812181
I'm assuming you can figure out how to run VB.NET code in your system somehow, which I do know is possible with Outlook at least, and presumably also Exchange.

A weakness of using \b to indicate a word boundary is that it treats _ as a word character. You can almost certainly work around this if it's a problem, but unless you want to take this further I won't go into that.

A search for Bob wouldn't find an occurrence of _Bob while you're using (?=.*\bBob\b)
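
The underscore behaviour is easy to confirm. The following is a small sketch in Python (used only for convenience), with an invented sample sentence. The second pattern is one possible workaround, not something proposed in this thread: it assumes that only ASCII letters and digits should count as part of a word, and replaces \b with explicit lookarounds.

```python
import re

# Invented sample text containing Bob next to underscores.
text = "Sent by _Bob and reviewed by Bob_Jones."

# \b treats "_" as a word character, so neither occurrence counts as a whole word.
strict = re.findall(r"\bBob\b", text)

# Assumed workaround: reject a match only when a letter or digit touches it.
LOOSE = r"(?<![A-Za-z0-9])Bob(?![A-Za-z0-9])"
loose = re.findall(LOOSE, text)

print(strict)
print(loose)
```

With the lookaround version, Bobby is still rejected because a letter follows Bob, while _Bob and Bob_Jones are both found.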

Author Comment

ID: 38818877
Thanks Terry.  I think that may be all I need to get started.  I'll play with it and post the results.

Author Comment

ID: 38819034
Regarding the test1 turning up test10 but not test1234, I was just using 'testx' as an example of different words.  What I really need, for example, is to look for the word 'ball' and have it return only if it finds 'ball' specifically - not as part of another word like 'football', or 'ballroom'.

Can you show me how that would look in the regex string?

LVL 35

Expert Comment

by:Terry Woods
ID: 38824951
In this part of the pattern, \bball\b:

The \b character after "ball" requires a word boundary for it to match, so provided ball isn't followed by an alphanumeric character or underscore, it will match.

Note that the pattern above won't work by itself; you'll still need to include that as part of a larger pattern, whether it is just:

Author Comment

ID: 38833149
Thanks Terry.  I hope I'm not pressing my luck, but could you show me the expression to use in order to search multiple documents for at least Bob, or Bob's, or Jones and at least one of tell, teller, expensive, comp, document, documentation, "good job" (where it turns up only if that phrase is found exactly, not just "good" or "job") and does not pick up any other variations of the words in the list (just 'comp' but not 'complete').

I'm just not catching on to the syntax quickly enough to do it in the time frame I have.

LVL 35

Accepted Solution

Terry Woods earned 500 total points
ID: 38853446
Sorry about the slow reply, but here's how it can be done:

^(?=.*\b(bob|jones)\b)(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)
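
As a rough illustration of applying this expression across several documents, here is a minimal Python sketch (Python is used only because it is easy to run). The document names and contents are made up, and the sketch assumes the same two options as the VB.NET example earlier in the thread: case-insensitive matching, and singleline mode so that "." can cross line breaks.

```python
import re

# The accepted pattern, split across two string literals for readability.
PATTERN = re.compile(
    r"^(?=.*\b(bob|jones)\b)"
    r"(?=.*\b(tell(er)?|expensive|comp|document(ation)?|good job)\b)",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical documents keyed by invented file names.
documents = {
    "a.txt": "Bob's comp request was approved.",
    "b.txt": "Jones said:\ngood job on the documentation.",
    "c.txt": "Bobby will complete the documents.",
    "d.txt": "Jones did a good enough job.",
    "e.txt": "The teller was expensive.",
}

# Keep the names of documents that contain a name AND one of the keywords.
matching = [name for name, text in documents.items() if PATTERN.search(text)]
print(matching)
```

Note that "good job" is matched with exactly one space between the words; if the phrase might be split across a line break or padded with extra spaces, good\s+job would be a more tolerant alternative.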

Author Closing Comment

ID: 38859348
Thanks, that's great!  With your expression and a regex cheatsheet, maybe I can better understand how this works.  In the meantime, I have a working solution to my problem.
