troubleshooting Question

Extracting from Outlook Email to Excel Using Multiple RegEx Patterns

Avatar of Jim M
Jim MFlag for United States of America asked on
Regular ExpressionsOutlookMicrosoft ExcelVBAMicrosoft Office
7 Comments1 Solution541 ViewsLast Modified:
Hi experts,

I've worked on this for a couple of days and I am stuck. Here's the scenario:

    I have some incoming emails with data that I want to parse and insert into Excel.
      I am using regular expressions to do this.
        Some of the pattern matches require a replacement to transform the data (It is coordinate data and the format in the email needs to be converted.
          The coordinates are listed two different ways in the emails. This requires two different replacements because the replacement is different depending on which pattern matches the string.

          I have all this figured out and the code below works, except for one thing. I need to use a total of 10 patterns. If there isn't a match, the "i" counter still ticks up one, so when it matches the next pattern, it shifts the data over one column to the right. So data from Sample A works fine, but when it parses data from Sample B, it shifts the data into the wrong columns.

          Part of me thinks this has to do with arrColumns being tied to (i), but I tried inserting another loop and I couldn't get that to work. I also tried adding i = i - 1 when the pattern didn't match (to keep the "i" counter from adding one and thus moving to the next column), but that created an endless loop. So I am stuck.

          So in a nutshell I need to be able to test multiple patterns and insert the matches (or transformed matches) into Excel.

          I've attached the two types of sample data, the "good" output in Excel, and the "bad" output in Excel (what happens when I use the two additional patterns). The code is also commented so hopefully you can help me figure it out.

          Disclaimer: I am pushing the extent of my knowledge in this area...I just cobbled together some code from various sources to get where I am.

          So the code below works, except for the problem described above...

          Sub ExtractDataToExcel(oItem As MailItem)
              Const xlUp = -4162
              Dim objExcel As Object
              Dim objMaster As Object
              Dim objSheet As Object
              Dim objRegExp As RegExp
              Dim i As Integer
              Dim enviro As String
              Dim bXStarted As Boolean
              Dim objMatch
              
          
              enviro = CStr(Environ("USERPROFILE"))
              Const strWSName As String = "data-export"
              strWBName = enviro & "\Desktop\pings.xlsx"
              
              'Use FileExists function to determine the availability of the workbook
              If Not FileExists(strWBName) Then Exit Sub
              
              On Error Resume Next
              ' Open Excel, hide it
              Set objExcel = GetObject(, "Excel.Application")
              If Err <> 0 Then
                  Application.StatusBar = "Please wait while Excel source is opened ... "
                  Set objExcel = CreateObject("Excel.Application")
                  objExcel.Visible = False
                  bXStarted = True
              End If
              
          
              ' Define patterns to extract from email
              Dim arrPatterns As Variant
              
              ' If I use all the patterns below (10 patterns total) , it shifts the data right into the wrong columns
              ' because if there isn't a match on the pattern, the "i" counter still ticks up one (which adds 1 to the column)
              'arrPatterns = Array("\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\sGMT", "\d{7}", "X{10}[A-WYZ0-9]+", "X{6}\d{4}", "(Confidence\sLevel(\s)+)(\d{2,3})", "(\d{2,3})\s(\d{2,3})\s(\d{2,3})\.(\d{2,3})N", "N(\d{2})\.(\d{5,})", "(\d{2,3})\s(\d{2,3})\s(\d{2,3})\.(\d{2,3})W", "W(\d{2})\.(\d{5,})", "\d{1,4}\smeter(s)?\b", "\b\d\w-\w{4}\b")
              
              ' If I use the patterns below (8 patterns total), it works for data like in Sample A, but not Sample B
              ' (because the Lat/Long format is different in Sample B so that needs to be captured by the additional 2 patterns listed in the arrPatterns above)
              arrPatterns = Array("\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}:\d{2}\sGMT", "\d{7}", "X{10}[A-WYZ0-9]+", "X{6}\d{4}", "(Confidence\sLevel(\s)+)(\d{2,3})", "(\d{2,3})\s(\d{2,3})\s(\d{2,3})\.(\d{2,3})N", "(\d{2,3})\s(\d{2,3})\s(\d{2,3})\.(\d{2,3})W", "\d{1,4}\smeter(s)?\b", "\b\d\w-\w{4}\b")
              
             
              Dim arrColumns As Variant
              arrColumns = Array("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
          
              ' Load the workbook
              Set objMaster = objExcel.Workbooks.Open(strWBName)
              
              
              ' If the sheet doesn't exist in the workbook, add it.
              ' Uses CreateSheetIf helper function
              CreateSheetIf (strWSName)
              
              Set objSheet = objMaster.Sheets(strWSName)
          
              ' Calculate next row to add after existing used rows
              intRow = objSheet.Range("A" & objSheet.Rows.Count).End(-4162).Row + 1
          
              ' Create RegEx object for searching
              Set objRegExp = New RegExp
          
              ' Search email body for each pattern
              For i = 0 To UBound(arrPatterns)
                  objRegExp.Pattern = arrPatterns(i)
                  objRegExp.Global = False
                  Set colMatches = objRegExp.Execute(oItem.Body)
                  
                  If colMatches.Count > 0 Then ' If there is a match, continue...
                      
                      Set objMatch = colMatches.Item(0)
                      strMatch = objMatch.Value
                 
                      ' some of the pattern matches require transformation
                      ' if the match is one of the patterns below, apply the listed replacement
                      Select Case objRegExp.Pattern
                      
                      Case "(Confidence\sLevel(\s)+)(\d{2,3})"
                      strReplace = RegExpReplace(objMatch.Value, objRegExp.Pattern, "$3", False)
          
                      Case "(\d{2,3})\s(\d{2,3})\s(\d{2,3})\.(\d{2,3})N"
                      strReplace = RegExpReplace(objMatch.Value, objRegExp.Pattern, "$1.$2$3$4", False)
          
                      Case "N(\d{2})\.(\d{5,})"
                      strReplace = RegExpReplace(objMatch.Value, objRegExp.Pattern, "$1.$2", False)
                      
                      Case "(\d{2,3})\s(\d{2,3})\s(\d{2,3})\.(\d{2,3})W"
                      strReplace = RegExpReplace(objMatch.Value, objRegExp.Pattern, "-$1.$2$3$4", False)
                      
                      Case "W(\d{2})\.(\d{5,})"
                      strReplace = RegExpReplace(objMatch.Value, objRegExp.Pattern, "-$1.$2", False)
                                  
                      Case Else
                       strReplace = strMatch ' If none of the above replacement patterns match, just insert the original match into Excel
                       
                      End Select
             
                  objSheet.Cells(intRow, arrColumns(i)).Value = strReplace ' Do the insert
                  
                  Else
                  ' I feel like my solution should go here but so far no luck
                  ' I tried i = i - 1 here, thinking if there was no match on the pattern, then don't advance the "i" counter
                  ' but that resulted in an endless loop
                  End If
              Next
              
              ' this just adds a calculated value in the rightmost column
              objSheet.Cells(intRow, arrColumns(i)).Value = "=HYPERLINK(""https://www.google.com/webhp?ie=UTF-8#q=""&F2&"",""&G2,""Map these coordinates"")"
               
              ' Save updated Excel file
              objMaster.Save
              objMaster.Close SaveChanges:=True
              If bXStarted Then
                  objExcel.Quit
              End If
              
              ' Release objects
              Set objRegExp = Nothing
              Set objSheet = Nothing
              Set objMaster = Nothing
              Set objExcel = Nothing
                
          End Sub
          
          Sub ExtractEmailData()
          ' Call this macro from a button in Outlook
          Dim oItem As MailItem
          If Application.ActiveExplorer.Selection.Count = 0 Then
          MsgBox "Please select one or more emails first!", vbCritical, "Error"
          Exit Sub
          End If
          For Each oItem In ActiveExplorer.Selection
          ExtractDataToExcel oItem
          Next oItem
          Set oItem = Nothing
          MsgBox "The selected emails have been extracted.", , "Initech"
          End Sub
          
          Public Function FileExists(ByVal Filename As String) As Boolean
          Dim nAttr As Long
          On Error GoTo NoFile
          nAttr = GetAttr(Filename)
          If (nAttr And vbDirectory) <> vbDirectory Then
          FileExists = True
          End If
          NoFile:
          End Function
          
          Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
              Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = False, _
              Optional MultiLine As Boolean = False)
              
              ' Function written by Patrick G. Matthews.
          	' Original source:
          	' http://stackoverflow.com/questions/10393087/expressing-basic-access-query-criteria-as-regular-expressions
          
              
              Static RegX As Object
              
              If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
              With RegX
                  .Pattern = PatternStr
                  .Global = ReplaceAll
                  .IgnoreCase = Not MatchCase
                  .MultiLine = MultiLine
              End With
              
              RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
              
          End Function
          
          Function CreateSheetIf(strSheetName As String) As Boolean
              Dim wsTest As Worksheet
              CreateSheetIf = False
               
              Set wsTest = Nothing
              On Error Resume Next
              Set wsTest = ActiveWorkbook.Worksheets(strSheetName)
              On Error GoTo 0
               
              If wsTest Is Nothing Then
                  CreateSheetIf = True
                  Worksheets.Add.Name = strSheetName
              End If
               
          End Function
          
          sample_data.jpg
          sample_excel_right.jpg
          sample_excel_wrong.jpg
          ASKER CERTIFIED SOLUTION
          Join our community to see this answer!
          Unlock 1 Answer and 7 Comments.
          Start Free Trial
          Learn from the best

          Network and collaborate with thousands of CTOs, CISOs, and IT Pros rooting for you and your success.

          Andrew Hancock - VMware vExpert
          See if this solution works for you by signing up for a 7 day free trial.
          Unlock 1 Answer and 7 Comments.
          Try for 7 days

          ”The time we save is the biggest benefit of E-E to our team. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange.

          -Mike Kapnisakis, Warner Bros