Complex text extraction

sandramac
sandramac used Ask the Experts™
on
Below is a data in cell a2.  in cell K2 in need to extract the city, in L2, i need the State, and M2 the zip code.  These values will change, so the only permamanet text is "Known as:" and "Borrower"  so the city will be between both commas all the time, then the state then zip code.  

Situated in the City of xxxx, County of xxx and State of xx, and known as being all of Lot 15 in the blah blah foundation, as recorded in Plat Book xx, Pages xx, xx, found County Records of Plats, be the same more or less, but subject to all legal highways. Permanent Parcel No. xx-xxxxx1 Known as:  155 Goodhue Dr., Akron, Ohio 44313 Borrower  understands and agrees that xxxx, LLC holds only legal title to the interests granted by Borrower in this mortgage, but, if necessary to comply with law or custom, xxxx, LLC (as
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Mark EdwardsChief Technology Officer

Commented:
Below what?  What are you talking about?  Do you have a single cell with data that you need to parse and put into other cells?  What are you wanting to do???????????  This doesn't make any sense as a problem or a question.
Excel & VBA Expert
Most Valuable Expert 2018
Awarded 2015
Commented:
Place the following User Defined Functions on a Standard Module like Module1.
Function getCity(str As String) As String
With CreateObject("VBScript.RegExp")
    .Global = False
    .Pattern = ""City of ([A-Za-z]+),""
    If .test(str) Then
        getCity = .Execute(str)(0).submatches(0)
    End If
End With
End Function

Function getState(str As String) As String
With CreateObject("VBScript.RegExp")
    .Global = False
    .Pattern = "([A-Za-z]+)\s\d{5}\sBorrower"
    If .test(str) Then
        getState = .Execute(str)(0).submatches(0)
    End If
End With
End Function

Function getZip(str As String) As String
With CreateObject("VBScript.RegExp")
    .Global = False
    .Pattern = "[A-Za-z]+\s(\d{5})\sBorrower"
    If .test(str) Then
        getZip = .Execute(str)(0).submatches(0)
    End If
End With
End Function

Open in new window


And then place the following formulas on the Sheet.

In K2
=getCity(A2)

Open in new window


in L2
=getState(A2)

Open in new window


in M2
=getZip(A2)

Open in new window

Author

Commented:
In cell A1 this is what is in the cell:  

Situated in the City of xxxx, County of xxx and State of xx, and known as being all of Lot 15 in the blah blah foundation, as recorded in Plat Book xx, Pages xx, xx, found County Records of Plats, be the same more or less, but subject to all legal highways. Permanent Parcel No. xx-xxxxx1 Known as:  155 Goodhue Dr., Akron, Ohio 44313 Borrower  understands and agrees that xxxx, LLC holds only legal title to the interests granted by Borrower in this mortgage, but, if necessary to comply with law or custom, xxxx, LLC (as

I need to extract Akron in one cell, Ohio in another cell, and 44313 in another cell.
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Subodh Tiwari (Neeraj)Excel & VBA Expert
Most Valuable Expert 2018
Awarded 2015

Commented:
In that case, replace the first function getCity with the following one...

Function getCity(str As String) As String
With CreateObject("VBScript.RegExp")
    .Global = False
    .Pattern = "([A-Za-z]+),\s[A-Za-z]+\s\d{5}\sBorrower"
    If .test(str) Then
        getCity = .Execute(str)(0).submatches(0)
    End If
End With
End Function

Open in new window

ExtractText.xlsm
Mark EdwardsChief Technology Officer

Commented:
ahhhhh…. simple text parsing.  got it.
You just need to find the position in the string of "Known as:", then find the 1st comma, that will be the beginning of the city.  Then find the comma after city and the word "Borrower" and that will be state and zipcode in between.  If the zipcode is always 5 characters, then everything tp the left of the last 5 characters will be the state.

I'll make a quick subroutine that will do that in my next post.
Mark EdwardsChief Technology Officer

Commented:
OK, It's simple and it works, but it's not fancy.  The subroutine below gave me the city, state, and zip from your "blob" of text.
Private Sub Command3_Click()

    Dim strBlob As String
    Dim strCity As String
    Dim strState As String
    Dim strZip As String
    Dim x As Integer
    Dim y As Integer
    Dim strWorking As String

    'LETS GET THE BLOB OF TEXT WE NEED TO PARSE:
    strBlob = "Situated in the City of xxxx, County of xxx and State of xx, and known as being all of Lot 15 in the blah blah foundation, as recorded in Plat Book xx, Pages xx, xx, found County Records of Plats, be the same more or less, but subject to all legal highways. Permanent Parcel No. xx-xxxxx1 Known as:  155 Goodhue Dr., Akron, Ohio 44313 Borrower  understands and agrees that xxxx, LLC holds only legal title to the interests granted by Borrower in this mortgage, but, if necessary to comply with law or custom, xxxx, LLC (as"
    'LETS GET THE POSITION OF "Known as:"
    x = InStr(1, strBlob, "Known as:")
    'NOW LETS GET THE POSITION OF THEEND OF "Known as:"
    x = x + Len("Known as:") + 1
    'LETS GET RID OF EVERYTHING TO THE LEFT OF X:
    strWorking = Trim(Mid(strBlob, x))
    'NOW LETS FIND THE 1ST COMMA AND GET RID OF THE ADDRESS:
    strWorking = Trim(Mid(strWorking, InStr(1, strWorking, ",") + 1))
    'NOW LETS FIND THE POSITION OF BORROWER:
    y = InStr(1, strWorking, "Borrower")
    'LETS GET RID OF "BORROWER" AND EVERYTHING AFTER THAT:
    strWorking = Trim(Left(strWorking, y - 1))
    'WHAT WE HAVE LEFT IS THE STRING CONTAINING THE CITY, STATE, AND ZIP:
    'NOW LETS PARSE THE PIECES:
    'LETS START WITH THE CITY by getting the position of the first comma:
    x = InStr(1, strWorking, ",")
    'parse the city:
    strCity = Trim(Left(strWorking, x - 1))
    'now lets get rid of the 1st comma and everything to the left so all we have left is state and zipcode:
    strWorking = Trim(Mid(strWorking, x + 1))
    'lets get the 5 zipcode characters:
    strZip = Trim(Right(strWorking, 5))
    'what's left is state:
    strWorking = Trim(Left(strWorking, Len(strWorking) - 6))
    strState = Trim(strWorking)
    
    MsgBox "City = " & strCity & vbCrLf & "State = " & strState & vbCrLf & "Zipcode = " & strZip
    
End Sub

Open in new window

Author

Commented:
Thank you Mark, I like that also

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial