We help IT Professionals succeed at work.
Troubleshooting Question

Extract measurements from product descriptions

83 Views
Last Modified: 2020-09-07
Dear Experts,

I need to extract substrings (measurements) from thousands of product descriptions. I am pretty sure this cannot be done using a formula, hence, I guess I require a macro solution.

As you can see from the screenshot, the measurements to be extracted have four possible formats. The trouble is that there are cases where there is no space or comma in front of or behind these measurement substrings.This is the case in the first and the third example.

I hope somebody will be able to come up with some macro solution for my problem.

Help is very much appreciated. Thank you very much in advance.

Regards, Andreas

I have attached the above sample file for your convenience.

Comment
Watch Question

KimputerIT Manager
CERTIFIED EXPERT

Commented:
Follow the steps here:

https://www.automateexcel.com/vba/regex/

Then use this pattern:

Pattern = "\d[\.]{0,1}[,]{0,1}[\d]{0,1}X\dMM"

Use some code to loop through the column and place the match in the second column



Subodh Tiwari (Neeraj)Excel & VBA Expert
CERTIFIED EXPERT
Most Valuable Expert 2018
Awarded 2015

Commented:
Hi Andreas,

You may try something like this...
Sub ExtractSubString()
Dim Matches As Object
Dim Rng     As Range
Dim Cel     As Range
Dim lr      As Long

lr = Cells(Rows.Count, 1).End(xlUp).Row
Set Rng = Range("A1:A" & lr)

With CreateObject("VBScript.RegExp")
    .Global = False
    .Pattern = "\d+[\.,]?\d?X\dMM"
    For Each Cel In Rng
        If .Test(Cel.Value) Then
            Set Matches = .Execute(Cel.Value)
            Cel.Offset(0, 1).Value = Matches(0)
        End If
    Next Cel
End With
End Sub

Click the button called "Extract Sub-Strings" on Sheet1 to run the code to get the desired output in column B.
Extract Sub-Strings.xlsm


Andreas HermleTeam leader

Author

Commented:
Dear Kimputer,
thank you very much for your great help. I was gonna read thru the instructions on the website manage and run your code, when Neeraj came up with his solution. And as a matter of fact, his code works just fine. I will test yours as well of course.

Hi Neraj, a very nice job, works like a charm, will do some more testing. Again, thank you so much for your superb approach, I really appreciate it.

Regards, Andreas
Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016

Commented:
I take a similar, but slightly different approach.  I find myself occasionally looking for a way to pull out text from cells using a formula, that I have created a small function I keep handy for things like this.  I just have to add the function to the Excel workbook (no changes) and then can use a formula to pass it the regular expression to use to extract the matching text I want.

So in your case I added this function to a module in Excel VBA:

Public Function RegexFind(TextToSearch As String, RegexPattern As String, IgnoreCase As Boolean, MatchNumber As Integer) As String
    ' Local variables
    Dim Match As Object
    
    ' Return empty string if no match found
    RegexFind = ""
    
    ' Use scripting regualr expression engine
    With CreateObject("Vbscript.Regexp")
        
        ' Set RegEx options
        .Global = True
        .IgnoreCase = IgnoreCase
        .Pattern = RegexPattern
    
        ' Do we have any matches?
        If .Test(TextToSearch) Then
            ' Matches found, execute the regex processing
            Set Matches = .Execute(TextToSearch)
            
            ' If we found a match for the number requested return it
            If MatchNumber <= Matches.Count Then
                RegexFind = Matches(MatchNumber - 1)
            End If
        End If
    
    End With

End Function

Open in new window

And then I used a formula like this in the column that I want the extracted data:
=RegexFind(A2, "[0-9.,]+X[0-9.,]MM", TRUE, 1)

Open in new window

Nice and easy.

Also attaching the workbook I used for testing...

EE29193589.xlsm


»bp
Andreas HermleTeam leader

Author

Commented:
Hi Neeraj, I just saw that there are hundreds of strings that have the following make-up of the substrings:

a) 1.5X6X11.1MM, i.e. #.#X6X##.#MM, that is there are two 'X' characters between the digits.
b) 1.7 X 9 X 12.4MM, that is as above but with spaces in between the 'X' characters

So how has the pattern to be changed to consider the above two requirements? I hope this is not to much asking since the inital code is working great.

The code really works fine. Great!! Thank you very much 

Andreas HermleTeam leader

Author

Commented:
Ups, Bill, you too posted a solution. Thank you very much for it. I am off now to catch the bus, will get back to you as soon as possible. Thank you very much. Regards, Andreas
Subodh Tiwari (Neeraj)Excel & VBA Expert
CERTIFIED EXPERT
Most Valuable Expert 2018
Awarded 2015

Commented:
Okay, try this and see if you get the desired output for those strings as well.

Extract Sub-Strings v2.xlsm
Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016

Commented:
So how has the pattern to be changed to consider the above two requirements?

Here's an updated regex for my approach that seems to work.  As always, there are many ways to design a regex to solve a problem...

=RegexFind(A2, "[0-9.,]+( *X *[0-9.,]+)+MM", TRUE, 1)

Open in new window


»bp
Andreas HermleTeam leader

Author

Commented:
Hi Neeraj,
great this did the trick. Thank you very much for it. 
I got a final expression that does not get considered. I have attached a sample file where the orange shaded cells should also get their measurements extracted.
After your final coding I will gladly be awarding points.

Thank you so much for your great and professional support. I really highly appreciate it. Regards, Andreas

Extract-Sub-Strings-v3.xlsm
Andreas HermleTeam leader

Author

Commented:
HI Bill,

just tested your solution. It is also great stuff, works as desired. Thank you for your excellent support. I will do some more testing and then let you know.

Regards, Andreas
Andreas HermleTeam leader

Author

Commented:
Neeraj, as I said we are almost there with your fantastic approach. I got some measurements that have the measurement unit in small letters not in capital letters, i.e. mm instead of MM. What can I do about this?
Help is very much appreciated. Thank you very much in advance. Regards, Andreas
Andreas HermleTeam leader

Author

Commented:
Bill, I did some testing and as I said before your approach is working fine as well. Thank you very much for it.

Would you please be so kind as to consider the possible following formats in your expression as well as listed in the attached file.

This would be fantastic and I could then go ahead with awarding the points which the both of you really deserve.

Thank you very much for it.

Regards, Andreas
new-reqs-bob.xlsm
Excel & VBA Expert
CERTIFIED EXPERT
Most Valuable Expert 2018
Awarded 2015
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Andreas HermleTeam leader

Author

Commented:
Wow, Neeraj, this is so fantastic!! How come you know all this? You truly deserve your credentials. Thank you very, very much for this. This will save me dozens of hours of tedious work. Thumbs up!!
Subodh Tiwari (Neeraj)Excel & VBA Expert
CERTIFIED EXPERT
Most Valuable Expert 2018
Awarded 2015

Commented:
Thanks Andreas! Glad it worked as desired. :)
Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016

Commented:
Bill, I did some testing and as I said before your approach is working fine as well. Thank you very much for it.

Would you please be so kind as to consider the possible following formats in your expression as well as listed in the attached file.

I will look and update accordingly.


»bp
Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016

Commented:
So, in this case:

NEURO MICRO-MESH 1.5, D73MM, 0.6MM

how we know to exclude either of these:

73MM
1.5, D73MM



»bp
Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016

Commented:
Also, for this data item:

3D MESH 80 X 80 MM, 0,3 MM

how would we know not to extract

3 MM

and instead extract

80 X 80 MM

it seems like you are applying some "logic" when you identified what you wanted, but it's not clear to me what the rules are.  Can you true to come up with a set of statements that are the rules to be followed for the extraction?


»bp
Andreas HermleTeam leader

Author

Commented:
Ok Bill, thank you very much for your post, you are right, there are cases, where a certain logic cannot be applied and thereforethere it is hard for you to be precise in your coding. I will get back to you soon. Thank you very much for your superb support. 
Andreas HermleTeam leader

Author

Commented:
Bill everything, i.e. every measurement with MM attached (with or without space) is to be extracted, everything else is to be disregarded.
D73MM, 0.6MM
80 X 80 MM, 0,3 MM
(0,3 mm: in germany, we use the comma as decimal separator, some of the masterdata also has been entered with a period as decimal point, i.e the angloamerican way was used)
The masterdata that is to be worked on has been entered inconsistently and chaotic, so any of your solutions will never be perfect of course, so I am very happy with what I got from Neeraj and if you manage to tweak your code what I will get from you. I find both approaches very nice and you two deserve a lot of praise. Great!!!

Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Andreas HermleTeam leader

Author

Commented:
Hi Bob, thank you very much for it, will test it in a while, thank you very much for your great help.

I really highly appreciate your and Neeraj's expertise. 
Andreas HermleTeam leader

Author

Commented:
dear Bob, dear neeraj,

great Job, Bob, now with both approaches, I am getting exactly what I was looking for. What an expertise you two have.

To be honest with you, I am at a loss now how to award points. Neeraj was a bit quicker to answer. I was gonna award the points to him a couple of days ago when you, Bob, came along with a similar approach. I would like to award points to both of you of course.

What will be the 'fair/just' way to award points??
Subodh Tiwari (Neeraj)Excel & VBA Expert
CERTIFIED EXPERT
Most Valuable Expert 2018
Awarded 2015

Commented:
Thanks Andreas! Glad we could help. :)
Well, that's not a big deal. You may award equal points to both of us. :)
Bill PrewTest your restores, not your backups...
CERTIFIED EXPERT
Expert of the Year 2019
Top Expert 2016

Commented:
fyi, Bob = Bill  😉
Andreas HermleTeam leader

Author

Commented:
hi Bill (I got that now ;-), hi Neeraj, Ok, I will do that. 
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.