Solved

Regex for vba in taking dashes out of string  in special cases

Posted on 2014-03-19
24
519 Views
Last Modified: 2014-03-22
need a regex to keep a dash on certain  terms


Keep:
12-20
2-10
3-55
32-32
100-100
1-100

Also keep:
1/2-20
1/4-20
1/4-2
1/3-35
1/8-10

Get rid of the dash
if  before ot after the dash there is a alpha character:

before:
A-1195-004-AS
AT-11-4
AT-11-400
TR-1151-404
55-85
F-16441
AK - 5230
AK- 52309 -  500
RYB- 921 - 002
VB-9211-0-5-13
RYB-922-12

After:
A1195004AS
AT114
AT11400
TR1151404
5585
F16441
AK5230
AK52309500
RYB921002
VB92110513
RYB92212

this works on the above somewhat: but not on the fractions
Cd = Regex.Replace(cD), _
                   "(-|\x20)+", "")

Open in new window




Thanks
fordraiders
0
Comment
Question by:fordraiders
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 8
  • 7
24 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39941531
This will work on all your test strings, except on 55-85, where I do not see a reason to remove the -:
@"([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?"

Open in new window

Replace with $1$2$3$4

HTH,
Dan
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942394
@fordraiders

How is "55-85" different than "12-20" or "32-32"?

I'm thinking that the pattern might be simplified if the string first went through a VB Replace() function to remove any space characters before entering the regexp.replace() method.  Are you coding this in a user-defined function with VBA?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942436
@Dan

Is your pattern for the VBA (vbscript.regexp) environment or the .Net regex engine?
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39942501
It's .Net.
Should work in VBA too, as it has only basic constructs.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942527
Note: The leading "@" character will cause a compilation error in the VBA environment.
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39942560
Yup. I think it should be something like this in VBA:
Cd = Regex.Replace(cD, "([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?", "$1$2$3$4")

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942573
I broke it down into three different patterns and iterated the patterns.
Public Sub testit()
    Dim oRE As Object
    Dim strText As String
    Dim colPatterns As New Collection
    Dim vPattern As Variant
    colPatterns.Add "(^|\n.*?[A-Z].*?)-(.*?[A-Z].*?\r|$)"
    colPatterns.Add "(^|\n.*?[A-Z].*?)-(.*?\r|$)"
    colPatterns.Add "(^|\n.*?)-(.*?[A-Z].*?\r|$)"
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    strText = strText & "12-20" & vbCrLf
    strText = strText & "2-10" & vbCrLf
    strText = strText & "3-55" & vbCrLf
    strText = strText & "32-32" & vbCrLf
    strText = strText & "100-100" & vbCrLf
    strText = strText & "1-100" & vbCrLf
    strText = strText & "1/2-20" & vbCrLf
    strText = strText & "1/4-20" & vbCrLf
    strText = strText & "1/4-2" & vbCrLf
    strText = strText & "1/3-35" & vbCrLf
    strText = strText & "1/8-10" & vbCrLf
    strText = strText & "A-1195-004-AS" & vbCrLf
    strText = strText & "AT-11-4" & vbCrLf
    strText = strText & "AT-11-400" & vbCrLf
    strText = strText & "TR-1151-404" & vbCrLf
    strText = strText & "55-85" & vbCrLf
    strText = strText & "F-16441" & vbCrLf
    strText = strText & "AK-5230" & vbCrLf
    strText = strText & "AK-52309-500" & vbCrLf
    strText = strText & "RYB-921-002" & vbCrLf
    strText = strText & "VB-9211-0-5-13" & vbCrLf
    strText = strText & "RYB-922-12" & vbCrLf
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do
            strText = oRE.Replace(strText, "$1$2")
        Loop While oRE.test(strText)
    Next
    Debug.Print strText
End Sub

Open in new window


Assuming you want to package this into a user-defined function, you might do something like the following.  Unlike the test, I assumed that each cell in your worksheet would contain a single string value.  This is why the patterns are different.
Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
    
    strText = Replace(parmText, " ", vbNullString)
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?)-(.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?)-(.*?)"
        colPatterns.Add "(.*?)-(.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39942586
sorry guys my mistake...!!!

55-85 should not be in there...
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942599
@Dan

That pattern does seem to work in the VBA environment.  Here's my test:
Public Function Dan(parmText)
    Static oRE As Object
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
    End If
    oRE.Pattern = "([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?"
    Dan = oRE.Replace(parmText, "$1$2$3$4")

End Function

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944416
dan,

if in the text box i had :

CAP SCREW 1/4-20 4444-KB4

Your code is keeping it as is.
Should be:

CAP SCREW 1/4-20 4444KB4
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944428
this is deleting spaces in between tokens which i need to save.


Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
   
    strText = Replace(parmText, " ", vbNullString)
   
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?)-(.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?)-(.*?)"
        colPatterns.Add "(.*?)-(.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39944438
Yup.

It only removes the dash if the expression starts with a letter, as in your test data.

I think I need an "if" to match a letter both before and after the -. I'll play a bit with it and see what gives.
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944450
dan, Getting a wrong number of arguements
on this line:
  Static oRE As Object
       
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
end if


Cd = oRe.Replace(cD, "([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?", "$1$2$3$4")
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944455
ok, I should have added things like...

444-k232
234k3-kb232
2321-k543


go to:
444k232
234k3kb232
2321k543
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39944467
If there's only one group on a line, it's easy:
(?(?=(?:.*[A-Z]\s*-|.*-\s*[A-Z]))([A-Z0-9]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?)

This will work for all your text cases, except for CAP SCREW 1/4-20 4444-KB4, where it will remove both "-", because there exists a -K on the line and the replacement is applied to both 1/4-20 and 4444-KB4.

PS: not a VBA developer, so I looked on the manual (http://msdn.microsoft.com/en-us/library/e7f5w83z(v=vs.110).aspx) and there is a variant of regex.replace with 3 arguments. If it does not work, use the variant with 2 arguments.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944504
@Dan
there is a variant of regex.replace with 3 arguments
Please note the .NET Framework 4.5 below the title.  The regex object used in the VBA/VBScript environment only has two parameters for the Replace() method.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944514
@fordraiders

Getting a wrong number of arguements
This is code that I posted to test Dan's regex pattern in the VBA environment.  Since it worked for me, please tell me something more detailed about the error you are receiving.

this is deleting spaces
Yes.  When I wrote that, I commented that the pattern would be greatly simplified if the spaces were removed from the string prior to the regex replacement operation(s).
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944522
I've tweaked the patterns I used to preserve leading or trailing blanks next to the dash we are removing.  Please test this version.
Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
    
    strText = Replace(parmText, " ", vbNullString)
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?)"
        colPatterns.Add "(.*?\s*)-(\s*.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944525
aikimark,

using

cap screw 1/4-20 3343-k43

needs to be:
cap<space>screw<space>1/4-20<space>3343k43
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944528
aikimark,

using

cap screw 1/4-20 3343-K43

YOUR CODE:

aikimark,

using

cap screw 1/420 3343k43
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944540
In your question text, you stated:
Get rid of the dash
if  before ot after the dash there is a alpha character:

My replacements are acting in accordance with that rule.  There are alphabetic characters before and after the hyphen characters ("cap screw" and "K").
cap screw 1/4-20 3343-K43

===========================
I realized that I needed to remove the VB Replace function in my updated routine.  Please test this version.
Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
    
    strText = parmText
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?)"
        colPatterns.Add "(.*?\s*)-(\s*.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function

Open in new window

0
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39944545
Actually, if I stop thinking like a programmer, it becomes easy:
(?:([A-Z]+[0-9]*)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?)|([0-9]+)\s*-\s*([A-Z]+[0-9]*)(?:\s*-\s*([A-Z0-9]*))?(?:\s*-\s*([A-Z0-9]*))?

Open in new window

replace with
$1$2$3$4$5$6$7$8

Open in new window

Works for:
12-20
1/2-20
A-1195-004-AS
AK - 5230
AK- 52309 -  500
RYB- 921 - 002
VB-9211-0-5-13
RYB-922-12
CAP SCREW 1/4-20 4444-KB4
444-K232
234K3-KB232
2321-K543
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39947053
worked great Thanks
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39947059
Glad I could help!
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Do you use a spreadsheet like Microsoft's Excel?  Have you ever wanted to link out to a non excel file on your computer or network drive?  This is the way I found to do it!
You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are …
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question