Solved

Regex for vba in taking dashes out of string  in special cases

Posted on 2014-03-19
24
494 Views
Last Modified: 2014-03-22
need a regex to keep a dash on certain  terms


Keep:
12-20
2-10
3-55
32-32
100-100
1-100

Also keep:
1/2-20
1/4-20
1/4-2
1/3-35
1/8-10

Get rid of the dash
if  before ot after the dash there is a alpha character:

before:
A-1195-004-AS
AT-11-4
AT-11-400
TR-1151-404
55-85
F-16441
AK - 5230
AK- 52309 -  500
RYB- 921 - 002
VB-9211-0-5-13
RYB-922-12

After:
A1195004AS
AT114
AT11400
TR1151404
5585
F16441
AK5230
AK52309500
RYB921002
VB92110513
RYB92212

this works on the above somewhat: but not on the fractions
Cd = Regex.Replace(cD), _
                   "(-|\x20)+", "")

Open in new window




Thanks
fordraiders
0
Comment
Question by:fordraiders
  • 9
  • 8
  • 7
24 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39941531
This will work on all your test strings, except on 55-85, where I do not see a reason to remove the -:
@"([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?"

Open in new window

Replace with $1$2$3$4

HTH,
Dan
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942394
@fordraiders

How is "55-85" different than "12-20" or "32-32"?

I'm thinking that the pattern might be simplified if the string first went through a VB Replace() function to remove any space characters before entering the regexp.replace() method.  Are you coding this in a user-defined function with VBA?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942436
@Dan

Is your pattern for the VBA (vbscript.regexp) environment or the .Net regex engine?
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39942501
It's .Net.
Should work in VBA too, as it has only basic constructs.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942527
Note: The leading "@" character will cause a compilation error in the VBA environment.
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39942560
Yup. I think it should be something like this in VBA:
Cd = Regex.Replace(cD, "([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?", "$1$2$3$4")

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942573
I broke it down into three different patterns and iterated the patterns.
Public Sub testit()
    Dim oRE As Object
    Dim strText As String
    Dim colPatterns As New Collection
    Dim vPattern As Variant
    colPatterns.Add "(^|\n.*?[A-Z].*?)-(.*?[A-Z].*?\r|$)"
    colPatterns.Add "(^|\n.*?[A-Z].*?)-(.*?\r|$)"
    colPatterns.Add "(^|\n.*?)-(.*?[A-Z].*?\r|$)"
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    strText = strText & "12-20" & vbCrLf
    strText = strText & "2-10" & vbCrLf
    strText = strText & "3-55" & vbCrLf
    strText = strText & "32-32" & vbCrLf
    strText = strText & "100-100" & vbCrLf
    strText = strText & "1-100" & vbCrLf
    strText = strText & "1/2-20" & vbCrLf
    strText = strText & "1/4-20" & vbCrLf
    strText = strText & "1/4-2" & vbCrLf
    strText = strText & "1/3-35" & vbCrLf
    strText = strText & "1/8-10" & vbCrLf
    strText = strText & "A-1195-004-AS" & vbCrLf
    strText = strText & "AT-11-4" & vbCrLf
    strText = strText & "AT-11-400" & vbCrLf
    strText = strText & "TR-1151-404" & vbCrLf
    strText = strText & "55-85" & vbCrLf
    strText = strText & "F-16441" & vbCrLf
    strText = strText & "AK-5230" & vbCrLf
    strText = strText & "AK-52309-500" & vbCrLf
    strText = strText & "RYB-921-002" & vbCrLf
    strText = strText & "VB-9211-0-5-13" & vbCrLf
    strText = strText & "RYB-922-12" & vbCrLf
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do
            strText = oRE.Replace(strText, "$1$2")
        Loop While oRE.test(strText)
    Next
    Debug.Print strText
End Sub

Open in new window


Assuming you want to package this into a user-defined function, you might do something like the following.  Unlike the test, I assumed that each cell in your worksheet would contain a single string value.  This is why the patterns are different.
Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
    
    strText = Replace(parmText, " ", vbNullString)
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?)-(.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?)-(.*?)"
        colPatterns.Add "(.*?)-(.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39942586
sorry guys my mistake...!!!

55-85 should not be in there...
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39942599
@Dan

That pattern does seem to work in the VBA environment.  Here's my test:
Public Function Dan(parmText)
    Static oRE As Object
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
    End If
    oRE.Pattern = "([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?"
    Dan = oRE.Replace(parmText, "$1$2$3$4")

End Function

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944416
dan,

if in the text box i had :

CAP SCREW 1/4-20 4444-KB4

Your code is keeping it as is.
Should be:

CAP SCREW 1/4-20 4444KB4
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944428
this is deleting spaces in between tokens which i need to save.


Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
   
    strText = Replace(parmText, " ", vbNullString)
   
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?)-(.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?)-(.*?)"
        colPatterns.Add "(.*?)-(.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39944438
Yup.

It only removes the dash if the expression starts with a letter, as in your test data.

I think I need an "if" to match a letter both before and after the -. I'll play a bit with it and see what gives.
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 3

Author Comment

by:fordraiders
ID: 39944450
dan, Getting a wrong number of arguements
on this line:
  Static oRE As Object
       
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
end if


Cd = oRe.Replace(cD, "([A-Z]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?", "$1$2$3$4")
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944455
ok, I should have added things like...

444-k232
234k3-kb232
2321-k543


go to:
444k232
234k3kb232
2321k543
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39944467
If there's only one group on a line, it's easy:
(?(?=(?:.*[A-Z]\s*-|.*-\s*[A-Z]))([A-Z0-9]+)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?)

This will work for all your text cases, except for CAP SCREW 1/4-20 4444-KB4, where it will remove both "-", because there exists a -K on the line and the replacement is applied to both 1/4-20 and 4444-KB4.

PS: not a VBA developer, so I looked on the manual (http://msdn.microsoft.com/en-us/library/e7f5w83z(v=vs.110).aspx) and there is a variant of regex.replace with 3 arguments. If it does not work, use the variant with 2 arguments.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944504
@Dan
there is a variant of regex.replace with 3 arguments
Please note the .NET Framework 4.5 below the title.  The regex object used in the VBA/VBScript environment only has two parameters for the Replace() method.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944514
@fordraiders

Getting a wrong number of arguements
This is code that I posted to test Dan's regex pattern in the VBA environment.  Since it worked for me, please tell me something more detailed about the error you are receiving.

this is deleting spaces
Yes.  When I wrote that, I commented that the pattern would be greatly simplified if the spaces were removed from the string prior to the regex replacement operation(s).
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944522
I've tweaked the patterns I used to preserve leading or trailing blanks next to the dash we are removing.  Please test this version.
Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
    
    strText = Replace(parmText, " ", vbNullString)
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?)"
        colPatterns.Add "(.*?\s*)-(\s*.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944525
aikimark,

using

cap screw 1/4-20 3343-k43

needs to be:
cap<space>screw<space>1/4-20<space>3343k43
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39944528
aikimark,

using

cap screw 1/4-20 3343-K43

YOUR CODE:

aikimark,

using

cap screw 1/420 3343k43
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39944540
In your question text, you stated:
Get rid of the dash
if  before ot after the dash there is a alpha character:

My replacements are acting in accordance with that rule.  There are alphabetic characters before and after the hyphen characters ("cap screw" and "K").
cap screw 1/4-20 3343-K43

===========================
I realized that I needed to remove the VB Replace function in my updated routine.  Please test this version.
Public Function Q_28392904(parmText)
    Static oRE As Object
    Static colPatterns As Object
    Dim strText As String
    Dim vPattern As Variant
    
    strText = parmText
    
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True
        Set colPatterns = New Collection
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?[A-Z].*?)"
        colPatterns.Add "(.*?[A-Z].*?\s*)-(\s*.*?)"
        colPatterns.Add "(.*?\s*)-(\s*.*?[A-Z].*?)"
    End If
    For Each vPattern In colPatterns
        oRE.Pattern = vPattern
        Do While oRE.test(strText)
            strText = oRE.Replace(strText, "$1$2")
        Loop
    Next
    Q_28392904 = strText
End Function

Open in new window

0
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39944545
Actually, if I stop thinking like a programmer, it becomes easy:
(?:([A-Z]+[0-9]*)\s*-\s*(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?(?:([A-Z0-9]*)\s*-\s*)?)|([0-9]+)\s*-\s*([A-Z]+[0-9]*)(?:\s*-\s*([A-Z0-9]*))?(?:\s*-\s*([A-Z0-9]*))?

Open in new window

replace with
$1$2$3$4$5$6$7$8

Open in new window

Works for:
12-20
1/2-20
A-1195-004-AS
AK - 5230
AK- 52309 -  500
RYB- 921 - 002
VB-9211-0-5-13
RYB-922-12
CAP SCREW 1/4-20 4444-KB4
444-K232
234K3-KB232
2321-K543
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39947053
worked great Thanks
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39947059
Glad I could help!
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

A2 = A1 That kind of cell reference is relative.  If you copy it from A2 to B2, then B2 will get this: B2 = B1 That's all fine and good, but if you then insert a new row above row 2, you'll find: A3 = A1 B3 = B1 This is intentional. …
Introduction This Article is a follow-up to my Mappit! Addin Article (http://www.experts-exchange.com/A_2613.html), it was inspired by an email posting I made to EUSPRIG (http://www.eusprig.org/index.htm), I will briefly cover: 1) An overvie…
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now