Regexp Replace() of repeated field delimiters

I was trying to clean up a string of delimited text with the (vbscript.regexp) regular expression Replace() method.  I had to repeat the method in order to correctly process all the fields.  I wondered if there was a pattern that can correctly replace sequential delimiters.

Example:
If I have a string like this:
ABC^DEF^^GHI^^^JKL

Open in new window

and I want to supply missing field values, the resulting string should look like this:
ABC^DEF^#MI^GHI^#MI^#MI^JKL

Open in new window


However, the pattern I'm using only produces the following:
ABC^DEF^#MI^GHI^#MI^^JKL

Open in new window

because the delimiters between GHI and JKL get 'paired', preventing the second missing field from getting its value in a single method invocation.
LVL 47
aikimarkAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Rey Obrero (Capricorn1)Commented:
try using two replace() function

?Replace(replace("ABC^DEF^^GHI^^^JKL^^^^^^MNO","^^","^#MI^"),"^^","^#MI^")
ABC^DEF^#MI^GHI^#MI^#MI^JKL^#MI^#MI^#MI^#MI^#MI^MNO
0
aikimarkAuthor Commented:
@Rey

I know I can do it with multiple replace operations.  The question is whether there is a single pattern that would facilitate a correct operation with a single method invocation.
0
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
Not sure if this is the right method, but see if this works for you.....
Assuming your string is in A2.

Sub ReplaceMissingString()
Dim missStr As String
Dim RE As VBScript_RegExp_10.RegExp
Dim match As VBScript_RegExp_10.match
Dim matches As VBScript_RegExp_10.MatchCollection

Set RE = New VBScript_RegExp_10.RegExp
missStr = "^#MI^"
RE.Global = True
RE.Pattern = "\^\^"

    If RE.Test(Range("A2").Value) Then
        Set matches = RE.Execute(Range("A2").Value)
        For Each match In matches
            Range("A2").Value = RE.Replace(Range("A2").Value, missStr)
        Next match
    End If
End Sub

Open in new window

0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Robert SchuttSoftware EngineerCommented:
You can use a look-ahead: the second circumflex in the pattern below needs to match but is not included in the matched value (hence also don't include a second one in the replace value):
Sub TestReplace()

    Dim msg As String
    msg = "ABC^DEF^^GHI^^^JKL"

    Dim re As New RegExp
    re.Global = True
    re.Pattern = "\^(?=\^)"

    msg = re.Replace(msg, "^#MI")

    MsgBox msg

End Sub

Open in new window

0
hieloCommented:
Try:
Dim str, result
str = "ABC^DEF^^GHI^^^JKL"
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Global = True
myRegExp.Pattern = "\^\B"   ' \B= Non-Word Boundary

result = myRegExp.replace(str, "^#MI")

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
aikimarkAuthor Commented:
@sktneer

That will have to be repeated in order to produce the correct string when there are three or more consecutive delimiters.
0
aikimarkAuthor Commented:
@hielo

That's very promising.  Let me throw that into my test script.
0
aikimarkAuthor Commented:
@Robert

I didn't think look-a-heads were supported.  Looks like I'm going to learn some good stuff from this question.
0
Robert SchuttSoftware EngineerCommented:
I'm not sure about your environment. I used Excel VBA and added a reference to RegEx 5.5
0
Bill PrewCommented:
LOOKAHEAD is supported in VBScript.

~bp
0
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
Correct. I ended up with what hielo suggested. That will do the trick.
0
Bill PrewCommented:
Did a little testing here for curiosity.  The first non-lookahead approach seems to handle a trailing delimiter on the input string, so that seems better to me:

Input = "ABC^"
sktneer = "ABC^#MI"
schutt = "ABC^"


However neither handle a leading delimiter in the input string, for example:

Input = "^ABC"
sktneer = "^ABC"
schutt = "^ABC"


Not sure if that is a requirement.

~bp
0
Bill PrewCommented:
I do also note a difference with "special characters" in existing values, not sure what your requirement is there.

Input = "ABC^^!^DEF"
sktneer = "ABC^#MI^#MI!^DEF"
schutt = "ABC^#MI^!^DEF"


~bp
0
Robert SchuttSoftware EngineerCommented:
@Bill, yes, I was just about to post that. Word boundary includes more than ^ character. (Do you mean hielo instead of sktneer by the way?)

Beginning of line never occurred to me because it looks like ANSI document where that could never happen (although I can't be sure). End of line match is a very good catch, seems more of a practical possibility, it could be added in my code with:
re.Pattern = "\^(?=\^|$)"

Open in new window

0
aikimarkAuthor Commented:

Test Results

Nothing like exposing solutions to actual data.  Even though the actual data is all numeric data (or empty), I included both alphabetic and numeric fields in my tests of these two patterns.  In addition to the all-internal delimiters, I included trailing and leading delimiter configurations.
Test Data
ABC^DEF^^GHI^^^ALLINTERNAL
1^2^^4^^^7
ABC^DEF^^GHI^^^TRAILING^
1^2^^4^^^7^
^ABC^DEF^^GHI^^^LEADING
^2^3^^5^^^8
^ABC^DEF^^GHI^^^BOTHLEADINGANDTRAILING^
^2^3^^5^^^8^

Open in new window


1. hielo pattern results

ABC^DEF^#MI^GHI^#MI^#MI^ALLINTERNAL
1^2^#MI^4^#MI^#MI^7
ABC^DEF^#MI^GHI^#MI^#MI^TRAILING^#MI
1^2^#MI^4^#MI^#MI^7^#MI
^ABC^DEF^#MI^GHI^#MI^#MI^LEADING
^2^3^#MI^5^#MI^#MI^8
^ABC^DEF^#MI^GHI^#MI^#MI^BOTHLEADINGANDTRAILING^#MI
^2^3^#MI^5^#MI^#MI^8^#MI

Open in new window

2. Robert Schutt pattern results

ABC^DEF^#MI^GHI^#MI^#MI^ALLINTERNAL
1^2^#MI^4^#MI^#MI^7
ABC^DEF^#MI^GHI^#MI^#MI^TRAILING^
1^2^#MI^4^#MI^#MI^7^
^ABC^DEF^#MI^GHI^#MI^#MI^LEADING
^2^3^#MI^5^#MI^#MI^8
^ABC^DEF^#MI^GHI^#MI^#MI^BOTHLEADINGANDTRAILING^
^2^3^#MI^5^#MI^#MI^8^

Open in new window

=============================
Based on these results the hielo results also correctly transform a trailing delimiter, which would improve its performance.

I'm going to give you and any additional EE experts a few more hours to see if they can better hielo's pattern.
0
aikimarkAuthor Commented:
This revised version of the Robert Schutt pattern (http:#a40911944)
\^(?=\^|$|\r\n)

Open in new window

produced the following results:
ABC^DEF^#MI^GHI^#MI^#MI^ALLINTERNAL
1^2^#MI^4^#MI^#MI^7
ABC^DEF^#MI^GHI^#MI^#MI^TRAILING^#MI
1^2^#MI^4^#MI^#MI^7^#MI
^ABC^DEF^#MI^GHI^#MI^#MI^LEADING
^2^3^#MI^5^#MI^#MI^8
^ABC^DEF^#MI^GHI^#MI^#MI^BOTHLEADINGANDTRAILING^#MI
^2^3^#MI^5^#MI^#MI^8^#MI

Open in new window

Which takes care of trailing delimiters, like the hielo pattern.
0
Bill PrewCommented:
Do you mean hielo instead of sktneer by the way?
Yes, apologies hielo!

~bp
0
Robert SchuttSoftware EngineerCommented:
Ok, so I guess you don't expect special characters in the fields which would make my solution necessary. As it is, my solution may be significantly slower on larger documents (but haven't tested).

Some more additions to both search and replace patterns for the beginning of line:
Sub TestReplace()

    Dim msg As String
    msg = _
        "ABC^DEF^^GHI^^^ALLINTERNAL" & vbCrLf & _
        "1^2^^4^^^7" & vbCrLf & _
        "ABC^DEF^^GHI^^^TRAILING^" & vbCrLf & _
        "1^2^^4^^^7^" & vbCrLf & _
        "^ABC^DEF^^GHI^^^LEADING" & vbCrLf & _
        "^2^3^^5^^^8" & vbCrLf & _
        "^ABC^DEF^^GHI^^^BOTHLEADINGANDTRAILING^" & vbCrLf & _
        "^2^3^^5^^^8^" & vbCrLf

    Dim re As New RegExp
    re.Global = True
    re.MultiLine = True
    re.Pattern = "(^|\^)(?=\^|\r\n)"

    msg = re.Replace(msg, "$1#MI") & vbCrLf

    MsgBox msg

End Sub

Open in new window


I had to remove $ because it added an extra line with a 'missing' field after every line, seems like something to do with the CR/LF combination. So now the only thing that doesn't work is a last line in a file without CR/LF.

output:
ABC^DEF^#MI^GHI^#MI^#MI^ALLINTERNAL
1^2^#MI^4^#MI^#MI^7
ABC^DEF^#MI^GHI^#MI^#MI^TRAILING^#MI
1^2^#MI^4^#MI^#MI^7^#MI
#MI^ABC^DEF^#MI^GHI^#MI^#MI^LEADING
#MI^2^3^#MI^5^#MI^#MI^8
#MI^ABC^DEF^#MI^GHI^#MI^#MI^BOTHLEADINGANDTRAILING^#MI
#MI^2^3^#MI^5^#MI^#MI^8^#MI

Open in new window

0
aikimarkAuthor Commented:
@Robert

I still have to do a performance test of these patterns.  I expect that the simpler of the two patterns would have the performance advantage.  The performance tests are only operating on a line at a time, so the \r\n version of your pattern won't be necessary.
0
aikimarkAuthor Commented:

Performance test results

Process zero valued key's string: ^^103^^^106^^108^109^^^
the number of times there are items in the set.
 1.546875     Regexp 1001   #MI^#MI^103^#MI^#MI^106^#MI^108^109^#MI^#MI^#MI
 0.5          hielo 1001    #MI^#MI^103^#MI^#MI^106^#MI^108^109^#MI^#MI^#MI
 0.5703125    RS 1001       #MI^#MI^103^#MI^#MI^106^#MI^108^109^#MI^#MI^#MI
 0.453125     VB 1001       #MI^#MI^103^#MI^#MI^106^#MI^108^109^#MI^#MI^#MI

Open in new window

Process all items in the set.
 2.410156     Regexp        #MI^102^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI
 1.734375     Regexp hielo  #MI^102^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI
 1.75         Regexp RS     #MI^102^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI
 1.910156     VB            #MI^102^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI^#MI

Open in new window

Notes:
* The string in the all-items run is the last item in the set.
* The Regexp block iterates a "\^\^" pattern until all repeated delimiters are transformed.  Also, leading and trailing delimiters are transformed, if necessary.
* The VB block iterates a VB Replace() function until all repeated delimiters, "^^" are transformed.  Also, leading and trailing delimiters are transformed, if necessary.
0
aikimarkAuthor Commented:
Congratulations to hielo for the faster-performing pattern.

Also, kudos to Robert Schutt for his runner-up pattern.

Thanks to all experts participating in the thread.  In addition to getting my question answered, I learned some things about the vbscript.regexp engine and patterns (non-word boundaries and look-ahead patterns)

If you are interested, I did a performance comparison article, comparing Regexp and VB Replace() in this article:
http://rdsrc.us/ipF4nT
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Regular Expressions

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.