Solved

Replace() with a twist...

Posted on 2003-03-15
Last Modified: 2010-05-01
I've been having problems with a function that's based around Replace.

Given a String containing multiple occurrences of a sub-string beginning and a sub-string ending, I need to remove any beginning or ending markers that fall inside another beginning/ending pair.

Sounds rough. It's not that bad, here's an example:

newStr = "[SubString]Hello this is a [/SubString] string"

That's fine.

newStr2 = "[SubString]Hello is[/SubString] is another [SubString] string [/SubString]"

That's also fine.

But this one is bad, and would require replacement:

newStr3 = "[SubString]This is[SubString] a [/SubString] bad String.[/SubString]"

It should be modified to look like this:

newStr3 = "[SubString]This is a bad String.[/SubString]"

Basically, any time you're within a SubString, there can be no other SubString parts. You can assume that every SubString beginning has a matching SubString Ending.

Anyone wanna take a stab at it? I'd give more points, but 40 is all I have, and this is a not-for-profit organization that I'm already losing money on, and cannot afford more (sorry!).
0
Comment
Question by:Mistwolf
27 Comments
 
LVL 101

Expert Comment

by:mlmcc
ID: 8144932
Since this sounds like a strange homework problem, what code do you have thus far?

Can you give a concrete example of what you want done or how it should look?
Will it really have [SubString] and [/SubString] in it?


mlmcc
0
 

Author Comment

by:Mistwolf
ID: 8144953
Heh... you probably don't want to see the code I have, however, you can just replace [SubString] and [/SubString] with [Quote] and [/Quote], and that's exactly what will be in the string. These are formatting tags, and I cannot allow a Quote within a Quote, as fun as that sounds =)

Here is a related function I wrote - it searches for a given tag, and replaces it with the appropriate formatting in HTML.

Function advancedReplace(strCodeBegin, intCodeBeginLength, strCodeEnd, intCodeEndLength, strCodeClose, strHTMLBegin, strHTMLEnd, strHTMLClose, strText)
  checkAbort = 1
  Do Until checkAbort = 0
      tagBegin = InStr(strText, strCodeBegin)
      If tagBegin = 0 Then
          checkAbort = 0
      Else
          tagEnd = InStr(tagBegin, strText, strCodeEnd)
          If tagEnd = 0 Then
              checkAbort = 0
          Else
              tagClose = InStr(tagEnd, strText, strCodeClose)
              If tagClose = 0 Then
                  checkAbort = 0
              Else
                  strText = Left(strText, tagBegin - 1) & strHTMLBegin & Mid(strText, tagBegin + intCodeBeginLength, tagEnd - tagBegin - intCodeBeginLength) & _
                  strHTMLEnd & Mid(strText, tagEnd + intCodeEndLength, tagClose - tagEnd - intCodeEndLength) & strHTMLClose & Mid(strText, tagClose + Len(strCodeClose))
              End If
          End If
      End If
   Loop
   advancedReplace = strText
End Function

It's called like this:

strMessage = Replace(strMessage,"[quote]","[br][br]<div class='QuoteText'><blockquote>[br]Quote:[br]<i>""")
strMessage = Replace(strMessage,"[/quote]","""</i>[br]</blockquote>[br]</div>")
0
 

Author Comment

by:Mistwolf
ID: 8144966
Ack, wrong calling lines there =)

Here is the one that goes with that function:

strMessage = advancedReplace("[link=", 6, "]", 1, "[/link]", "<a href='", "'>", "</a>", strMessage)

0
 

Author Comment

by:Mistwolf
ID: 8145005
Broke down and bought points... hopefully someone can help me out.
0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8145166
Let me see what I can do.

mlmcc
0
 
LVL 3

Expert Comment

by:Shaka913
ID: 8145280
Essentially, you are trying to parse out the substrings based on "tags" that surround them. Is that the case?

Please put in a few examples to the call that you want, and the before and after text. Maybe I'm missing what you are really trying to do... This doesn't sound hard to implement, just hard to understand.
0
 
LVL 3

Expert Comment

by:QJohnson
ID: 8145291
Let me re-phrase your question to make sure I understand it, if you please.

You want to inspect a text line (likely from HTML, XML, or some other tag-based syntax), and find violations of begin/end tag usage for a KNOWN/SPECIFIC tag.  Violations will be cured by removing the incorrect interior begin and end tags while leaving all the non-tag text intact.  Is this right?

Also, in your example there are pairs of tags and an equal number of each - their sequence is just incorrect.  Can we depend on this in all strings, or will it be possible that there are an odd number of tags and/or that there will be an unequal number of begin and end tags?  It also makes it appear as if we can depend on the last tag being an end tag - is this true?  What shall we do if the last one is a begin tag?  What shall we do if the first one is an end tag?

If there is a violation, do you automatically want to reduce the whole string to a single begin/end pair, or are you anxious to surgically remove only the offenders?

For example:

The string we are given is:
   <Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>

Should we produce:
   <Font>blah1 blah2 blah3 blah4 blah5 </Font>  (i.e., remove all but the outermost begin/end tags)

or should we produce:
   <Font>blah1 blah2 blah3 </Font>blah4 <Font>blah5 </Font>     (i.e., remove only the interior offenders)




0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8145320
What you want is a function to validate that the string is OK; then you can use the normal Replace to replace all occurrences of the strings.

mlmcc
0
 

Author Comment

by:Mistwolf
ID: 8145351
"You want to inspect a text line (likely from HTML, XML, or some other tag-based syntax), and find violations of begin/end tag usage for a KNOWN/SPECIFIC tag.  Violations will be cured by removing the incorrect interior begin and end tags while leaving all the non-tag text intact.  Is this right?"

Exactly =) But it's not a well-known language such as HTML or XML. In HTML, you can do:
<font size=3><font face="Arial">Hello</font></font>

You cannot do this for my program. No tag can be embedded within itself. Given the above string, the function would produce:

<font size=3>Hello</font>

For the answer to your last question, the string:
<Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>

Should produce:
<Font>blah1 blah2 blah3 </Font>blah4 <Font>blah5 </Font>

You can count on the beginning tag always having a corresponding ending tag, so you don't need to worry about the first tag being an ending tag, or the last tag being a beginning tag. They will always have both sets of tags because they are generated using a button, and the button always inserts both tags.

You can make the beginning/ending tags a variable input to the function, or you can just hard-code them as [quote] and [/quote] - it doesn't matter to me, either way. The only thing I need this for is embedded quotes, but if I come across another need at a future date, I can probably make it a variable on my own.
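
One way to get the behaviour described above is a single left-to-right scan that counts how many begin tags are currently open and only keeps tags at the outermost level. The following is a minimal sketch, not code from this thread: the function name StripNestedTags and its parameter names are made up for illustration, it assumes the begin and end tags are non-empty and that neither is contained in the other, and it silently drops a stray end tag that has no open begin tag.

```vb
' Hypothetical helper (not from the thread): keeps only the outermost
' begin/end pair of every nested group, scanning the text once.
Public Function StripNestedTags(ByVal strText As String, _
                                ByVal strBegin As String, _
                                ByVal strEnd As String) As String
    Dim lngPos As Long      ' current scan position (1-based)
    Dim lngBegin As Long    ' next begin tag at or after lngPos (0 = none)
    Dim lngEnd As Long      ' next end tag at or after lngPos (0 = none)
    Dim lngDepth As Long    ' number of begin tags currently open
    Dim strOut As String

    ' Guard: an empty tag would make InStr match at every position.
    If Len(strBegin) = 0 Or Len(strEnd) = 0 Then
        StripNestedTags = strText
        Exit Function
    End If

    lngPos = 1
    Do
        lngBegin = InStr(lngPos, strText, strBegin, vbTextCompare)
        lngEnd = InStr(lngPos, strText, strEnd, vbTextCompare)
        If lngBegin = 0 And lngEnd = 0 Then Exit Do

        If lngBegin > 0 And (lngEnd = 0 Or lngBegin < lngEnd) Then
            ' A begin tag comes next: copy the text before it and keep
            ' the tag itself only if nothing is open yet.
            strOut = strOut & Mid$(strText, lngPos, lngBegin - lngPos)
            If lngDepth = 0 Then strOut = strOut & Mid$(strText, lngBegin, Len(strBegin))
            lngDepth = lngDepth + 1
            lngPos = lngBegin + Len(strBegin)
        Else
            ' An end tag comes next: copy the text before it and keep
            ' the tag only if it closes the outermost begin tag.
            strOut = strOut & Mid$(strText, lngPos, lngEnd - lngPos)
            If lngDepth = 1 Then strOut = strOut & Mid$(strText, lngEnd, Len(strEnd))
            If lngDepth > 0 Then lngDepth = lngDepth - 1
            lngPos = lngEnd + Len(strEnd)
        End If
    Loop

    ' Append whatever follows the last tag.
    StripNestedTags = strOut & Mid$(strText, lngPos)
End Function
```

Called as StripNestedTags(strMessage, "<Font>", "</Font>") on the example string above, this should give "<Font>blah1 blah2 blah3 </Font>blah4 <Font>blah5 </Font>". Because it scans the text once, it does not need a repeat-until-clean loop for deeper nesting.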
0
 

Author Comment

by:Mistwolf
ID: 8145362
"What you want is a function to validate that the string is OK; then you can use the normal Replace to replace all occurrences of the strings."

Not really. Knowing that a string is "OK" or not doesn't help me get rid of the erroneous tags.

In the example above, even if we knew this string:

"<Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>"

was erroneous, we'd still need a way to get rid of the bad tags.
0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8145434
Can I assume if there is a start tag then there will be an end tag in the string?

mlmcc
0
 
LVL 2

Expert Comment

by:navneet77
ID: 8145477
Mistwolf, would the function know which tag it is going to replace?

0
 
LVL 101

Accepted Solution

by:
mlmcc earned 1200 total points
ID: 8145567
I think this will do it


Public Function subValidateStr(strText As String, strStart As String, strStop As String) As String

Dim intStart As Integer
Dim intStop As Integer
Dim intStartPrev As Integer
Dim intStopPrev As Integer
Dim strTextTemp As String
Dim boolError As Boolean

    strTextTemp = strText
    Do
        intStart = 0
        intStop = 0
        boolError = False
        Do
            intStartPrev = intStart
            intStopPrev = intStop
           
            intStart = InStr(intStart + 1, strTextTemp, strStart)
            intStop = InStr(intStop + 1, strTextTemp, strStop)
           
            If (intStop < intStart) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart))
                boolError = True
            ElseIf (intStart = 0) And (intStop > 0) Then
                strTextTemp = Left(strTextTemp, intStop - 1) + _
                                Mid(strTextTemp, intStop + Len(strStop))
                boolError = True
            ElseIf (intStart < intStopPrev) And (intStart <> 0) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart), intStopPrev - intStart - Len(strStart)) + _
                                Mid(strTextTemp, intStopPrev + Len(strStop))
                boolError = True
            End If
        Loop Until boolError Or ((intStop = 0) And (intStart = 0))
    Loop Until intStop = 0 And intStart = 0
    subValidateStr = strTextTemp
   

End Function




Call is


strText = subValidateStr(strText, "STARTSym", "StopSym")


mlmcc
0
 
LVL 3

Expert Comment

by:QJohnson
ID: 8145778
Here's the design.  I'll provide code if you like the solution's design (and you don't want the fun of implementing it yourself).

We want to (a) perform a validation of the string and (b) fix it if it's wrong.

I'm going to use an integer array for my solution and will make it a static array of 10 values just to eliminate the nuisances of using dynamic arrays for data sets this small.  If we may encounter strings with more than ten pairs of tags, we would use a number large enough to handle them, instead, of course.

Premise:  Build a static two-dimensional array of start positions for the tags - element (x, TAG_BEGIN) holds the position of the x-th Begin tag and element (x, TAG_END) holds the position of the x-th End tag.

The code and this discussion will be a lot easier to follow if we agree to use some constants, so we declare them here.
Const TAG_BEGIN As Integer = 0
Const TAG_END   As Integer = 1

(A) - Validation
----------------
Rules for validation - only 2.

(1) Each Begin tag must precede its End tag, i.e.:
      aintStartPos(x,TAG_BEGIN) < aintStartPos(x,TAG_END)

(2) Each Begin tag after the first must follow the End tag paired with the previous Begin tag.  Stated a bit differently, the start value for a subsequent Begin tag cannot fall before the End tag paired with the previous Begin tag, as this would make two Begin tags in a row.  Stated yet a third way, a Begin tag cannot fall between a Begin tag and an End tag, i.e.:
     aintStartPos(x,TAG_BEGIN) > aintStartPos(x-1,TAG_END)

Three short loops are necessary for the validation processing - one to populate the array (keeping track of how many items are added so that our next two loops only iterate the proper number of elements of our fixed-size array), one to test rule one, and one to test rule two.

(B) - repair
-------------

For failures of rule one (which are likely impossible if the tags are inserted programmatically as I understand from above), we would have to encounter "BEEBBE" or worse (B=begin tag, E=end tag).  The failure would be:
       aintStartPos(1,TAG_BEGIN) > aintStartPos(1,TAG_END)
and the repair would be to remove the offending End tag that starts at aintStartPos(1,TAG_END) and the subsequent Begin tag at aintStartPos(1,TAG_BEGIN) - quite happily the two values used in the comparison.

For failures of rule two (the only ones I believe are actually possible), we would have to encounter "BBEEBE" or "BEBBEE" or something longer with more tags involved.  The failure would be (in the first case):
      aintStartPos(1,TAG_BEGIN) < aintStartPos(0,TAG_END)
(or in the second case):
      aintStartPos(2,TAG_BEGIN) < aintStartPos(1,TAG_END)
and the repair is to remove the offending Begin tag and the immediately subsequent End tag, which (happily again) are the two values used in the comparison.

We note here that the routine should probably be called in a loop until validation is confirmed.  A string with "BBBEEE" would require two passes (one for each insertion error), for example.  After the first pass, the second Begin tag and the first End tag would have been removed, and after the second pass, the third Begin tag and the second End tag would be removed - leaving just the first Begin tag and the last End tag.

Depending on how we structure the steps in our code, we may need to re-initialize our array between validation steps, of course.

OK on the design?  

Want the code, too?
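
The design above was never posted as code in this thread, so the following is only a rough sketch of how it might look. The names CutOut, RepairOnce, FixNestedTags and MAX_PAIRS are made up for illustration. It assumes equal numbers of Begin and End tags and no more than MAX_PAIRS pairs (anything beyond that is simply not checked), and it stores positions as Long rather than Integer so long strings do not overflow.

```vb
Private Const TAG_BEGIN As Integer = 0
Private Const TAG_END As Integer = 1
Private Const MAX_PAIRS As Integer = 10   ' assumed upper limit, as in the design

' Hypothetical helper: removes lngLen characters starting at lngPos.
Private Function CutOut(ByVal strText As String, ByVal lngPos As Long, _
                        ByVal lngLen As Long) As String
    CutOut = Left$(strText, lngPos - 1) & Mid$(strText, lngPos + lngLen)
End Function

' One validation/repair pass. Returns True if a pair of tags was removed.
Private Function RepairOnce(ByRef strText As String, ByVal strBegin As String, _
                            ByVal strEnd As String) As Boolean
    Dim aintStartPos(0 To MAX_PAIRS - 1, 0 To 1) As Long
    Dim lngB As Long, lngE As Long
    Dim intCount As Integer, x As Integer

    ' Loop 1: record the position of the x-th Begin tag and the x-th End tag.
    lngB = InStr(1, strText, strBegin, vbTextCompare)
    lngE = InStr(1, strText, strEnd, vbTextCompare)
    Do While lngB > 0 And lngE > 0 And intCount < MAX_PAIRS
        aintStartPos(intCount, TAG_BEGIN) = lngB
        aintStartPos(intCount, TAG_END) = lngE
        intCount = intCount + 1
        lngB = InStr(lngB + 1, strText, strBegin, vbTextCompare)
        lngE = InStr(lngE + 1, strText, strEnd, vbTextCompare)
    Loop

    ' Loop 2, rule one: each Begin tag must precede its End tag.
    For x = 0 To intCount - 1
        If aintStartPos(x, TAG_BEGIN) > aintStartPos(x, TAG_END) Then
            ' Cut the later tag first so the earlier position stays valid.
            strText = CutOut(strText, aintStartPos(x, TAG_BEGIN), Len(strBegin))
            strText = CutOut(strText, aintStartPos(x, TAG_END), Len(strEnd))
            RepairOnce = True
            Exit Function
        End If
    Next x

    ' Loop 3, rule two: a Begin tag must follow the previous pair's End tag.
    For x = 1 To intCount - 1
        If aintStartPos(x, TAG_BEGIN) < aintStartPos(x - 1, TAG_END) Then
            strText = CutOut(strText, aintStartPos(x - 1, TAG_END), Len(strEnd))
            strText = CutOut(strText, aintStartPos(x, TAG_BEGIN), Len(strBegin))
            RepairOnce = True
            Exit Function
        End If
    Next x
End Function

' Repeats the pass until the string validates.
Public Function FixNestedTags(ByVal strText As String, ByVal strBegin As String, _
                              ByVal strEnd As String) As String
    Do While RepairOnce(strText, strBegin, strEnd)
    Loop
    FixNestedTags = strText
End Function
```

With single-letter tags, FixNestedTags("BBBEEE", "B", "E") should take the two passes described above and return "BE". The array is rebuilt from scratch on every pass, which sidesteps the question of re-initializing it between validation steps.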
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8146138
This sounds like regular expressions to me ...

Include the regexp object in your code and try to replace it this way...

Let me try this matchstring .. i think it should work

oRegExp.pattern = "(\[(.*)\])(.*)(\[\1\])*(.*)(\[\\\1\])\[\\\1\]"

brb
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8146208
OK, like always, I messed my pattern up ;)

Here is a working version:

Info: test.txt contained your first post as example data. The pattern only checks for one level of nested tags, so to make sure you don't still have three of the same tag nested, you need to check whether there is still a match in the line...

Remove the last * if you are looking only for closed tags ...

Hope this helps
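'Code
' Needs project references to "Microsoft VBScript Regular Expressions 5.5"
' and "Microsoft Scripting Runtime".
Private Sub Command1_Click()
Dim oRegExp As New RegExp
Dim oFso As New FileSystemObject
Dim oStrmInput As TextStream
Dim cLine As String
Dim oMatches As MatchCollection
Dim oMatch As Match
Dim i As Integer
With oRegExp
    .Global = True
    .MultiLine = True
    .IgnoreCase = True
    ' Groups: 1=[tag] 2=tag name 3=text 4=nested [tag] 5=text
    '         6=nested [/tag] 7=text 8=closing [/tag]
    .Pattern = "(\[(.*?)\])(.*?)(\[\2\])(.*?)(\[/\2\])(.*?)(\[/\2\])"
End With
' Read the whole test file into one string
Set oStrmInput = oFso.OpenTextFile("c:\temp\test.txt")
cLine = oStrmInput.ReadAll
oStrmInput.Close
' Collect the matches before replacing so they can be inspected below
Set oMatches = oRegExp.Execute(cLine)
' Keep the outer tags and all the text; drop the nested pair (groups 4 and 6)
cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")

For Each oMatch In oMatches
    For i = 0 To oMatch.SubMatches.Count - 1
      Debug.Print oMatch.SubMatches(i)
    Next
    Debug.Print oMatch
Next

Debug.Print cLine

End Sub

'end code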

'--------------DEBUG OUTPUT --------------
I've been having problems with a function that's based around Replace.

Given a String, with multiple occurances of a sub-string beginning and sub-string ending, I need to replace all parts of the substring that are within a sub-string beginning and ending.

Sounds rough. It's not that bad, here's an example:

newStr = "[SubString]Hello this is a [/SubString] string"

That's fine.

newStr2 = "[SubString]Hello is[/SubString] is another [SubString] string [/SubString]"

That's also fine.

But this one is bad, and would require replacement:

newStr3 = "[SubString]This is a  bad String.[/SubString]"

If should be modified to look like this:

newStr3 = "[SubString]This is a bad String.[SubString]"

Basically, any time you're within a SubString, there can be no other SubString parts. You can assume that every SubString beginning has a matching SubString Ending.

Anyone wanna take a stab at it? I'd give more points, but 40 is all I have, and this is a not-for-profit organization that I'm already losing money on, and cannot afford more (sorry!).

0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8146418
What I gave you will produce a valid string.

To have it also do the replacement:

Public Function subValidateStr(strText As String, strStart As String, strStop As String, _
                                                strRepStart As String, strRepStop As String) As String

Dim intStart As Integer
Dim intStop As Integer
Dim intStartPrev As Integer
Dim intStopPrev As Integer
Dim strTextTemp As String
Dim boolError As Boolean

    strTextTemp = strText
    Do
        intStart = 0
        intStop = 0
        boolError = False
        Do
            intStartPrev = intStart
            intStopPrev = intStop
           
            intStart = InStr(intStart + 1, strTextTemp, strStart)
            intStop = InStr(intStop + 1, strTextTemp, strStop)
           
            If (intStop < intStart) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart))
                boolError = True
            ElseIf (intStart = 0) And (intStop > 0) Then
                strTextTemp = Left(strTextTemp, intStop - 1) + _
                                Mid(strTextTemp, intStop + Len(strStop))
                boolError = True
            ElseIf (intStart < intStopPrev) And (intStart <> 0) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart), intStopPrev - intStart - Len(strStart)) + _
                                Mid(strTextTemp, intStopPrev + Len(strStop))
                boolError = True
            End If
        Loop Until boolError Or ((intStop = 0) And (intStart = 0))
    Loop Until intStop = 0 And intStart = 0

    intStart = InStr(1, strTextTemp, strStart)
    intStop = InStr(1, strTextTemp, strStop)
    Do While intStart > 0
        strTextTemp = Left(strTextTemp, intStart - 1) + strRepStart + _
                        Mid(strTextTemp, intStart + Len(strStart), intStop - intStart - Len(strStart)) + _
                        strRepStop + Mid(strTextTemp, intStop + Len(strStop))
        intStart = InStr(1, strTextTemp, strStart)
        intStop = InStr(1, strTextTemp, strStop)
    Loop
    subValidateStr = strTextTemp
   
End Function


Call is now

(Me.Text1.Text, "B", "b", "123", "789")


mlmcc
0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8146428
Sorry about that.  Call should be like

    YourStr = subValidateStr(YourStr, "B", "b", "123", "789")

mlmcc
0
 

Author Comment

by:Mistwolf
ID: 8146462
mlmcc - I don't understand the call to the function you said would replace - why do I need to pass it 5 args? It should only need 3 args - string, tagBegin, tagEnd...

navneet77 - preferably it would take the tag it will be replacing as an argument to the function.

rdrunner - I thought about using regEx for a while myself, but I just don't know enough about them. Your code looks very complex, but you've shown it works for 1 level nested tags. I don't think there will be 2999 level nested tags, but there is a chance there will be more than one. If you could set it up as a loop, with a string input (instead of file) it would probably suit my needs!

QJohnson - you have the most interesting approach, and I'd really love to see your code! I don't think there will be more than 10 nested tags, so an array of size 10 would be fine.

Thanks guys for all your help, hopefully we can get something working! *crosses fingers*
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8146545
My code will work for any tag there is... Thats the beauty of regular expressions ;)

You can modify my code quite easy to clean all tags away...

The only part of my code that actually does the work is this snippet ...

oRegExp.Pattern = "(\[(.*?)\])(.*?)(\[\2\])(.*?)(\[/\2\])(.*?)(\[/\2\])"
cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")

The pattern is the rule you gave me ....

Let me break it up into english or at least try

(\[(.*?)\]) -> find a [tag] or a [sam] or a [sample] or [dsfjkahkdajs]
The inner () tells it to remember what it just found...

(.*?) -> find any text  . = wildcard ; * = 0 or more times: *? = zero or more but only as many as needed

(\[\2\]) -> find the 1st tag again ... \2 is what we remembered above...

so if we found [sam] it will look for another [sam]

(.*?) Some text again ....

(\[/\2\]) find [/SAM] or whatever we had in the 2nd ()

(.*?) Some text again ....

(\[/\2\]) find [/SAM] or whatever we had in the 2nd ()


Now the replace ..

cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")

Put everything together and leave out 2nd () , 4th () , 6th () matches (this leaves only the text)

Hope this helps ;)
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8146548
P.S. To loop it, try this snippet:


while oregexp.test (cline)
    cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")
wend
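
For a version that takes a string instead of a file, the pattern and the loop above could be wrapped in a function along these lines. This is only a sketch: the name UnNestWithRegExp is made up, and it uses late binding through CreateObject("VBScript.RegExp") so that no project reference is needed. Note that "." in this pattern does not match line breaks, so a nested pair that spans several lines would not be caught.

```vb
' Hypothetical wrapper around the pattern from this thread.
' Removes nested [tag]...[/tag] pairs one level per pass until none match.
Public Function UnNestWithRegExp(ByVal strText As String) As String
    Dim oRegExp As Object

    ' Late-bound VBScript regular expression object.
    Set oRegExp = CreateObject("VBScript.RegExp")
    With oRegExp
        .Global = True
        .IgnoreCase = True
        ' Groups: 1=[tag] 2=tag name 3=text 4=nested [tag] 5=text
        '         6=nested [/tag] 7=text 8=closing [/tag]
        .Pattern = "(\[(.*?)\])(.*?)(\[\2\])(.*?)(\[/\2\])(.*?)(\[/\2\])"
    End With

    ' Every successful pass removes one nested pair per match,
    ' so the loop ends once no nested pair is left.
    Do While oRegExp.Test(strText)
        strText = oRegExp.Replace(strText, "$1$3$5$7$8")
    Loop

    UnNestWithRegExp = strText
End Function
```

Called as strMessage = UnNestWithRegExp(strMessage), it works for any bracketed tag name rather than one fixed tag, which may or may not be what you want if other tags are allowed to nest.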

0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8146593

'
'  strText - String to fix
'  strStart - Starting string
'  strStop - Terminating string
'  strRepStart - Replacement string for the start string
'  strRepStop - Replacement string for the terminating string
'

Public Function subValidateStr(strText As String, strStart As String, strStop As String, _
                                               strRepStart As String, strRepStop As String) As String
This will first produce a valid string then replace the beginning and terminating strings with the appropriate replacement strings.  I thought that was the gist of what you wanted.


If all you want to do is create a valid string use the first answer that has only 3 parameters.  It will produce a valid string from an invalid one.

Public Function subValidateStr(strText As String, strStart As String, strStop As String) As String

mlmcc
0
 

Author Comment

by:Mistwolf
ID: 8146640
mlmcc, The first one (3 inputs) works great =) I ran a test, and this was the output:

OK = [quote]hey what's[/quote] up? [quote] this string [/quote] is good.
Bad = [quote]hey what's[quote] up? [/quote] this string [/quote] is bad.
Very Bad = [quote]hey what's[quote] up[quote]?[/quote] [/quote] this string [/quote] is very bad.[/quote]
Fubar = [/quote]hey [quote][/quote] what's[quote][/quote] up? [/quote][quote][quote][quote][/quote] this string [/quote] [/quote] is fubar.[quote]

OK = [quote]hey what's[/quote] up? [quote] this string [/quote] is good.
Bad = [quote]hey what's up? this string [/quote] is bad.
Very Bad = [quote]hey what's up? this string [/quote] is very bad.
Fubar = hey what's up? this string is fubar.

Thanks.

rdrunner - I'm going to use mlmcc's answer because I can understand it (hehe), but I'm going to make a "points for rdrunner" question for you, because you deserve them =)
0
 
LVL 3

Expert Comment

by:QJohnson
ID: 8146716
Sorry I had to go to bed last evening and didn't see the request for code until almost three hours after you posted it.  I can understand your reluctance to wait any longer (particularly when you have some code that works for you!)

Good luck - hope you get SOME TIME off this weekend. <g>

Q
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8146788
Try to understand regular expressions if you have to mess with text some more ;)

they are realllly great!

0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8146963
P.s: I just threw the fubar string into my (modified with loop) function to test what it would toss out...

here is the result...

[/quote]hey [quote][/quote] what's up? [/quote][quote] this string  [/quote] is fubar.[quote]

I would say it fits the requirements ;) No longer any nested tags...
0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8147147
Glad I could help

mlmcc
0
