• Status: Solved
  • Priority: Medium

Replace() with a twist...

I've been having problems with a function that's based around Replace.

Given a string with multiple occurrences of a substring-beginning marker and a substring-ending marker, I need to remove any beginning or ending markers that fall inside an already open beginning/ending pair.

Sounds rough, but it's not that bad. Here's an example:

newStr = "[SubString]Hello this is a [/SubString] string"

That's fine.

newStr2 = "[SubString]Hello is[/SubString] is another [SubString] string [/SubString]"

That's also fine.

But this one is bad, and would require replacement:

newStr3 = "[SubString]This is[SubString] a [/SubString] bad String.[/SubString]"

It should be modified to look like this:

newStr3 = "[SubString]This is a bad String.[/SubString]"

Basically, any time you're within a SubString, there can be no other SubString parts. You can assume that every SubString beginning has a matching SubString Ending.

Anyone wanna take a stab at it? I'd give more points, but 40 is all I have, and this is a not-for-profit organization that I'm already losing money on, and cannot afford more (sorry!).

Asked by: Mistwolf

mlmcc Commented:
Since this sounds like a strange homework problem, what code do you have thus far?

Can you give a concrete example of what you want done or how it should look?
Will it really have [SubString] and [/SubString] in it?


mlmcc

Mistwolf (Author) Commented:
Heh... you probably don't want to see the code I have, however, you can just replace [SubString] and [/SubString] with [Quote] and [/Quote], and that's exactly what will be in the string. These are formatting tags, and I cannot allow a Quote within a Quote, as fun as that sounds =)

Here is a related function I wrote - it searches for a given tag, and replaces it with the appropriate formatting in HTML.

Function advancedReplace(strCodeBegin, intCodeBeginLength, strCodeEnd, intCodeEndLength, strCodeClose, strHTMLBegin, strHTMLEnd, strHTMLClose, strText)
  checkAbort = 1
  Do Until checkAbort = 0
      tagBegin = InStr(strText, strCodeBegin)
      If tagBegin = 0 Then
          checkAbort = 0
      Else
          tagEnd = InStr(tagBegin, strText, strCodeEnd)
          If tagEnd = 0 Then
              checkAbort = 0
          Else
              tagClose = InStr(tagEnd, strText, strCodeClose)
              If tagClose = 0 Then
                  checkAbort = 0
              Else
                  strText = Left(strText, tagBegin - 1) & strHTMLBegin & Mid(strText, tagBegin + intCodeBeginLength, tagEnd - tagBegin - intCodeBeginLength) & _
                  strHTMLEnd & Mid(strText, tagEnd + intCodeEndLength, tagClose - tagEnd - intCodeEndLength) & strHTMLClose & Mid(strText, tagClose + Len(strCodeClose))
              End If
          End If
      End If
   Loop
   advancedReplace = strText
End Function

It's called like this:

strMessage = Replace(strMessage,"[quote]","[br][br]<div class='QuoteText'><blockquote>[br]Quote:[br]<i>""")
strMessage = Replace(strMessage,"[/quote]","""</i>[br]</blockquote>[br]</div>")

Mistwolf (Author) Commented:
Ack, wrong calling lines there =)

Here is the one that goes with that function:

strMessage = advancedReplace("[link=", 6, "]", 1, "[/link]", "<a href='", "'>", "</a>", strMessage)

Mistwolf (Author) Commented:
Broke down and bought points... hopefully someone can help me out.

mlmcc Commented:
Let me see what I can do.

mlmcc

Shaka913 Commented:
Essentially, you are trying to parse out the substrings based on the "tags" that surround them. Is that the case?

Please put in a few examples to the call that you want, and the before and after text. Maybe I'm missing what you are really trying to do... This doesn't sound hard to implement, just hard to understand.

QJohnson Commented:
Let me re-phrase your question to make sure I understand it, if you please.

You want to inspect a text line (likely from HTML, XML, or some other tag-based syntax), and find violations of begin/end tag usage for a KNOWN/SPECIFIC tag.  Violations will be cured by removing the incorrect interior begin and end tags while leaving all the non-tag text intact.  Is this right?

Also, in your example there are pairs of tags and an equal number of each - their sequence is just incorrect.  Can we depend on this in all strings, or will it be possible that there are an odd number of tags and/or that there will be an unequal number of begin and end tags?  It also makes it appear as if we can depend on the last tag being an end tag - is this true?  What shall we do if the last one is a begin tag?  What shall we do if the first one is an end tag?

If there is a violation, do you automatically want to reduce the whole string to a single begin/end pair, or do you want to surgically remove only the offenders?

For example:

The string we are given is:
   <Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>

Should we produce:
   <Font>blah1 blah2 blah3 blah4 blah5 </Font>  (i.e., remove all but the outermost begin/end tags)

or should we produce:
   <Font>blah1 blah2 blah3 </Font>blah4 <Font>blah5 </Font>     (i.e., remove only the interior offenders)

mlmcc Commented:
What you want is a function to validate that the string is OK, then you can use the normal Replace to replace all occurrences of the strings.

mlmcc

Mistwolf (Author) Commented:
"You want to inspect a text line (likely from HTML, XML, or some other tag-based syntax), and find violations of begin/end tag usage for a KNOWN/SPECIFIC tag.  Violations will be cured by removing the incorrect interior begin and end tags while leaving all the non-tag text intact.  Is this right?"

Exactly =) But it's not a well-known language such as HTML or XML. In HTML, you can do:
<font size=3><font face="Arial">Hello</font></font>

You cannot do this for my program. No tag can be embedded within itself. Given the above string, the function would produce:

<font size=3>Hello</font>

For the answer to your last question, the string:
<Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>

Should produce:
<Font>blah1 blah2 blah3 </Font>blah4 <Font>blah5 </Font>

You can count on the beginning tag always having a corresponding ending tag, so you don't need to worry about the first tag being an ending tag, or the last tag being a beginning tag. They will always have both sets of tags because they are generated using a button, and the button always inserts both tags.

You can make the beginning/ending tags a variable input to the function, or you can just hard-code them as [quote] and [/quote] - it doesn't matter to me, either way. The only thing I need this for is embedded quotes, but if I come across another need at a future date, I can probably make it a variable on my own.
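
A minimal sketch of the behavior described above, written in Python rather than VB so that it can be run as-is. It assumes fixed, literal begin and end tags (no attributes) and balanced input, as stated in this post. The name flatten_nested is hypothetical, and silently dropping a stray end tag is an assumption that goes beyond the stated requirements. The idea is to walk the tags once with a depth counter and keep a tag only when it opens or closes the outermost level.

```python
import re


def flatten_nested(text, begin, end):
    """Remove begin/end tags that appear inside an already open pair.

    Hypothetical helper: begin and end are literal tag strings.
    """
    # One alternation that finds either tag; group 1 is set for a begin tag.
    pattern = re.compile("(" + re.escape(begin) + ")|(" + re.escape(end) + ")")
    depth = 0   # how many begin tags are currently open
    out = []    # pieces of the result
    pos = 0     # end of the previous tag in the input
    for m in pattern.finditer(text):
        out.append(text[pos:m.start()])   # plain text before this tag
        pos = m.end()
        if m.group(1) is not None:
            depth += 1
            if depth == 1:                # outermost begin tag: keep it
                out.append(begin)
        elif depth > 0:
            depth -= 1
            if depth == 0:                # closes the outermost pair: keep it
                out.append(end)
        # An end tag seen at depth 0 is dropped (assumption, see lead-in).
    out.append(text[pos:])
    return "".join(out)


sample = "<Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>"
print(flatten_nested(sample, "<Font>", "</Font>"))
```

For the Font example this prints the string with only the interior pair removed. Because every tag is visited exactly once, any depth of nesting is handled in a single pass.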

Mistwolf (Author) Commented:
"What you want is a function to validate that the string is OK, then you can use the normal Replace to replace all occurrences of the strings."

Not really. Knowing that a string is "OK" or not doesn't help me get rid of the erroneous tags.

In the example above, even if we knew this string:

"<Font>blah1 <Font>blah2 </Font>blah3 </Font>blah4 <Font>blah5 </Font>"

was erroneous, we'd still need a way to get rid of the bad tags.

mlmcc Commented:
Can I assume if there is a start tag then there will be an end tag in the string?

mlmcc

navneet77 Commented:
Mistwolf, would the function know which tag it is going to replace?

mlmcc Commented:
I think this will do it


Public Function subValidateStr(strText As String, strStart As String, strStop As String) As String

Dim intStart As Integer
Dim intStop As Integer
Dim intStartPrev As Integer
Dim intStopPrev As Integer
Dim strTextTemp As String
Dim boolError As Boolean

    strTextTemp = strText
    Do
        intStart = 0
        intStop = 0
        boolError = False
        Do
            intStartPrev = intStart
            intStopPrev = intStop
           
            intStart = InStr(intStart + 1, strTextTemp, strStart)
            intStop = InStr(intStop + 1, strTextTemp, strStop)
           
            If (intStop < intStart) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart))
                boolError = True
            ElseIf (intStart = 0) And (intStop > 0) Then
                strTextTemp = Left(strTextTemp, intStop - 1) + _
                                Mid(strTextTemp, intStop + Len(strStop))
                boolError = True
            ElseIf (intStart < intStopPrev) And (intStart <> 0) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart), intStopPrev - intStart - Len(strStart)) + _
                                Mid(strTextTemp, intStopPrev + Len(strStop))
                boolError = True
            End If
        Loop Until boolError Or ((intStop = 0) And (intStart = 0))
    Loop Until intStop = 0 And intStart = 0
    subValidateStr = strTextTemp
   

End Function




Call is


strText = subValidateStr(strText, "STARTSym", "StopSym")


mlmcc

QJohnson Commented:
Here's the design.  I'll provide code if you like the solution's design (and you don't want the fun of implementing it yourself).

We want to (a) perform a validation of the string and (b) fix it if it's wrong.

I'm going to use an integer array for my solution and will make it a static array of 10 values just to eliminate the nuisances of using dynamic arrays for data sets this small.  If we may encounter strings with more than ten pairs of tags, we would use a number large enough to handle them, instead, of course.

Premise:  Build a static two-dimensional array of start positions for the tags - the first index is the pair number, and the second index selects the Begin tag or the End tag of that pair.

The code and this discussion will be a lot easier to follow if we agree to use some constants, so we declare them here.
Const TAG_BEGIN As Integer = 0
Const TAG_END   As Integer = 1

(A) - Validation
----------------
Rules for validation - only 2.

(1) Each Begin tag must precede its End tag, i.e.:
      aintStartPos(x,TAG_BEGIN) < aintStartPos(x,TAG_END)

(2) Each Begin tag after the first must follow the End tag paired with the previous Begin tag.  Stated a bit differently, the start value for a subsequent Begin tag cannot fall before the End tag paired with the previous Begin tag, as this would make two Begin tags in a row.  Stated yet a third way, a Begin tag cannot fall between a Begin tag and its End tag, i.e.:
     aintStartPos(x,TAG_BEGIN) > aintStartPos(x-1,TAG_END)

Three short loops are necessary for the validation processing - one to populate the array (keeping track of how many items are added so that our next two loops only iterate the proper number of elements of our fixed-size array), one to test rule one, and one to test rule two.

(B) - repair
-------------

For failures of rule one (which are likely impossible if the tags are inserted programmatically, as I understand from above), we would have to encounter "BEEBBE" or worse (B=begin tag, E=end tag).  The failure would be:
       aintStartPos(1,TAG_BEGIN) > aintStartPos(1,TAG_END)
and the repair would be to remove the offending End tag that starts at aintStartPos(1,TAG_END) and the subsequent Begin tag at aintStartPos(1,TAG_BEGIN) - quite happily the two values used in the comparison.

For failures of rule two (the only ones I believe are actually possible), we would have to encounter "BBEEBE" or "BEBBEE" or something longer with more tags involved.  The failure would be (in the first case):
      aintStartPos(1,TAG_BEGIN) < aintStartPos(0,TAG_END)
(or in the second case):
      aintStartPos(2,TAG_BEGIN) < aintStartPos(1,TAG_END)
and the repair is to remove the offending Begin tag and the immediately subsequent End tag, which (happily again) are the two values used in the comparison.

We note here that the routine should probably be called in a loop until validation is confirmed.  A string with "BBBEEE" would require two passes (one for each insertion error), for example.  After the first pass, the second Begin tag and the first End tag would have been removed, and after the second pass, the third Begin tag and the second End tag would be removed - leaving just the first Begin tag and the last End tag.

Depending on how we structure the steps in our code, we may need to reinitialize our array between validation steps, of course.

OK on the design?  

Want the code, too?
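
A rough sketch of this design, in Python rather than VB so it can be run directly. It is not QJohnson's code (none was posted in the thread). It uses two position lists in place of the fixed-size two-dimensional array, checks only rule two (the case said above to be the only one that can actually occur), and assumes balanced tags. The names find_positions, repair_once, and repair are hypothetical.

```python
def find_positions(text, tag):
    """Return the start index of every occurrence of tag, left to right."""
    positions = []
    i = text.find(tag)
    while i != -1:
        positions.append(i)
        i = text.find(tag, i + len(tag))
    return positions


def repair_once(text, begin, end):
    """Fix the first rule-two violation; return (new_text, changed)."""
    begins = find_positions(text, begin)
    ends = find_positions(text, end)
    pairs = min(len(begins), len(ends))
    for x in range(1, pairs):
        # Rule two: Begin tag x must come after the End tag of pair x-1.
        if begins[x] < ends[x - 1]:
            b, e = begins[x], ends[x - 1]
            # Remove the offending Begin tag and the End tag that follows it.
            fixed = text[:b] + text[b + len(begin):e] + text[e + len(end):]
            return fixed, True
    return text, False


def repair(text, begin, end):
    """Call repair_once until validation passes, as the design suggests."""
    changed = True
    while changed:
        text, changed = repair_once(text, begin, end)
    return text


print(repair("BBBEEE", "B", "E"))
```

With single-letter tags, "BBBEEE" needs two passes and ends up as "BE", which matches the walk-through above.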

rdrunner Commented:
This sounds like regular expressions to me ...

Include the regexp object in your code and try to replace it this way...

Let me try this matchstring .. i think it should work

oRegExp.pattern = "(\[(.*)\])(.*)(\[\1\])*(.*)(\[\\\1\])\[\\\1\]"

brb

rdrunner Commented:
OK, like always I messed my pattern up ;)

Here is a working version:

Info: test.txt contained your first post as example data. One pass removes only one level of nesting, so to make sure you don't have three of the same tag nested together you need to repeat the replace while there is still a match in the line...

Remove the last * if you are looking only for closed tags ...

Hope this helps
'Code
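' Needs project references to "Microsoft VBScript Regular Expressions 5.5"
' and "Microsoft Scripting Runtime".
Private Sub Command1_Click()
Dim oRegExp As New RegExp
Dim oFso As New FileSystemObject   ' was not declared in the original post
Dim oStrmInput As TextStream       ' was not declared in the original post
Dim cLine As String
Dim oMatches As MatchCollection
Dim oMatch As Match
With oRegExp
    .Global = True
    .MultiLine = True
    .IgnoreCase = True
End With
' Read the whole test file into one string
Set oStrmInput = oFso.OpenTextFile("c:\temp\test.txt")
cLine = oStrmInput.ReadAll
oStrmInput.Close
' \2 refers back to the tag name captured inside the first [tag]
oRegExp.Pattern = "(\[(.*?)\])(.*?)(\[\2\])(.*?)(\[/\2\])(.*?)(\[/\2\])" '((.*)(\[\2\])*(.)*(\[\\\2\])*)\([\\\2\])*"
Set oMatches = oRegExp.Execute(cLine)
' Keep the outer tags and the text; drop groups 4 and 6 (the nested pair)
cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")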
Debug.Print cLine

For Each oMatch In oMatches
    Dim i As Integer
    For i = 0 To oMatch.SubMatches.Count - 1
      Debug.Print oMatch.SubMatches(i)
    Next
    Debug.Print oMatch

Next

Debug.Print cLine

End Sub

'end code

'--------------DEBUG OUTPUT --------------
I've been having problems with a function that's based around Replace.

Given a String, with multiple occurances of a sub-string beginning and sub-string ending, I need to replace all parts of the substring that are within a sub-string beginning and ending.

Sounds rough. It's not that bad, here's an example:

newStr = "[SubString]Hello this is a [/SubString] string"

That's fine.

newStr2 = "[SubString]Hello is[/SubString] is another [SubString] string [/SubString]"

That's also fine.

But this one is bad, and would require replacement:

newStr3 = "[SubString]This is a  bad String.[/SubString]"

If should be modified to look like this:

newStr3 = "[SubString]This is a bad String.[SubString]"

Basically, any time you're within a SubString, there can be no other SubString parts. You can assume that every SubString beginning has a matching SubString Ending.

Anyone wanna take a stab at it? I'd give more points, but 40 is all I have, and this is a not-for-profit organization that I'm already losing money on, and cannot afford more (sorry!).

mlmcc Commented:
What I gave you will produce a valid string.

To have it also do the replacement:

Public Function subValidateStr(strText As String, strStart As String, strStop As String, _
                                                strRepStart As String, strRepStop As String) As String

Dim intStart As Integer
Dim intStop As Integer
Dim intStartPrev As Integer
Dim intStopPrev As Integer
Dim strTextTemp As String
Dim boolError As Boolean

    strTextTemp = strText
    Do
        intStart = 0
        intStop = 0
        boolError = False
        Do
            intStartPrev = intStart
            intStopPrev = intStop
           
            intStart = InStr(intStart + 1, strTextTemp, strStart)
            intStop = InStr(intStop + 1, strTextTemp, strStop)
           
            If (intStop < intStart) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart))
                boolError = True
            ElseIf (intStart = 0) And (intStop > 0) Then
                strTextTemp = Left(strTextTemp, intStop - 1) + _
                                Mid(strTextTemp, intStop + Len(strStop))
                boolError = True
            ElseIf (intStart < intStopPrev) And (intStart <> 0) Then
                strTextTemp = Left(strTextTemp, intStart - 1) + _
                                Mid(strTextTemp, intStart + Len(strStart), intStopPrev - intStart - Len(strStart)) + _
                                Mid(strTextTemp, intStopPrev + Len(strStop))
                boolError = True
            End If
        Loop Until boolError Or ((intStop = 0) And (intStart = 0))
    Loop Until intStop = 0 And intStart = 0

    intStart = InStr(1, strTextTemp, strStart)
    intStop = InStr(1, strTextTemp, strStop)
    Do While intStart > 0
        strTextTemp = Left(strTextTemp, intStart - 1) + strRepStart + _
                        Mid(strTextTemp, intStart + Len(strStart), intStop - intStart - Len(strStart)) + _
                        strRepStop + Mid(strTextTemp, intStop + Len(strStop))
        intStart = InStr(1, strTextTemp, strStart)
        intStop = InStr(1, strTextTemp, strStop)
    Loop
    subValidateStr = strTextTemp
   
End Function


Call is now

(Me.Text1.Text, "B", "b", "123", "789")


mlmcc

mlmcc Commented:
Sorry about that.  Call should be like

    YourStr = subValidateStr(YourStr, "B", "b", "123", "789")

mlmcc

Mistwolf (Author) Commented:
mlmcc - I don't understand the call to the function you said would replace - why do I need to pass it 5 args? It should only need 3 args - string, tagBegin, tagEnd...

navneet77 - preferably it would take the tag it will be replacing as an argument to the function.

rdrunner - I thought about using regEx for a while myself, but I just don't know enough about them. Your code looks very complex, but you've shown it works for 1 level nested tags. I don't think there will be 2999 level nested tags, but there is a chance there will be more than one. If you could set it up as a loop, with a string input (instead of file) it would probably suit my needs!

QJohnson - you have the most interesting approach, and I'd really love to see your code! I don't think there will be more than 10 nested tags, so an array of size 10 would be fine.

Thanks guys for all your help, hopefully we can get something working! *crosses fingers*

rdrunner Commented:
My code will work for any tag there is... That's the beauty of regular expressions ;)

You can modify my code quite easily to clean all tags away...

the only working part in my code is this snippet ...

oRegExp.Pattern = "(\[(.*?)\])(.*?)(\[\2\])(.*?)(\[/\2\])(.*?)(\[/\2\])"
cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")

The pattern is the rule you gave me ....

Let me break it up into English or at least try

(\[(.*?)\]) -> find a [tag] or a [sam] or a [sample] or [dsfjkahkdajs]
The inner () tells it to remember what it just found...

(.*?) -> find any text.  . = any character; * = zero or more times; *? = zero or more, but only as many as needed

(\[\2\]) -> find the 1st tag again ... \2 is what we remembered above...

so if we found [sam] it will look for another [sam]

(.*?) Some text again ....

(\[/\2\]) find [/SAM] or whatever we had in the 2nd ()

(.*?) Some text again ....

(\[/\2\]) find [/SAM] or whatever we had in the 2nd ()


Now the replace ..

cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")

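Put everything back together and leave out the 4th () and 6th () matches, which are the nested opening and closing tags. (The 2nd () is just the tag name and is already part of $1.) This leaves the outer tags and all of the text.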

Hope this helps ;)

rdrunner Commented:
P.S. To loop it, try this snippet:


while oregexp.test (cline)
    cLine = oRegExp.Replace(cLine, "$1$3$5$7$8")
wend
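
For anyone who wants to try rdrunner's pattern and loop outside VB, here is a minimal sketch using Python's re module, whose backreference syntax for this pattern is the same. The names NESTED and strip_nested are hypothetical, and IGNORECASE mirrors the .IgnoreCase = True setting in the VB code.

```python
import re

# Same pattern as in the VB code: \2 refers back to the tag name
# captured inside the first [tag].
NESTED = re.compile(
    r"(\[(.*?)\])(.*?)(\[\2\])(.*?)(\[/\2\])(.*?)(\[/\2\])",
    re.IGNORECASE,
)


def strip_nested(text):
    """Repeat the replace until the pattern no longer matches."""
    while NESTED.search(text):
        # Keep groups 1, 3, 5, 7, 8; drop the nested tags (groups 4 and 6).
        text = NESTED.sub(r"\1\3\5\7\8", text)
    return text


bad = "[SubString]This is[SubString] a [/SubString] bad String.[/SubString]"
print(strip_nested(bad))
```

One caveat worth checking against your own data: because (.*?) can also run across a pair that is already closed, an input such as "[q]a[/q] b [q]c[q]d[/q]e[/q]" comes out as "[q]a[/q] b c[q]de[/q]". That result has no nested tags, but the "c" has moved outside the second quote.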

mlmcc Commented:

'
'  strText - String to fix
'  strStart - Starting string
'  strStop - Terminating string
'  strRepStart - Replacement string for the start string
'  strRepStop - Replacement string for the terminating string
'

Public Function subValidateStr(strText As String, strStart As String, strStop As String, _
                                               strRepStart As String, strRepStop As String) As String
This will first produce a valid string then replace the beginning and terminating strings with the appropriate replacement strings.  I thought that was the gist of what you wanted.


If all you want to do is create a valid string use the first answer that has only 3 parameters.  It will produce a valid string from an invalid one.

Public Function subValidateStr(strText As String, strStart As String, strStop As String) As String

mlmcc

Mistwolf (Author) Commented:
mlmcc, The first one (3 inputs) works great =) I ran a test, and this was the output:

OK = [quote]hey what's[/quote] up? [quote] this string [/quote] is good.
Bad = [quote]hey what's[quote] up? [/quote] this string [/quote] is bad.
Very Bad = [quote]hey what's[quote] up[quote]?[/quote] [/quote] this string [/quote] is very bad.[/quote]
Fubar = [/quote]hey [quote][/quote] what's[quote][/quote] up? [/quote][quote][quote][quote][/quote] this string [/quote] [/quote] is fubar.[quote]

OK = [quote]hey what's[/quote] up? [quote] this string [/quote] is good.
Bad = [quote]hey what's up? this string [/quote] is bad.
Very Bad = [quote]hey what's up? this string [/quote] is very bad.
Fubar = hey what's up? this string is fubar.

Thanks.

rdrunner - I'm going to use mlmcc's answer because I can understand it (hehe), but I'm going to make a "points for rdrunner" question for you, because you deserve them =)

QJohnson Commented:
Sorry I had to go to bed last evening and didn't see the request for code until almost three hours after you posted it.  I can understand your reluctance to wait any longer (particularly when you have some code that works for you!)

Good luck - hope you get SOME TIME off this weekend. <g>

Q

rdrunner Commented:
Try to understand regular expressions if you have to mess with text some more ;)

they are realllly great!

rdrunner Commented:
P.s: I just threw the fubar string into my (modified with loop) function to test what it would toss out...

here is the result...

[/quote]hey [quote][/quote] what's up? [/quote][quote] this string  [/quote] is fubar.[quote]

I would say it fits the requirements ;) No longer any nested tags...

mlmcc Commented:
Glad I could help

mlmcc