Parsing a text string (Regular expressions)

Hi all,

I am looking for a simple solution to my problem.

I have a text string which contains several pieces of information enclosed in '[]' tags. I am looking for an example piece of code which will parse through the text and replace these tags with other information.

For example:

String = "Hello [foo] world. How are you [bar] today?"

output = "Hello Chris world. How are you Hughes today?"

Thanks..
chrishughes Asked:
 
WMIF Commented:
old - ([\[](\w| )+[\]])
new - \[([\w ]+)\]

First off, I'm assuming from your pattern that you wanted word characters or spaces, so put the \w and the space inside one set of square brackets: [\w ]. Next, your inner group was applied to a single character, with the "one or more" + sign outside the group. Wrap the character class and its + in the round brackets instead, and the group captures the whole word. That is why your $1 was returning the entire thing, brackets included, while $2 was closer to what you wanted but only returned a single character. Once that's straightened out, you can remove the additional square brackets surrounding the escaped square brackets, because they are not needed and just add confusion. Do you follow what was going on?
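
As an illustration of the difference between the two patterns, here is a minimal sketch that lists what each capturing group holds for the first match. The helper name DescribeGroups is hypothetical (it is not from the thread); it only uses the standard VBScript RegExp object and its SubMatches collection.

```vbscript
' Hypothetical helper: returns the captured groups of the first match
' as a string such as "$1=...; $2=...", so the patterns can be compared.
Function DescribeGroups(pattern, text)
  Dim re, matches, m, i, result
  Set re = New RegExp
  re.Pattern = pattern
  re.Global = False
  Set matches = re.Execute(text)
  result = ""
  If matches.Count > 0 Then
    Set m = matches(0)
    For i = 0 To m.SubMatches.Count - 1
      If i > 0 Then result = result & "; "
      result = result & "$" & (i + 1) & "=" & m.SubMatches(i)
    Next
  End If
  DescribeGroups = result
End Function

' Old pattern: $1 includes the brackets, $2 keeps only the last character.
' DescribeGroups("([\[](\w| )+[\]])", "Hello [foo] world")  returns "$1=[foo]; $2=o"
' New pattern: $1 is the text between the brackets.
' DescribeGroups("\[([\w ]+)\]", "Hello [foo] world")        returns "$1=foo"
```

A repeated group such as (\w| )+ only remembers its last repetition, which is why $2 comes back as a single character with the old pattern.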
 
chrishughes (Author) Commented:
But I should clarify - I can't use a direct replace statement because the text between the []'s determines the replacement text.
 
John_Lennon Commented:
How do you know which word to put in place of another?

You can try something like:

'####################
txtOriginal = "Hello [foo] world. How are you [bar] today?"
output = txtOriginal
do while instr(output, "[") > 0
  intFirst = instr(output, "[")
  intFinal = instr(intFirst, output, "]")
  if intFinal = 0 then exit do  ' no closing "]" left, so stop
  ' text between the brackets, without the brackets themselves
  oldTxt = mid(output, intFirst + 1, intFinal - intFirst - 1)
  newTxt = replaceText(oldTxt)
  ' update output itself so the loop makes progress
  output = replace(output, "[" & oldTxt & "]", newTxt)
loop
'output = "Hello Chris world. How are you Hughes today?"

function replaceText(word)
  dim strTmp
  select case word
    case "foo"
      strTmp = "Chris"
    case "bar"
      strTmp = "Hughes"
  end select
  replaceText = strTmp  ' return the replacement text
end function
'####################

But you have to put all the words in the select case in the replaceText function.

How do you know the replacement text? Do you have it in a DB, or do you use a case statement like mine?
 
John_Lennon Commented:
Sorry, you have to include these lines of code in the case statement to prevent infinite looping:

case else
  strTmp = ""
 
WMIF Commented:
Let me first say that you don't need regular expressions for this. If you have a list of values to replace, then just run the replace function on the string for each value. If the replace function finds the value you tell it to look for, it replaces it; if it doesn't find the value, it simply returns the string unchanged.

thestring = "Hello [foo] world. How are you [bar] today?"

thestring = replace(thestring, "[foo]", "Chris")
' gives = "Hello Chris world. How are you [bar] today?"

thestring = replace(thestring, "[not_there]", "Fred")
' gives = "Hello Chris world. How are you [bar] today?"

thestring = replace(thestring, "[bar]", "Hughes")
' gives = "Hello Chris world. How are you Hughes today?"




If you have a db list of values to replace, then just get those values and loop.

query = "select keyword, replacewith from terms"
set rs = conn.execute(query)  ' conn: an already-open ADODB.Connection (assumed)
do until rs.eof
  thestring = replace(thestring, rs("keyword"), rs("replacewith"))
  rs.movenext
loop
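
The same loop can be tried without a database. The sketch below is only an illustration: it assumes a Scripting.Dictionary standing in for the terms recordset, with the bracketed keyword as the key, and the function name ReplaceTerms is hypothetical.

```vbscript
' Hypothetical helper: applies every keyword/replacement pair to the text.
' terms is a Scripting.Dictionary standing in for the "terms" table.
Function ReplaceTerms(text, terms)
  Dim key, result
  result = text
  For Each key In terms.Keys
    ' Replace leaves the string unchanged when the keyword is not found
    result = Replace(result, key, terms(key))
  Next
  ReplaceTerms = result
End Function

Dim terms
Set terms = CreateObject("Scripting.Dictionary")
terms.Add "[foo]", "Chris"
terms.Add "[not_there]", "Fred"
terms.Add "[bar]", "Hughes"

' ReplaceTerms("Hello [foo] world. How are you [bar] today?", terms)
' returns "Hello Chris world. How are you Hughes today?"
```

This only works when the full set of keywords is known in advance, which is the case this comment is describing.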



Let me know if this can't meet your needs.
 
Leo Eikelman (Director, IT and Business Development) Commented:
Wouldn't it be easier to just create a method where you pass these words as a parameter?

Public Sub CreateText(FirstName,SecondName)
  Response.Write "Hello " & FirstName & " world. How are you " & SecondName & " today?"
End Sub

Cheers,

Leo
 
chrishughes (Author) Commented:
Hi guys,

OK, I may have oversimplified what I was after and thus not been clear. I will give you the full problem, and I believe that I do need to use regular expressions:

I have a text string such as : "Hello world [how] are [you] today?" which is going to form text on a webpage and can contain any number of words or phrases wrapped in [ ]'s.

I want to parse the string and turn these words into links. For example this example would become: "Hello world <a href="home.asp?p=how">how</a> are <a href="home.asp?p=you">you</a> today?"

So I cannot use a replace statement as the words in the [ ]'s could be anything.

Currently the closest I have got is with this code:

dim regex
Set regEx = New RegExp
regEx.Global = true
regEx.IgnoreCase = True
      
regEx.Pattern = "([\[](\w| )+[\]])"
txt= regEx.Replace(txt, "<a href='?p=$1'>$1</a>")

However, this still leaves the [ ]'s in the new string. I therefore need to find some way to manipulate $1 to remove the [ and ] before creating the new string.

Any ideas?

Chris

 
chrishughes (Author) Commented:
In answer to leikelman: this is simply not possible, as the text is being pulled out of a database and it is generic, not a set pattern!

Thanks anyway!
 
Leo Eikelman (Director, IT and Business Development) Commented:
Once again, why do you not use a method to build the string?

As I mentioned in my first comment, you can just pass the words you want into the function and create the pages. Why use regular expressions?

Cheers,

Leo
 
Leo Eikelman (Director, IT and Business Development) Commented:
I posted without seeing the updated comment you made. Could you explain the 'generic' text that is returned from the database? Are there always two links?

Leo
 
chrishughes (Author) Commented:
The problem is that the user putting the text into the database can use any tags, so regular expressions are needed to keep this simple. And there can be any number of tags.

Regular expressions seem to be the perfect solution for me; I just cannot figure out how to lose the [ and ].

Thanks for your help!

Chris
 
chrishughes (Author) Commented:
By tags I mean the name between the [ ]'s.
 
Leo Eikelman (Director, IT and Business Development) Commented:
OK, so you want to get rid of the [] but keep what is inside the brackets to use for the link?

You can do it like this

<%
InitialString = "hello [how] are u"

Set RegularExpressionObject = New RegExp

With RegularExpressionObject
.Pattern = "[[]"
.IgnoreCase = True
.Global = True
End With

'"<a href='home.asp?p=how'>" is what is returned from the database
ReplacedString = RegularExpressionObject.Replace(InitialString, "<a href='home.asp?p=how'>")


With RegularExpressionObject
.Pattern = "[]]"
.IgnoreCase = True
.Global = True
End With

ReplacedString = RegularExpressionObject.Replace(ReplacedString, "</a>")

Response.Write "Replaced " & InitialString & "<br> with " & ReplacedString

Set RegularExpressionObject = nothing
%>

Run this code in an ASP page and see if it gives your desired result.


Leo


 
WMIF Commented:
You had the pattern close. Check this out:

\[([\w ]+)\]

Keep replacing with $1; it now holds only the text between the brackets.
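
Put together with the code from the question, the corrected pattern might be used as in the sketch below. The function name LinkifyTags is hypothetical, and the home.asp?p= link target is taken from the example in the question.

```vbscript
' Hypothetical helper: turns every [tag] in the text into a link,
' using the text between the brackets as both the p= value and the label.
Function LinkifyTags(text)
  Dim re
  Set re = New RegExp
  re.Global = True       ' replace every tag, not just the first
  re.IgnoreCase = True
  re.Pattern = "\[([\w ]+)\]"
  ' $1 is the text between the brackets, without the brackets
  LinkifyTags = re.Replace(text, "<a href=""home.asp?p=$1"">$1</a>")
End Function

' LinkifyTags("Hello world [how] are [you] today?") returns
' Hello world <a href="home.asp?p=how">how</a> are <a href="home.asp?p=you">you</a> today?
```

Because the pattern allows spaces, a tag such as [contact us] would put a raw space into the URL. If that matters, one option is to loop over the matches with Execute and pass each captured value through Server.URLEncode before building the link.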
 
chrishughes (Author) Commented:
WMIF - that was a perfect answer. Not only did it solve my problem, but I now feel like I understand regular expressions a little more!

Thank you
 
WMIF Commented:
they are tough, but you will catch on.
Question has a verified solution.
