Parsing a text string (Regular expressions)

Hi all,

I am looking for a simple solution to my problem.

I have a text string which contains several pieces of information enclosed in '[]' tags. I am looking for an example piece of code which will parse through the text and replace these tags with other information.

For example:

String = "Hello [foo] world. How are you [bar] today?"

output = "Hello Chris world. How are you Hughes today?"


chrishughes (Author) commented:
But I should clarify - I can't use a direct replace statement because the text between the []'s determines the replacement text.
How do you know which word to put in place of another?

You can try something like this:

txtOriginal = "Hello [foo] world. How are you [bar] today?"
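do while instr(txtOriginal, "[") > 0
  intFirst = instr(txtOriginal, "[")
  ' look for the "]" that comes after this "["
  intFinal = instr(intFirst, txtOriginal, "]")
  if intFinal = 0 then exit do
  ' the text between the brackets is intFinal - intFirst - 1 characters long
  oldTxt = mid(txtOriginal, intFirst + 1, intFinal - intFirst - 1)
  newTxt = replaceText(oldTxt)
  ' write the result back into txtOriginal so the loop makes progress
  txtOriginal = replace(txtOriginal, "[" & oldTxt & "]", newTxt)
loop
output = txtOriginal
'output = "Hello Chris world. How are you Hughes today?"

function replaceText(Word)
  dim strTmp
  select case Word
    case "foo"
      strTmp = "Chris"
    case "bar"
      strTmp = "Hughes"
  end select
  ' assign to the function name to return the value
  replaceText = strTmp
end function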

But you have to list every word in the Select Case inside the replaceText function.

How do you know the replacement text? Do you get it from a DB, or do you use a Select Case statement like mine?
Sorry, you also have to include these lines in the Select Case so that any tag not in the list is explicitly replaced with an empty string (removed from the text):

case else
  strTmp = ""
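For comparison, here is a minimal Python sketch of the same find-the-brackets loop with the fixes applied: search for the "]" after the "[", take only the text between them, and rebuild the string on every pass so the loop terminates. The REPLACEMENTS dictionary is a hypothetical stand-in for the Select Case in replaceText, and unknown tags become an empty string, like the case else branch.

```python
# Hypothetical lookup table standing in for the Select Case in replaceText.
REPLACEMENTS = {"foo": "Chris", "bar": "Hughes"}


def replace_text(word: str) -> str:
    # Unknown tags map to "" (the "case else" branch), so they are removed.
    return REPLACEMENTS.get(word, "")


def expand_tags(text: str) -> str:
    pos = 0
    while True:
        first = text.find("[", pos)
        if first == -1:
            return text  # no more tags
        final = text.find("]", first)  # closing bracket after the opening one
        if final == -1:
            return text  # unmatched "[": leave the rest alone
        new = replace_text(text[first + 1:final])  # text between the brackets
        text = text[:first] + new + text[final + 1:]
        pos = first + len(new)  # resume after the inserted text


print(expand_tags("Hello [foo] world. How are you [bar] today?"))
# prints: Hello Chris world. How are you Hughes today?
```

Resuming the search after the inserted text means a replacement that itself contains "[" is not scanned again.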

Let me first say that you don't need regular expressions for this. If you have a list of values to replace, just run all of those replacements on the string. If the Replace function finds the value you tell it to look for, it replaces it; if it doesn't find the value, it returns the string unchanged.

thestring = "Hello [foo] world. How are you [bar] today?"

thestring = replace(thestring, "[foo]", "Chris")
' gives = "Hello Chris world. How are you [bar] today?"

thestring = replace(thestring, "[not_there]", "Fred")
' gives = "Hello Chris world. How are you [bar] today?"

thestring = replace(thestring, "[bar]", "Hughes")
' gives = "Hello Chris world. How are you Hughes today?"

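If you have a DB list of values to replace, then just get those values and loop (this assumes an open ADODB connection named conn):

query = "select keyword, replacewith from terms"
set rs = conn.execute(query)   ' conn is assumed to be an open ADODB.Connection
do until rs.eof
  ' keyword is assumed to include the brackets, e.g. "[foo]"
  thestring = replace(thestring, rs("keyword"), rs("replacewith"))
  rs.movenext
loop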
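A minimal Python sketch of the same database-driven loop, assuming the terms table from the query above stores the bracketed keyword (e.g. "[foo]") next to its replacement. An in-memory SQLite database with hypothetical sample rows stands in for the real one.

```python
import sqlite3


def replace_from_db(text: str, conn: sqlite3.Connection) -> str:
    # Same query as above; each row is a (keyword, replacewith) pair.
    for keyword, replacewith in conn.execute("select keyword, replacewith from terms"):
        # str.replace leaves the text unchanged when the keyword is absent.
        text = text.replace(keyword, replacewith)
    return text


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("create table terms (keyword text, replacewith text)")
conn.executemany(
    "insert into terms values (?, ?)",
    [("[foo]", "Chris"), ("[not_there]", "Fred"), ("[bar]", "Hughes")],
)

print(replace_from_db("Hello [foo] world. How are you [bar] today?", conn))
# prints: Hello Chris world. How are you Hughes today?
```

The "[not_there]" row shows the point made above: a keyword that does not occur is simply a no-op.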

Let me know if this can't meet your needs.
Leo Eikelman (Director, IT and Business Development) commented:
Wouldn't it be easier to just create a method where you pass these words as a parameter?

Public Sub CreateText(FirstName,SecondName)
  Response.Write "Hello " & FirstName & " world. How are you " & SecondName & " today?"
End Sub


chrishughes (Author) commented:
Hi guys,

OK, I may have oversimplified what I was after and thus not been clear. I will give you the full problem, and I believe that I do need to use regular expressions:

I have a text string such as "Hello world [how] are [you] today?" which is going to form text on a webpage and can contain any number of words or phrases wrapped in [ ]'s.

I want to parse the string and turn these words into links. For example, this string would become: "Hello world <a href="home.asp?p=how">how</a> are <a href="home.asp?p=you">you</a> today?"

So I cannot use a replace statement as the words in the [ ]'s could be anything.

Currently the closest I have got is with this code:

dim regex
Set regEx = New RegExp
regEx.Global = true
regEx.IgnoreCase = True
regEx.Pattern = "([\[](\w| )+[\]])"
txt= regEx.Replace(txt, "<a href='?p=$1'>$1</a>")

However, this still leaves the [ ]'s in the new string. I therefore need to find some way to manipulate $1 to remove the [ and ] before creating the new string.
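To see why the brackets survive, here is a small Python sketch that runs the same pattern. Python's re module is used only for illustration; in a replacement string its \1 corresponds to $1 in VBScript's RegExp.Replace, and the sample text is the one from the question.

```python
import re

# The pattern from the post, unchanged.
OLD_PATTERN = re.compile(r"([\[](\w| )+[\]])")

txt = "Hello world [how] are [you] today?"
m = OLD_PATTERN.search(txt)
print(m.group(1))  # [how]  (group 1 wraps the brackets as well)
print(m.group(2))  # w      (a repeated group keeps only its last repetition)

result = OLD_PATTERN.sub(r"<a href='?p=\1'>\1</a>", txt)
print(result)
# Hello world <a href='?p=[how]'>[how]</a> are <a href='?p=[you]'>[you]</a> today?
```

Group 1 contains the brackets because they sit inside the outer parentheses, which is exactly why $1 still carries them.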

Any ideas?


chrishughes (Author) commented:
In answer to leikelman: This is simply not possible, as the text is being pulled out of a database and it is generic, not a set pattern!

Thanks anyway!
Leo Eikelman (Director, IT and Business Development) commented:
Once again, why do you not use a method to build the string?

As I mentioned in my first comment, you can just pass the words you want into the function and create the pages. Why use regular expressions?


Leo Eikelman (Director, IT and Business Development) commented:
I posted without seeing the updated comment you made. Could you explain the 'generic' text that is returned from the database? Are there always two links?

chrishughes (Author) commented:
The problem is that the user putting the text into the database can use any tags. So regular expressions are needed to keep this simple. And there can be any number of tags.

Regular expressions seem to be the perfect solution for me; I just cannot figure out how to lose the [ and ].

Thanks for your help!

chrishughes (Author) commented:
By tags I mean the name between the [ ]'s.
Leo Eikelman (Director, IT and Business Development) commented:
Ok so you want to get rid of the [] but want to keep what is inside the brackets to use for the link?

You can do it like this

InitialString = "hello [how] are u"

Set RegularExpressionObject = New RegExp

With RegularExpressionObject
.Pattern = "[[]"
.IgnoreCase = True
.Global = True
End With

'"<a href='home.asp?p=how'>" is what is returned from the database
ReplacedString = RegularExpressionObject.Replace(InitialString, "<a href='home.asp?p=how'>")

With RegularExpressionObject
.Pattern = "\]"
.IgnoreCase = True
.Global = True
End With

ReplacedString = RegularExpressionObject.Replace(ReplacedString, "</a>")

Response.Write "Replaced " & InitialString & "<br> with " & ReplacedString

Set RegularExpressionObject = nothing

Run this code in an ASP page and see if these are the results you want.
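A quick Python sketch of the same two-pass idea (every "[" becomes one fixed opening tag, every "]" becomes "</a>") makes one limitation visible: the href is fixed, so with more than one tag every link points to the same page. The two-tag input string is an assumption for illustration, not from the post.

```python
import re


def two_pass(text: str, opening_tag: str) -> str:
    # Pass 1: each "[" becomes the same opening tag.
    text = re.sub(r"\[", opening_tag, text)
    # Pass 2: each "]" becomes a closing tag.
    return re.sub(r"\]", "</a>", text)


out = two_pass("hello [how] are [you]", "<a href='home.asp?p=how'>")
print(out)
# hello <a href='home.asp?p=how'>how</a> are <a href='home.asp?p=how'>you</a>
```

This works when each string holds a single, known tag; when the link target must come from the bracketed word itself, a capture group is needed instead.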


You had the pattern close. Check this out:

\[([\w ]+)\]

still replacing with $1
old - ([\[](\w| )+[\]])
new - \[([\w ]+)\]

First off, I'm assuming from your pattern that you wanted to match word characters or spaces, so put the \w and the space inside one character class: [\w ]. Next, your inner group (\w| ) applied to a single character, and the "1 or more" + sign sat outside that group. Put the character class inside the round brackets, with the + inside them too, and the group captures the whole word: ([\w ]+). With your original pattern, $1 returned the entire match, brackets included, and $2 was closer to what you wanted, but it held only a single character (the last repetition) because the + was outside its group. Once that's straightened out, you can replace [\[] and [\]] with plain escaped brackets \[ and \], because the extra square brackets around them are not needed and just add confusion. Do you follow what was going on?
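A minimal Python sketch of the corrected pattern, producing the home.asp?p= links from the original question. The second function is an optional extension not discussed in the thread: it URL-encodes the captured text so a phrase such as [how are] gives a valid query string. In classic ASP the equivalent would presumably mean looping over the matches from RegExp.Execute and calling Server.URLEncode, which is not shown here. Note that Python's \w also matches non-ASCII letters, which VBScript's \w may not.

```python
import re
from urllib.parse import quote_plus

# Corrected pattern: escaped brackets around one group of word chars/spaces.
NEW_PATTERN = re.compile(r"\[([\w ]+)\]")


def link_tags(text: str) -> str:
    # \1 here plays the role of $1 in VBScript's RegExp.Replace.
    return NEW_PATTERN.sub(r"<a href='home.asp?p=\1'>\1</a>", text)


def link_tags_encoded(text: str) -> str:
    # Build each link in a function so the query value can be URL-encoded.
    def to_link(m):
        word = m.group(1)
        return f"<a href='home.asp?p={quote_plus(word)}'>{word}</a>"

    return NEW_PATTERN.sub(to_link, text)


txt = "Hello world [how] are [you] today?"
print(NEW_PATTERN.search(txt).group(1))  # how
print(link_tags(txt))
# Hello world <a href='home.asp?p=how'>how</a> are <a href='home.asp?p=you'>you</a> today?
print(link_tags_encoded("Hello world [how are] you?"))
# Hello world <a href='home.asp?p=how+are'>how are</a> you?
```

Because the brackets are outside the capture group, $1 (here \1) holds only the word, so no second pass is needed to strip them.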

chrishughes (Author) commented:
WMIF - that was a perfect answer! Not only did it solve my problem, but I now feel like I understand regular expressions a little more!

Thank you
They are tough, but you will catch on.