Solved

Need to identify and replace content between { }'s

Posted on 2008-02-07
Last Modified: 2012-08-14
Hello.

I am attempting to match the following:

{ some_text -> a_url }

where 'some_text' is a string of characters of any length and 'a_url' is obviously a URL.
Within a body of text, this pattern may be found multiple times.

I came up with the regex:

\{(.+)->(.+)\}

Here was my thinking:

\{  - match the opening curly bracket
(.+)  - capture the text between the curly bracket and the arrow, ->
->  - match the arrow
(.+)  - capture  the text between the arrow and the closing curly bracket
\}  - match the closing curly bracket.

Ultimately, I am trying to convert the text matched by the pattern into a typical <a> tag.  That's why I'm capturing 'some_text' and 'a_url'.  The 'some_text' becomes the content of the <a> tag and 'a_url' is assigned to the href attribute.

I thought I had it figured out.  When I tested the regex on text containing one instance of the pattern, things went as smoothly as I had anticipated.  However, when multiple instances occur, the regex matches one huge chunk of text running from the first opening curly bracket ( { ) found to the last closing curly bracket ( } ) found.
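
The behavior described above can be reproduced with a short Python sketch. The sample string and its URLs are made up for illustration (they are not taken from the attached sample.txt), and Python's re module is only one of several engines that behave this way.

```python
import re

# Hypothetical sample text containing two instances of the pattern.
text = "See { Google -> http://www.google.com } and { Yahoo -> http://www.yahoo.com } for more."

# The original greedy pattern from the question.
greedy = re.compile(r"\{(.+)->(.+)\}")

greedy_matches = [m.group(0) for m in greedy.finditer(text)]

# Greedy .+ consumes as much as it can and only backtracks as far as needed,
# so the result is a single match from the first "{" to the last "}".
print(len(greedy_matches))
print(greedy_matches[0])
```

Because each (.+) is greedy, the first group ends up swallowing the first closing bracket and the second opening bracket as ordinary characters.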

I understand that I need to make my expression more precise, but I am at a loss as to how I can accomplish this.

How do I define the regex more precisely so that I can identify multiple instances of the { some_text -> a_url } pattern?

Many thanks,
Eric
Question by:ewolsing
LVL 27

Expert Comment

by:ddrudik
ID: 20841712
Please provide an example of your real source text, with multiple instances (if exist) etc.

Author Comment

by:ewolsing
ID: 20841811
I have attached a .txt file containing an example text with two instances of the pattern I described earlier.
sample.txt
LVL 27

Accepted Solution

by:ddrudik (earned 2000 total points)
ID: 20841872
Thanks, consider this example:
http://www.myregextester.com/?r=35

Raw Match Pattern:
"{(.*?)->(.*?)}"

Raw Replace Pattern:
<a href="$2">$1</a>
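
As a rough illustration, the same match and replace could be sketched in Python as follows. Several things here are assumptions and not part of the answer above: the function name linkify and the sample string are hypothetical, the surrounding quotes of the raw pattern are left out, the braces are escaped, and \s* is added to trim the padding spaces. Python's re module writes backreferences in the replacement as \1 and \2 instead of $1 and $2.

```python
import re

# Lazy quantifiers (.*?) stop at the first "->" and the first "}" they reach.
# The \s* parts trim the padding spaces; that trimming is an addition here.
LINK_PATTERN = re.compile(r"\{\s*(.*?)\s*->\s*(.*?)\s*\}")

def linkify(text: str) -> str:
    """Replace each { some_text -> a_url } with an <a> tag (hypothetical helper)."""
    # \2 is the captured URL, \1 is the captured link text.
    return LINK_PATTERN.sub(r'<a href="\2">\1</a>', text)

# Made-up sample text with two instances of the pattern.
sample = "See { Google -> http://www.google.com } and { Yahoo -> http://www.yahoo.com }."
result = linkify(sample)
print(result)
```

The sketch does not HTML-escape the captured values, so it is only suitable for trusted input as written.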

Author Comment

by:ewolsing
ID: 20842086
Thanks!  That did the trick.  I really appreciate it.

For my own edification, let me parse your reasoning on the '(.*?)' segments.

.  - match any character...
*  - ...as many times as necessary...
?  - ...only once before the arrow (->).

Do I have that correct?

Thanks again.
LVL 27

Expert Comment

by:ddrudik
ID: 20842186
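Yes, essentially: the ? after the * makes the quantifier lazy, so it matches as few characters as possible and stops at the first arrow (->) it reaches. From that site link there's an "explain" checkbox that returns this upon submit: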

Match Pattern Explanation:
The regular expression:

(?-imsx:"{(.*?)->(.*?)}")

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  "{                       '"{'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ->                       '->'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  }"                       '}"'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------