
Regular Expression

Posted on 2004-10-11
Last Modified: 2010-07-27
I need a regular expression that will match a specific word in a string of words, but not when the word has one of the characters = / \ . immediately before or after it.

Example string
This is a link to the <a href="www.abc.com">abc</a> website.


Word to match
abc

I want the second abc (the link text) to be matched, but not the abc in www.abc.com.



Question by:CUTTHEMUSIC

Expert Comment

by:ozo
qr((?<![=/\\.])(abc)(?![=/\\.]))
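To see what the lookarounds do, here is a minimal Python sketch (Python's re module supports the same fixed-width lookbehind as Perl). The function name bold_matches and the <B> wrapper are illustrative assumptions, not part of the answer above:

```python
import re

# ozo's Perl pattern qr((?<![=/\\.])(abc)(?![=/\\.])) as a Python raw string.
#   (?<![=/\\.])  negative lookbehind: not preceded by '=', '/', '\' or '.'
#   (?![=/\\.])   negative lookahead:  not followed by any of those characters
PATTERN = re.compile(r"(?<![=/\\.])(abc)(?![=/\\.])")

def bold_matches(text):
    # Wrap each allowed occurrence; the lookarounds themselves consume no characters.
    return PATTERN.sub(r"<B>\1</B>", text)

print(bold_matches('This is a link to the <a href="www.abc.com">abc</a> website.'))
# This is a link to the <a href="www.abc.com"><B>abc</B></a> website.
```

Because the pattern has no word boundaries, an abc embedded in a longer word such as xxxabcyyy is still matched; only the four listed neighbour characters block a match.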

Expert Comment

by:etmendz
You'll normally extract a string bounded by delimiters by first isolating or removing the delimiters from the string. A simple trick is to create a pattern to match the delimiters. When you parse the string, match the opening delimiter and skip it. Read the content that follows until the closing delimiter is matched.

To match an HTML, XML or SGML opening tag (and similar mark-up languages), the following works:

/<[^>]+>/

You use this to signal that an opening tag is matched. Parse through the string and extract the content until the closing tag is matched:

/<\/[^>]+>/
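A rough Python sketch of this delimiter-based approach, assuming tags never contain a literal > (a case ozo raises further down). The helper name split_markup is hypothetical, and anything that is not a closing tag is labelled 'open', which also lumps comments and declarations in with opening tags:

```python
import re

# The tag pattern /<[^>]+>/ wrapped in a capturing group, so that re.split()
# keeps the tags: the result alternates text, tag, text, tag, ..., text.
TAG = re.compile(r"(<[^>]+>)")

def split_markup(html):
    """Return (kind, piece) pairs where kind is 'open', 'close' or 'text'."""
    pieces = []
    for i, piece in enumerate(TAG.split(html)):
        if i % 2 == 0:                    # even indices: text between tags
            if piece:
                pieces.append(("text", piece))
        elif piece.startswith("</"):      # what /<\/[^>]+>/ would match
            pieces.append(("close", piece))
        else:
            pieces.append(("open", piece))
    return pieces

for kind, piece in split_markup('This is a link to the <a href="www.abc.com">abc</a> website.'):
    print(kind, repr(piece))
```

Only the 'text' pieces would then be searched and highlighted, while the tags are copied through unchanged.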

Have fun.

Expert Comment

by:pratap_r
<([\w][\w\d]*)[^>]*>(.*?)<\/\1> will get you the text between a tag and its matching closing tag, whatever the tag name and attributes are.

So your
<a href="www.abc.com">abc</a> will give you abc,
and so will <a>abc</a>.
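The same pattern can be tried in Python's re module, which supports the \1 backreference and the lazy (.*?) used here. A minimal sketch:

```python
import re

# Group 1 captures the tag name; \1 requires the same name in the closing tag,
# and the lazy (.*?) stops at the first such closing tag.
# [\w][\w\d]* is equivalent to \w+, since \w already includes digits.
PAIR = re.compile(r"<([\w][\w\d]*)[^>]*>(.*?)<\/\1>")

for html in ('<a href="www.abc.com">abc</a>', "<a>abc</a>"):
    m = PAIR.search(html)
    print(m.group(1), m.group(2))   # tag name, inner text
```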

Pratap

Expert Comment

by:pratap_r
And if it is just the <A> tag you are looking for, this is a simpler one:
<A[^>]*>(.*?)</A>

Make sure you turn off case sensitivity (for example with RegexOptions.IgnoreCase in .NET).
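In Python, the equivalent of turning off case sensitivity is the re.IGNORECASE flag. A minimal sketch with made-up sample links:

```python
import re

# <A[^>]*>(.*?)</A> with case-insensitive matching, so <a ...>...</a>
# matches as well as <A ...>...</A>.
A_TAG = re.compile(r"<A[^>]*>(.*?)</A>", re.IGNORECASE)

html = '<a href="www.abc.com">abc</a> and <A HREF="www.xyz.com">xyz</A>'
print(A_TAG.findall(html))  # ['abc', 'xyz']
```

One caveat: because [^>]* directly follows <A, a tag such as <ABBR> could also start a match; writing <A\b[^>]*> would rule that out.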

Pratap

Author Comment

by:CUTTHEMUSIC
OK, let me explain more. I am trying to create an application that searches through text and highlights a certain word. Here is what I am using:

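' Requires: Imports System.Text.RegularExpressions
' Wraps every whole-word, case-insensitive match of Search_Str in StartTag ... EndTag.
Private Function findAndHighlight(ByVal Search_Str As String, ByVal InputTxt As String, ByVal StartTag As String, ByVal EndTag As String) As String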

        Return Regex.Replace(InputTxt, "\b(" & Regex.Escape(Search_Str) & ")\b", StartTag & "$1" & EndTag, RegexOptions.IgnoreCase)

End Function
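To reproduce the problem outside .NET, here is a rough Python port of findAndHighlight (the snake_case name is an assumption; re.escape and \b play the roles of Regex.Escape and \b). A lambda builds the replacement so that backslashes in the tags cannot be read as group references:

```python
import re

def find_and_highlight(search_str, input_txt, start_tag, end_tag):
    # Same approach as the VB.NET function: escape the search text, require a
    # word boundary on each side, and wrap every case-insensitive hit.
    pattern = r"\b(" + re.escape(search_str) + r")\b"
    return re.sub(pattern, lambda m: start_tag + m.group(1) + end_tag,
                  input_txt, flags=re.IGNORECASE)

print(find_and_highlight("abc", "This is a link to the <a href=www.abc.com>abc</a> website.", "<B>", "</B>"))
# This is a link to the <a href=www.<B>abc</B>.com><B>abc</B></a> website.
```

The href is rewritten because '.' is not a word character, so \b matches on both sides of the abc in www.abc.com.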

I would call the function like this:
findAndHighlight("abc", "This is a link to the <a href=www.abc.com>abc</a> website.", "<B>", "</B>")

The output that my current code produces is:
This is a link to the <a href=www.<B>abc</B>.com><B>abc</B></a> website.
This obviously causes problems when the user clicks the link.

I also have links that look like this:
<a href=www.xyz.com?id=abc>abc</a>

My current code returns this:
<a href=www.xyz.com?id=<B>abc</B>><B>abc</B></a>

What should be returned is:
<a href=www.xyz.com?id=abc><B>abc</B></a>
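One way to get this output with a single regex, sketched in Python (this alternation technique is an editorial suggestion, not something posted in the thread): match either a whole <...> tag or the search word, copy tags through unchanged, and wrap only the word. It assumes tags contain no literal > inside attribute values:

```python
import re

def find_and_highlight(search_str, input_txt, start_tag, end_tag):
    # Alternative 1 matches a whole tag, alternative 2 the search word.
    # re.sub scans left to right; at each '<' the first alternative consumes
    # the whole tag, so scanning resumes after '>' and the word is only ever
    # matched in the text between tags.
    pattern = re.compile(r"(<[^>]*>)|\b(" + re.escape(search_str) + r")\b",
                         re.IGNORECASE)

    def replace(m):
        if m.group(1) is not None:
            return m.group(1)              # a tag: copy it through unchanged
        return start_tag + m.group(2) + end_tag

    return pattern.sub(replace, input_txt)

print(find_and_highlight("abc", "<a href=www.xyz.com?id=abc>abc</a>", "<B>", "</B>"))
# <a href=www.xyz.com?id=abc><B>abc</B></a>
```

It keeps the \b word boundaries from the original function, so searching for abc leaves xxxabcyyy alone, while searching for xxxabcyyy highlights the link text but not the href.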




Expert Comment

by:ozo
(?<![=\/\\.])(abc)(?![=\/\\.])
fulfills your original specification,
but it now sounds like you want to ignore strings inside <tags>.
That can get tricky with things like:
<IMG SRC = "foo.gif"
         ALT = "A > B">

<!-- <A comment> -->

<script>if (a<b && a>c)</script>

<# Just data #>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>


And what would you want to do with:
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.
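To illustrate the first of these cases, here is a small Python sketch comparing the simple tag pattern with a quote-aware variant (QUOTED_TAG is an illustrative assumption, not a complete HTML tag grammar):

```python
import re

html = '<IMG SRC = "foo.gif"\n         ALT = "A > B">'

# The naive tag pattern stops at the first '>', even inside a quoted value.
NAIVE_TAG = re.compile(r"<[^>]+>")

# Quote-aware variant: inside the tag, either a whole "..." or '...' string,
# or any single character that is not a quote or '>'.
QUOTED_TAG = re.compile(r"""<(?:"[^"]*"|'[^']*'|[^'">])*>""")

print(NAIVE_TAG.findall(html))   # cut off right after "A >"
print(QUOTED_TAG.findall(html))  # the whole tag
print(QUOTED_TAG.findall("<script>if (a<b && a>c)</script>"))
```

The quote-aware pattern fixes the IMG case but still reports <b && a> inside the script as a tag, and neither pattern understands comments or marked sections, so the remaining examples really need a parser.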

Author Comment

by:CUTTHEMUSIC
ozo,
I'm not sure how to implement your original solution in my code.
I am increasing the points because of all of the revisions that I have made.

If I were searching for abc and had the following string:
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.

I would want it left unchanged:
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.

But if I were searching for xxxabcyyy, I would want this:

<a href="www.xxxabcyyy.com"><B>xxxabcyyy</B></a> website.

Again, what I am trying to do is search data that comes in the form of a string. The string comes from a database on which I ran a full-text search. I need a function that takes the string and highlights the search words that produced the full-text search result. But the string may contain links, and I don't want the code to highlight a search word where it appears inside the link tag itself (for example in the href). Hope this helps.
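Since a full-text search can return several words, a minimal Python sketch (the helper highlight_words and its default tags are hypothetical) builds one alternation from all of them and skips anything inside <...> tags, assuming no tag contains a literal >:

```python
import re

def highlight_words(words, text, start_tag="<B>", end_tag="</B>"):
    if not words:
        return text  # an empty alternation would match everywhere
    # Longer words go first so that, when one word is a prefix of another,
    # the longer one is tried first.
    alts = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    # Alternative 1 swallows a whole <...> tag, which is copied through
    # unchanged, so the words are only matched in text between tags.
    pattern = re.compile(r"(<[^>]*>)|\b(" + alts + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: m.group(1) if m.group(1) is not None
        else start_tag + m.group(2) + end_tag,
        text)

print(highlight_words(["abc", "website"],
                      'This is a link to the <a href="www.abc.com">abc</a> website.'))
```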

Expert Comment

by:pratap_r
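CUTTHEMUSIC, the regex I mentioned in my previous post would work for you.

Here's the code in C#:
      // Requires: using System.Text.RegularExpressions; (and System.Windows.Forms for MessageBox)
      // MatchEvaluator callback: Groups[1] = opening tag, Groups[2] = tag name,
      // Groups[3] = text before the matching closing tag. Rebuilds the pair with the text in bold.
      string MyFunc(Match m)
      {
            return m.Groups[1].ToString() + "<b>" + m.Groups[3].ToString() + "</b></" + m.Groups[2].ToString() + ">";
      }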
      private void button2_Click(object sender, System.EventArgs e)
      {
            Regex r=new Regex(@"(<([\w][\w\d]*)[^>]*>)?(.*?)</\2>");
            MessageBox.Show(r.Replace("<a href=www.xyz.com?id=abc>abc</a>",new MatchEvaluator(MyFunc)));
      }
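For readers without .NET, roughly the same MatchEvaluator logic in Python, where re.sub accepts a function in place of the replacement string (my_func mirrors MyFunc; the name is an assumption):

```python
import re

# The C# pattern. Group 1: opening tag, group 2: tag name,
# group 3: text up to the first closing tag with the same name (\2).
PAIR = re.compile(r"(<([\w][\w\d]*)[^>]*>)?(.*?)</\2>")

def my_func(m):
    # Keep the opening tag, bold the inner text, and rebuild the closing
    # tag from the captured tag name.
    return (m.group(1) or "") + "<b>" + m.group(3) + "</b></" + m.group(2) + ">"

print(PAIR.sub(my_func, "<a href=www.xyz.com?id=abc>abc</a>"))
# <a href=www.xyz.com?id=abc><b>abc</b></a>
```

As in .NET, \2 cannot match when the optional group 1 did not participate, so every successful match has an opening tag; a single sub call also handles several tag pairs in one string.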


Enjoy!
Pratap

Expert Comment

by:pratap_r
You can change MyFunc in my code to produce whatever formatting you need.

The message box for the above code displays this:

<a href=www.xyz.com?id=abc><b>abc</b></a>


Pratap

Expert Comment

by:etmendz
You have tags that you need to ignore in order to extract the content. This is usually not easy, so it is not a one-line solution. The best way is to isolate the tags and then grab only the text inside them. If needed, you can recurse to isolate tags within tags within tags and extract only the content you want. You can do this manually, or you can use (in C#):

// Requires: using System.Xml;
//Create the XmlDocument.
XmlDocument doc = new XmlDocument();
//Create a document fragment.
XmlDocumentFragment docFrag = doc.CreateDocumentFragment();
//Set the contents of the document fragment.
docFrag.InnerXml ="<a href='www.abc.com'>abc</a>";
//Display the document fragment.
Console.WriteLine(docFrag.InnerXml);
Console.WriteLine(docFrag.InnerText); // <-- THIS IS THE TRICK ;-)
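A comparable approach in Python uses xml.etree.ElementTree, wrapping the fragment in a dummy root element (an assumption standing in for XmlDocumentFragment) and joining itertext(), which plays the role of InnerText:

```python
import xml.etree.ElementTree as ET

def inner_text(fragment):
    # Wrap the fragment in a dummy root so that mixed text and several
    # elements still form one well-formed XML document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # itertext() yields every text node in document order.
    return "".join(root.itertext())

print(inner_text("<a href='www.abc.com'>abc</a>"))   # abc
```

As with XmlDocument, the input must be well-formed XML: an unquoted attribute such as href=www.abc.com, or an HTML-only entity such as &nbsp;, raises ParseError, and much real-world HTML falls into that category.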

Have fun...

Expert Comment

by:pratap_r
Using XmlDocument just to extract text may add overhead, since it involves building a DOM object and checking that the markup is well-formed. A regex, on the other hand, is a one-line solution: one pattern matches all your requirements.

A single Regex.Replace call replaces all occurrences, with no loops required, so an input of
<a href=www.xyz.com?id=abc>abc</a><a href=www.xyz.com?id=def>def</a>

given to my function produces
<a href=www.xyz.com?id=abc><b>abc</b></a><a href=www.xyz.com?id=def><b>def</b></a>

You just have to write the pattern properly.

Pratap

Expert Comment

by:pratap_r
ozo's solution assumed the search text was hard-coded (i.e., abc was static).

My posts solve the problem: both my first one and the third from the last.

Have Fun!
Pratap

Expert Comment

by:ozo
The original question specifies a static abc.

Regex(@"(<([\w][\w\d]*)[^>]*>)?(.*?)</\2>");
will match any pair of matching tags, and would change
"<body> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </body>"
into
"<body><b> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </b></body>"

Expert Comment

by:pratap_r
It specifies abc as an example, not as part of the requirement.

You are right about the regex you mentioned; that's why I answered with
<([\w][\w\d]*)[^>]*>(.*?)<\/\1> in my post above (third from the top).

Expert Comment

by:ozo
CUTTHEMUSIC also clarified later that when searching for abc,
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.
should remain unchanged.

<([\w][\w\d]*)[^>]*>(.*?)<\/\1>
would also change
<body> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </body>
into
<body><b> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </b></body>
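A short Python check of this behaviour, assuming the example is meant to end with a closing </body> tag: the search starts at <body>, so the lazy (.*?) runs all the way to </body> and the outer pair is matched as one unit, links included:

```python
import re

# pratap_r's pattern; \1 only requires the closing tag to have the same name.
PAIR = re.compile(r"<([\w][\w\d]*)[^>]*>(.*?)<\/\1>")

page = ('<body> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. '
        '<a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </body>')

m = PAIR.search(page)
print(m.group(1))   # body
print(m.group(2))   # everything between <body> and </body>, links included
```

Any replacement built on this match (like MyFunc) therefore bolds the whole body rather than the individual link texts.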

Accepted Solution

by:ozo
ozo earned 150 total points
Neither of us solved the full problem as revised in http:#12286541,
but that version is problematic to solve with a regular expression alone.
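For the revised problem, one option is to let an HTML parser find the tags and apply the word regex only to text content. A rough sketch using Python's html.parser module (Highlighter and highlight_html are hypothetical names): it copies start tags through verbatim with get_starttag_text(), leaves comments and <script>/<style> content alone, and re-emits end tags in lower case. Processing instructions and marked sections such as <![CDATA[...]]> are dropped by this sketch:

```python
import re
from html.parser import HTMLParser

class Highlighter(HTMLParser):
    """Copies markup through and highlights words only in text content."""

    def __init__(self, words, start_tag="<B>", end_tag="</B>"):
        # convert_charrefs=False keeps entities such as &amp; out of the text
        # passed to handle_data, so they can be written back as entities.
        super().__init__(convert_charrefs=False)
        alts = "|".join(re.escape(w) for w in words)
        self.word_re = re.compile(r"\b(" + alts + r")\b", re.IGNORECASE)
        self.start_tag, self.end_tag = start_tag, end_tag
        self.out = []
        self.raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original text, attributes untouched
        if tag in ("script", "style"):
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)  # the parser lower-cases tag names
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1

    def handle_data(self, data):
        if self.raw_depth:
            self.out.append(data)  # never touch script or style code
        else:
            self.out.append(self.word_re.sub(
                lambda m: self.start_tag + m.group(1) + self.end_tag, data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

def highlight_html(words, html):
    if not words:
        return html  # an empty alternation would match everywhere
    parser = Highlighter(words)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(highlight_html(["abc"], '<a href="www.abc.com">abc</a> website.'))
```

The parser, not a regex, decides where each tag ends, so a > inside a quoted attribute value (as in ozo's IMG example) no longer splits the tag.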

Assisted Solution

by:pratap_r
pratap_r earned 150 total points
Yeah, I guess the requirement changed in #12286541.