Regular Expression

I need a regular expression that will match a specific word in a string of words, but not when the word has any of the characters = / \ . directly before or after it.

Example string
This is a link to the <a href="www.abc.com">abc</a> website.


Word to match
abc

I want the second abc to be matched, but not the abc in www.abc.com.



Asked by: CUTTHEMUSIC
 
ozo commented:
Neither of us solved the full problem as revised in http:#12286541,
but that version is problematic to solve with a regular expression alone.

ozo commented:
qr((?<![=/\\.])(abc)(?![=/\\.]))
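
A minimal sketch of how this lookaround pattern behaves on the example string. Python is used only for illustration (the lookbehind and lookahead syntax is the same in Perl and .NET), and the helper name build_pattern is hypothetical, not something from this thread.

```python
import re

def build_pattern(word):
    """Match `word` unless one of = / \\ . sits directly before or after it."""
    excluded = r"[=/\\.]"
    return re.compile(rf"(?<!{excluded})({re.escape(word)})(?!{excluded})")

text = 'This is a link to the <a href="www.abc.com">abc</a> website.'

# Start offsets of every accepted match in the example string.
matches = [m.start() for m in build_pattern("abc").finditer(text)]
```

Only the abc between > and < is accepted here. Note that the pattern looks at a single neighbouring character, so an abc inside a quoted attribute such as href="abc" would still match.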

etmendz commented:
You'll normally extract a string bounded by delimiters by first isolating or removing the delimiters from the string. A simple trick is to create a pattern to match the delimiters. When you parse the string, match the opening delimiter and skip it. Read the content that follows until the closing delimiter is matched.

To match an HTML, XML or SGML opening tag (and similar mark-up languages), the following works:

/<[^>]+>/

You use this to signal that an opening tag is matched. Parse through the string and extract the content until the closing tag is matched:

/<\/[^>]+>/
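
One way to sketch the "isolate the delimiters first" idea, assuming Python and a hypothetical helper named split_tags_and_text. It uses the same <[^>]+> pattern to separate the tags from the text between them; it is an illustration, not a full HTML parser.

```python
import re

# The capturing group makes re.split keep the tags in its result.
TAG = re.compile(r"(<[^>]+>)")

def split_tags_and_text(markup):
    """Return (tags, text_pieces) in the order they appear in markup."""
    tags, pieces = [], []
    for part in TAG.split(markup):
        if not part:
            continue  # re.split yields empty strings around adjacent tags
        if TAG.fullmatch(part):
            tags.append(part)
        else:
            pieces.append(part)
    return tags, pieces

tags, pieces = split_tags_and_text(
    'This is a link to the <a href="www.abc.com">abc</a> website.'
)
```

With the example string, the tags list holds the opening and closing <a> tags and the pieces list holds the three runs of plain text, including the abc you want.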

Have fun.
 
pratap_r commented:
<([\w][\w\d]*)[^>]*>(.*?)<\/\1>
This will get you the correct text, taking the HTML tags, attributes, etc. into account.

So
<a href="www.abc.com">abc</a> will give you abc,
and so will <a>abc</a>.
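
A small sketch of that pattern in Python, for illustration only. The helper name inner_text is hypothetical; group 1 is the tag name, which the \1 backreference requires again in the closing tag, and group 2 is the text in between.

```python
import re

# Group 1: tag name. Group 2: text between the opening and closing tag.
PAIR = re.compile(r"<([\w][\w\d]*)[^>]*>(.*?)</\1>")

def inner_text(markup):
    """Return the text inside the first matching tag pair, or None."""
    m = PAIR.search(markup)
    return m.group(2) if m else None

with_attrs = inner_text('<a href="www.abc.com">abc</a>')
without_attrs = inner_text("<a>abc</a>")
```

Both calls return abc. The pattern pairs an opening tag with the nearest closing tag of the same name, so whatever sits between them, including nested tags, ends up in group 2.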

Pratap

pratap_r commented:
If it is just the <A> tag you are looking for, then this is a simpler one:
<A[^>]*>(.*?)</A>

Make sure you turn off case sensitivity.

Pratap

CUTTHEMUSIC (author) commented:
OK, let me explain more. I am trying to create an application that searches through text and highlights a certain word. Here is what I am using:

Private Function findAndHighlight(ByVal Search_Str As String, ByVal InputTxt As String, ByVal StartTag As String, ByVal EndTag As String) As String

        Return Regex.Replace(InputTxt, "\b(" & Regex.Escape(Search_Str) & ")\b", StartTag & "$1" & EndTag, RegexOptions.IgnoreCase)

End Function

I would call the function like this:
findAndHighlight("abc", "This is a link to the <a href=www.abc.com>abc</a> website.", "<B>", "</B>")

The output that my current code produces is
This is a link to the <a href=www.<B>abc</B>.com><B>abc</B></a> website.
This would obviously cause problems when the user clicked the link.

I also have links that look like this
<a href=www.xyz.com?id=abc>abc</a>

My current code returns this
<a href=www.xyz.com?id=<B>abc</B>><B>abc</B></a>

What should be returned is
<a href=www.xyz.com?id=abc><B>abc</B></a>
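
A rough Python sketch of the difference between the current behaviour and the desired one. The function names are hypothetical, and it assumes that every tag is a simple <...> run with no > inside attribute values. The first function mirrors the \b(word)\b replace above; the second applies it only to the text between tags.

```python
import re

TAG = re.compile(r"(<[^>]*>)")  # capturing group: re.split keeps the tags

def highlight_naive(word, text, start_tag, end_tag):
    """Whole-word, case-insensitive highlight everywhere, tags included."""
    pattern = re.compile(rf"\b({re.escape(word)})\b", re.IGNORECASE)
    return pattern.sub(lambda m: start_tag + m.group(1) + end_tag, text)

def highlight_text_only(word, text, start_tag, end_tag):
    """Highlight only in the text segments that lie between tags."""
    parts = TAG.split(text)
    # With one capturing group, even indexes are text and odd indexes are tags.
    for i in range(0, len(parts), 2):
        parts[i] = highlight_naive(word, parts[i], start_tag, end_tag)
    return "".join(parts)

link = "<a href=www.xyz.com?id=abc>abc</a>"
broken = highlight_naive("abc", link, "<B>", "</B>")
wanted = highlight_text_only("abc", link, "<B>", "</B>")
```

Here broken reproduces the damaged href shown above, while wanted leaves the tag alone and wraps only the link text.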




ozo commented:
(?<![=\/\\.])(abc)(?![=\/\\.])
fulfills your original specification,
but it now sounds like you want to ignore strings inside <tags>.
That can get tricky with things like:
<IMG SRC = "foo.gif"
         ALT = "A > B">

<!-- <A comment> -->

<script>if (a<b && a>c)</script>

<# Just data #>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>


And what would you want to do with
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.
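
To illustrate the first of those tricky cases, here is a short Python sketch. The two pattern names are hypothetical. It shows that a plain <[^>]+> stops at the > inside the quoted ALT value, and that a quote-aware variant copes with that one case. It does nothing for the comment, script, or CDATA examples.

```python
import re

NAIVE_TAG = re.compile(r"<[^>]+>")
# Skips over quoted attribute values, so a > inside quotes does not end the tag.
QUOTE_AWARE_TAG = re.compile(r"""<(?:[^>"']|"[^"]*"|'[^']*')*>""")

img = '<IMG SRC = "foo.gif" ALT = "A > B">'

naive = NAIVE_TAG.match(img).group()
aware = QUOTE_AWARE_TAG.match(img).group()
```

The naive match ends inside the ALT value, so anything after it would be treated as page text. This is part of why a regular expression alone is a fragile tool for arbitrary markup.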

CUTTHEMUSIC (author) commented:
ozo,
I'm not sure how to implement your original solution into my code.
I am increasing the points because of all of the revisions that I have made.

If I was searching for abc and I had the following string
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.

I would want to get this
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.

but if I were searching for xxxabcyyy then I would want this

<a href="www.xxxabcyyy.com"><B>xxxabcyyy</B></a> website.

Again, what I am trying to do is search data that comes in the form of a string. The string is from a database that I did a full-text search on. I need a function that will take the string and highlight the searched words that produced the output from the full-text search. But the string may contain links, and I don't want the code to highlight the searched words where they occur inside the link tag itself (for example in the href). Hope this helps.
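
One possible way to get this behaviour in a single pass, sketched in Python with a hypothetical function name. The pattern tries to match a whole tag first and returns it unchanged; only when no tag matches does it try the whole-word search term. It assumes simple <...> tags with no > inside attribute values.

```python
import re

def highlight_outside_tags(word, text, start_tag="<B>", end_tag="</B>"):
    """Wrap whole-word matches of `word` that are not inside a <...> tag."""
    pattern = re.compile(
        rf"(<[^>]*>)|\b({re.escape(word)})\b", re.IGNORECASE
    )

    def replace(m):
        if m.group(1) is not None:
            return m.group(1)  # a tag: copy it through untouched
        return start_tag + m.group(2) + end_tag

    return pattern.sub(replace, text)

link = '<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.'
for_abc = highlight_outside_tags("abc", link)
for_full_word = highlight_outside_tags("xxxabcyyy", link)
```

Searching for abc leaves the string unchanged because \b requires a word boundary on both sides, while searching for xxxabcyyy wraps only the link text.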

pratap_r commented:
CUTTHEMUSIC, the regex I mentioned in my previous post would work for you.

Here's the code in C#:
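      // Needs: using System.Text.RegularExpressions; using System.Windows.Forms;
      // Groups: 1 = opening tag, 2 = tag name, 3 = text between the tags.
      string MyFunc(Match m)
      {
            // Rebuild the element with its inner text wrapped in <b>...</b>.
            return m.Groups[1].ToString() + "<b>" + m.Groups[3].ToString() + "</b></" + m.Groups[2].ToString() + ">";
      }
      private void button2_Click(object sender, System.EventArgs e)
      {
            Regex r = new Regex(@"(<([\w][\w\d]*)[^>]*>)?(.*?)</\2>");
            MessageBox.Show(r.Replace("<a href=www.xyz.com?id=abc>abc</a>", new MatchEvaluator(MyFunc)));
      }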


Enjoy!
Pratap

pratap_r commented:
You may change MyFunc in my code to do the formatting you require.

The message box for the above code displays this:

<a href=www.xyz.com?id=abc><b>abc</b></a>


Pratap

etmendz commented:
You have tags that you need to ignore in order to extract the content. This is usually not easy, so there is no one-line solution. The best way is to isolate the tags and then grab only the text inside them. If needed, you can recurse to handle tags nested within tags and extract only the content you want. You can do this manually, or you can use (in C#):

//Create the XmlDocument.
XmlDocument doc = new XmlDocument();
//Create a document fragment.
XmlDocumentFragment docFrag = doc.CreateDocumentFragment();
//Set the contents of the document fragment.
docFrag.InnerXml ="<a href='www.abc.com'>abc</a>";
//Display the document fragment.
Console.WriteLine(docFrag.InnerXml);
Console.WriteLine(docFrag.InnerText); // <-- THIS IS THE TRICK ;-)
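
For comparison, a rough Python equivalent of the InnerText idea using the standard library's xml.etree.ElementTree. The helper name is hypothetical. Like any XML parser, it only accepts well-formed input, so the unquoted href=www.abc.com form used elsewhere in this thread is rejected.

```python
import xml.etree.ElementTree as ET

def inner_text(fragment):
    """Concatenate all text nodes of a well-formed XML fragment."""
    return "".join(ET.fromstring(fragment).itertext())

simple = inner_text("<a href='www.abc.com'>abc</a>")
nested = inner_text("<p>see <a href='x'>abc</a> now</p>")

try:
    inner_text("<a href=www.abc.com>abc</a>")  # unquoted attribute value
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
```

This returns the text with the tags stripped, but it does not by itself put highlight tags back into the original markup.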

Have fun...

pratap_r commented:
Using XmlDocument just to extract text might be a performance overhead, since it involves creating the DOM object, validation, etc. A regex, on the other hand, is a one-line solution: one pattern matches all your requirements.

A single regex replace will replace all occurrences, with no loops required. So an input of
<a href=www.xyz.com?id=abc>abc</a><a href=www.xyz.com?id=def>def</a>

passed to my function will produce
<a href=www.xyz.com?id=abc><b>abc</b></a><a href=www.xyz.com?id=def><b>def</b></a>

You just have to write the pattern properly.

Pratap

pratap_r commented:
ozo's solution centered around the text being hardcoded (i.e., abc being static).

My posts solve the problem, both my first one and the third from the last one.

Have Fun!
Pratap

ozo commented:
The original question specifies a static abc

Regex(@"(<([\w][\w\d]*)[^>]*>)?(.*?)</\2>");
will match any pair of matching tags, and would change
"<body> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </body>"
into
"<body><b> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </b></body>"

pratap_r commented:
It specifies the static abc as an example, not as part of the requirement.

You are right about the regex you mentioned; that's why I had answered with
<([\w][\w\d]*)[^>]*>(.*?)<\/\1> in my post above (third from the top).

ozo commented:
CUTTHEMUSIC also clarified later that when searching for abc,
<a href="www.xxxabcyyy.com">xxxabcyyy</a> website.
should remain unchanged.

<([\w][\w\d]*)[^>]*>(.*?)<\/\1>
would also change
<body> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </body>
into
<body><b> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. <a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </b></body>
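
A short Python sketch that reproduces this effect, assuming the input ends with a proper </body> closing tag. The function names are hypothetical. Because the pattern matches the outermost pair first, everything inside <body> becomes group 2 and the inner links are never examined.

```python
import re

PAIR = re.compile(r"<([\w][\w\d]*)[^>]*>(.*?)</\1>")

def wrap_group_two(m):
    """Rebuild the match with group 2 (the inner text) wrapped in <b>...</b>."""
    whole = m.group(0)
    start = m.start(2) - m.start(0)
    end = m.end(2) - m.start(0)
    return whole[:start] + "<b>" + whole[start:end] + "</b>" + whole[end:]

def bold_inner(markup):
    return PAIR.sub(wrap_group_two, markup)

page = ('<body> xxx <a href="www.xxxabcyyy.com">xxxabcyyy</a> website. '
        '<a href="www.xxxabcyyy.com">xxxabcyyy</a> yyy </body>')
result = bold_inner(page)
```

The whole body content ends up inside one <b> element regardless of the search term, which is the problem being pointed out here.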

pratap_r commented:
Yeah, I guess the requirement got skewed in #12286541.
Question has a verified solution.
