Solved

Regex.Replace C#

Posted on 2006-11-16
Last Modified: 2010-05-18
Hi guys,
I'm trying to parse some HTML without success.
What I have is a string that holds HTML.

I need to remove DIVs from it, but only if the div contains an image
with a specific src.

For example, this entire div should be replaced with an empty string:
<div><img src="bad.gif"></div>

and this one should stay as is:
<div><img src="good.gif"></div>


What I tried was the following pattern:
<div.*bad.gif.*</div>
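A short C# sketch shows why this pattern misbehaves (the class name and sample string are made up for illustration). ".*" is greedy: it first runs to the end of the line and then backs off only as far as it must, so on a single line the match stretches from the first "<div" to the last "</div>", taking any good.gif div along with it.

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The pattern from the question. Both ".*" parts are greedy, so the match
    // begins at the first "<div" and ends at the LAST "</div>" on the line.
    public const string QuestionPattern = "<div.*bad.gif.*</div>";

    public static string Strip(string html) =>
        Regex.Replace(html, QuestionPattern, string.Empty);
}
```

For example, GreedyDemo.Strip("<div><img src=\"bad.gif\"></div><div><img src=\"good.gif\"></div>") returns an empty string, so the good.gif div is lost as well.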

I'm new to regex, so be gentle... (:
Thanks!
Question by:tsabbay

Expert Comment

by:ozo
ID: 17964031
<div((?!</div>).)*bad.gif.*?</div>

Expert Comment

by:Ravi Singh
ID: 17964119
Hi, this method should strip out the div tags based on the src string you provide:

// requires: using System.Text.RegularExpressions;
private string RemoveDivTagsBySrc(string html, string src)
{
      return Regex.Replace(html, "<div[^>]*>(.*?)<img(.*?)src=\"" + src + "\"[^>]*>(.*?)</div>", string.Empty, RegexOptions.IgnoreCase);
}

Usage:

string sampleHtml = "<div><img src=\"bad.gif\"></div>" + "\n" + "<div><img src=\"good.gif\"></div>" + "\n" + "<div><img src=\"bad.gif\"></div>";

string newHtml = this.RemoveDivTagsBySrc(sampleHtml, "bad.gif");

//newHtml string should now only contain the div tag with "good.gif" as the src
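Whether this pattern is safe depends on how the divs are separated, which a small C# sketch can check (the class name is made up, and src is fixed to "bad.gif" as in the usage above). Because "." does not match a newline unless RegexOptions.Singleline is set, the lazy groups cannot cross the "\n" between divs, so the usage above gives the expected result. When two divs share a line, however, (.*?) can step over "</div>  <div>" on its way to the bad src, and both divs are removed in one match.

```csharp
using System.Text.RegularExpressions;

public static class LazyOvermatchDemo
{
    // The first answer's pattern with src = "bad.gif". The groups are lazy, but
    // nothing stops (.*?) from passing a "</div>" while it searches for the src;
    // only a newline stops it, since "." excludes '\n' without Singleline.
    public const string FirstAnswerPattern =
        "<div[^>]*>(.*?)<img(.*?)src=\"bad.gif\"[^>]*>(.*?)</div>";

    public static string Strip(string html) =>
        Regex.Replace(html, FirstAnswerPattern, string.Empty, RegexOptions.IgnoreCase);
}
```

With "<div><img src=\"good.gif\"></div>\n<div><img src=\"bad.gif\"></div>" only the bad div goes, but with the same two divs joined by spaces the whole string is removed.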

Accepted Solution

by:ozo
ID: 17964138
That would remove the entirety of
"<div><img src=\"good.gif\"></div>  <div><img src=\"bad.gif\"></div>"
including the good.gif div.

Author Comment

by:tsabbay
ID: 17964201
Hi guys,
thank you all for your replies. While waiting, I wrote a small string search-and-replace of my own.
I checked both solutions and neither does it right, sorry.

Benny.

Expert Comment

by:ozo
ID: 17964255
What are they not doing right? If the div can include newlines, you should use RegexOptions.Singleline.
Also, bad.gif should really be bad\\.gif or bad[.]gif, because an unescaped . matches any character.

Expert Comment

by:Ravi Singh
ID: 17964273
ozo's right: my regex matches too much and swallows all the div tags. One way of using his regex is shown below:

(PLEASE ACCEPT THE SOLUTION BY OZO IF THIS WORKS FOR YOU)



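// requires: using System.Text.RegularExpressions;
private string RemoveDivTagsBySrc(string html, string src)
{
      // ((?!</div>).)* cannot step past a closing </div> before reaching src, so a match stays inside one div.
      // Regex.Escape makes the "." in "bad.gif" literal; Singleline lets "." match newlines inside a div.
      return Regex.Replace(html, "<div((?!</div>).)*" + Regex.Escape(src) + ".*?</div>", string.Empty, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}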

Usage:

string sampleHtml = "<div><img src=\"good.gif\"></div><div><img src=\"bad.gif\"></div>";
string newHtml = this.RemoveDivTagsBySrc(sampleHtml, "bad.gif");

Author Comment

by:tsabbay
ID: 17965251
I don't know why, but it's not just removing the relevant text; it also removes some other divs' closing tags from the HTML.

Besides, my custom code seems to work much faster than the regex, so I'm dropping it.

Thank you all for your time!
Credit will be given to ozo.
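The string search-and-replace mentioned here was not posted. As an illustration only, here is a hypothetical non-regex sketch in the same spirit (the class and method names are assumptions, not the poster's code). One likely cause of stray closing tags is nested divs, which a flat regex cannot pair up; this sketch walks the HTML once, keeps a stack of open "<div" positions so nested divs are paired correctly, and removes only the innermost div that contains the marker. It still assumes well-formed markup with no "<div" text inside attributes, scripts, or comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DivScanner
{
    // Hypothetical helper: removes the innermost <div>...</div> containing
    // `marker`, pairing tags with a stack so enclosing divs keep their closing tags.
    public static string RemoveInnermostDivsContaining(string html, string marker)
    {
        var open = new Stack<int>();                    // start index of each open <div
        var ranges = new List<(int Start, int End)>();  // [Start, End) spans to delete

        int i = 0;
        while (i < html.Length)
        {
            if (string.Compare(html, i, "<div", 0, 4, StringComparison.OrdinalIgnoreCase) == 0)
            {
                open.Push(i);
                i += 4;
            }
            else if (string.Compare(html, i, "</div>", 0, 6, StringComparison.OrdinalIgnoreCase) == 0)
            {
                int end = i + 6;
                if (open.Count > 0) // ignore a stray </div> that has no opener
                {
                    int start = open.Pop();
                    bool hasMarker = html.IndexOf(marker, start, end - start,
                        StringComparison.OrdinalIgnoreCase) >= 0;
                    // If an inner div was already chosen, this one is not the innermost match.
                    bool hasMarkedChild = ranges.Exists(r => r.Start >= start && r.End <= end);
                    if (hasMarker && !hasMarkedChild)
                        ranges.Add((start, end));
                }
                i = end;
            }
            else
            {
                i++;
            }
        }

        // The chosen spans never overlap, so copy the text that lies between them.
        ranges.Sort((a, b) => a.Start.CompareTo(b.Start));
        var sb = new StringBuilder(html.Length);
        int pos = 0;
        foreach (var (cutStart, cutEnd) in ranges)
        {
            sb.Append(html, pos, cutStart - pos);
            pos = cutEnd;
        }
        sb.Append(html, pos, html.Length - pos);
        return sb.ToString();
    }
}
```

For example, RemoveInnermostDivsContaining("<div class=\"outer\"><div><img src=\"bad.gif\"></div><p>keep</p></div>", "src=\"bad.gif\"") leaves "<div class=\"outer\"><p>keep</p></div>", with the outer div's closing tag intact.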