Solved

Regex.Replace in C#

Posted on 2006-11-16
Last Modified: 2010-05-18
Hi guys,
I'm trying to parse some HTML without success.
What I have is a string that holds an HTML document.

I need to remove divs from it, but only if the div contains an image
with a specific src.

For example, this entire div should be replaced with an empty string:
<div><img src="bad.gif"></div>

And this should stay as is:
<div><img src="good.gif"></div>


What I was trying to do is use the following pattern:
<div.*bad.gif.*</div>

I'm new to regex, so be gentle... (:
Thanks!
Question by:tsabbay
7 Comments
 

Expert Comment

by:ozo
ID: 17964031
<div((?!</div>).)*bad.gif.*?</div>

Expert Comment

by:Ravi Singh
ID: 17964119
Hi, this method should strip out the div tags based on the src string you provide:

private string RemoveDivTagsBySrc(string html, string src)
{
      return Regex.Replace(html, "<div[^>]*>(.*?)<img(.*?)src=\"" + src + "\"[^>]*>(.*?)</div>", string.Empty, RegexOptions.IgnoreCase);
}

Usage:

string sampleHtml = "<div><img src=\"bad.gif\"></div>" + "\n" + "<div><img src=\"good.gif\"></div>" + "\n" + "<div><img src=\"bad.gif\"></div>";

string newHtml = this.RemoveDivTagsBySrc(sampleHtml, "bad.gif");

//newHtml string should now only contain the div tag with "good.gif" as the src

Accepted Solution

by:
ozo earned 500 total points
ID: 17964138
That pattern would remove the entirety of
"<div><img src=\"good.gif\"></div>  <div><img src=\"bad.gif\"></div>"

Author Comment

by:tsabbay
ID: 17964201
Hi guys,
Thank you all for your replies. While waiting, I wrote a small string search-and-replace routine.
I checked both solutions and neither does it right, sorry.

Benny.

Expert Comment

by:ozo
ID: 17964255
What are they not doing right? If the div can include newlines, you should use RegexOptions.Singleline.
Also, bad.gif should really be bad\\.gif or bad[.]gif, since an unescaped . matches any character.
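
Both suggestions can be combined in one method. This is only a sketch, not code from the thread: the class name EscapedDivStripper is made up, and it uses Regex.Escape to escape the dot instead of writing bad\\.gif by hand.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedDivStripper
{
    public static string RemoveDivTagsBySrc(string html, string src)
    {
        // Regex.Escape turns "bad.gif" into "bad\.gif", so the dot only matches a literal dot.
        string pattern = "<div((?!</div>).)*" + Regex.Escape(src) + ".*?</div>";

        // Singleline lets '.' match '\n', so a div spread over several lines still matches.
        return Regex.Replace(html, pattern, string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        // First div spans three lines; second div only looks like "bad.gif" to an unescaped dot.
        string html = "<div>\n<img src=\"bad.gif\">\n</div><div><img src=\"badXgif\"></div>";
        Console.WriteLine(RemoveDivTagsBySrc(html, "bad.gif"));
        // prints: <div><img src="badXgif"></div>
    }
}
```

Without Regex.Escape the second div would be removed too, and without Singleline the first one would be left in place.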

Expert Comment

by:Ravi Singh
ID: 17964273
ozo's right: my regex gets greedy and matches across all the div tags. One way of using his regex is shown below:

(Please accept the solution by ozo if this works for you.)



private string RemoveDivTagsBySrc(string html, string src)
{
      return Regex.Replace(html, "<div((?!</div>).)*" + src + ".*?</div>", string.Empty, RegexOptions.IgnoreCase);
}

use:

string sampleHtml = "<div><img src=\"good.gif\"></div><div><img src=\"bad.gif\"></div>";
string newHtml = this.RemoveDivTagsBySrc(sampleHtml, "bad.gif");
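
A quick way to compare the two patterns from this thread is to run both on the same one-line input. The sketch below is only an illustration: the class name PatternComparison and the helper Strip are made up for the demo, and the src value is hard-coded into both patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // The first pattern posted in the thread, with src fixed to bad.gif.
    public const string LoosePattern =
        "<div[^>]*>(.*?)<img(.*?)src=\"bad.gif\"[^>]*>(.*?)</div>";

    // ozo's pattern: the negative lookahead keeps the match from crossing a </div>.
    public const string TemperedPattern = "<div((?!</div>).)*bad.gif.*?</div>";

    public static string Strip(string html, string pattern)
    {
        return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<div><img src=\"good.gif\"></div>  <div><img src=\"bad.gif\"></div>";

        Console.WriteLine("[" + Strip(html, LoosePattern) + "]");
        // prints: []

        Console.WriteLine("[" + Strip(html, TemperedPattern) + "]");
        // prints: [<div><img src="good.gif"></div>  ]
    }
}
```

On this input the first pattern strips everything, because the lazy (.*?) after <img keeps expanding past the first </div> until it reaches src="bad.gif". The lookahead in the second pattern refuses to cross a </div>, so only the bad div goes.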

Author Comment

by:tsabbay
ID: 17965251
I don't know why, but it's not just removing the relevant text; it also removes some other divs' closing tags from the HTML.

Besides, my custom code seems to work much faster than the regex, so I'm dropping the regex approach.

Thank you all for your time!
Credit will be given to ozo.
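
One possible explanation for the leftover closing tags, assuming the real HTML has nested divs (the thread doesn't show it): the pattern can start at an outer <div, run through the inner one, and stop at the first </div>, which leaves the outer div's closing tag orphaned. A sketch of a variant that only matches a div with no other div tag inside it follows. The class and method names are made up, and a regex like this is still no substitute for a real HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnermostDivStripper
{
    public static string RemoveInnermostDivBySrc(string html, string src)
    {
        // (?!</?div\b) stops the match from stepping over any opening or closing div tag,
        // so only a div that directly contains the image (no nested div) can match.
        const string noDivTag = @"(?:(?!</?div\b).)*?";
        string pattern = @"<div\b[^>]*>" + noDivTag + Regex.Escape(src) + noDivTag + "</div>";

        return Regex.Replace(html, pattern, string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        string html = "<div id=\"outer\"><div><img src=\"bad.gif\"></div><p>keep</p></div>";
        Console.WriteLine(RemoveInnermostDivBySrc(html, "bad.gif"));
        // prints: <div id="outer"><p>keep</p></div>
    }
}
```

On the same input, the pattern <div((?!</div>).)*bad.gif.*?</div> would return <p>keep</p></div>, with a stray closing tag.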

Question has a verified solution.
