How to extract <IMG> tags from an HTML file?

Hello everyone,

Can anyone tell me how to extract <IMG> tags from an HTML file using C#.NET?
Maybe by using an XML parse function? I am not sure. Please help me! Thanks!

brownsbay

purpleblob Commented:
If the HTML is well formed (i.e. every start tag has a matching end tag or is self-closed), you could load it into an XML DOM and find all the img elements. That is probably not the case, though, so a very simple alternative is to use string class methods such as IndexOf. Do you actually want to extract (i.e. remove) the <img> tags, or simply find them all? If you want to remove them, you will need to find the start of each tag (<img) and its end, and then Remove the element. Note that in HTML an <img> normally has no separate </img> end tag, so the end of the element is usually just the closing > of the tag.
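As a rough sketch of the two approaches just described, something like the following could work. The class and method names (ImgFinder, FindWithXmlDom, FindWithIndexOf) are made up for illustration. The XML DOM version assumes the input really is well-formed XML with no undefined entities such as &nbsp;, and the IndexOf version assumes no attribute value contains a literal > character.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class ImgFinder
{
    // Well-formed input only: LoadXml throws XmlException on ordinary "tag soup" HTML.
    public static List<string> FindWithXmlDom(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var result = new List<string>();
        // XML names are case-sensitive, so this finds <img> but not <IMG>.
        foreach (XmlNode node in doc.GetElementsByTagName("img"))
        {
            result.Add(node.OuterXml);
        }
        return result;
    }

    // Plain string scan: take everything from "<img" up to the next '>'.
    // Note that "<img" would also match a hypothetical tag such as <imgfoo>.
    public static List<string> FindWithIndexOf(string html)
    {
        var result = new List<string>();
        int pos = 0;
        while (true)
        {
            int start = html.IndexOf("<img", pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;              // no more tags
            int end = html.IndexOf('>', start);
            if (end < 0) break;                // unterminated tag, stop scanning
            result.Add(html.Substring(start, end - start + 1));
            pos = end + 1;
        }
        return result;
    }
}
```

Calling ImgFinder.FindWithIndexOf(html) on the page text returns each tag as a string. The DOM variant is only worth trying if you control the markup and know it parses as XML.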

If you do want to remove the <img> tags, the string class is unfortunately not very efficient with operations such as Remove, because every call creates a new string. You might instead build an ArrayList of the start/end indices of the tags in the string, then copy the bits you want to keep into a StringBuilder. It's a shame StringBuilder has a Remove method but not Find or IndexOf. Ah well, we can't have it all :-)
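A minimal sketch of that copy-what-you-keep idea might look like this. It is a simplification rather than exactly the approach described: instead of first collecting the indices in an ArrayList, it appends each kept slice to the StringBuilder as soon as a tag is found. ImgRemover and RemoveImgTags are hypothetical names, and the code assumes each tag ends at the first > after <img.

```csharp
using System;
using System.Text;

static class ImgRemover
{
    // Returns a copy of html with every <img ...> tag left out.
    public static string RemoveImgTags(string html)
    {
        var kept = new StringBuilder(html.Length);
        int pos = 0;                           // start of the text not yet copied
        while (pos < html.Length)
        {
            int start = html.IndexOf("<img", pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;              // no more tags
            int end = html.IndexOf('>', start);
            if (end < 0) break;                // unterminated tag: keep the rest as-is
            kept.Append(html, pos, start - pos);   // copy the text before the tag
            pos = end + 1;                     // skip over the tag itself
        }
        if (pos < html.Length)
        {
            kept.Append(html, pos, html.Length - pos);  // copy the tail
        }
        return kept.ToString();
    }
}
```

The searching is done on the original string and only the appending on the StringBuilder, which sidesteps the missing IndexOf on StringBuilder.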

ptmcomp Commented:
You can use SGML: http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=b90fddce-e60d-43f8-a5c4-c3bd760564bc

or Regex:

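// Requires: using System; using System.Text.RegularExpressions;
// Matches from "<img" to the next ">"; IgnoreCase also catches <IMG ...>,
// and [^>]* (unlike .*?) lets a tag span several lines.
MatchCollection matches = Regex.Matches(html, @"<img\b[^>]*>", RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
     Console.WriteLine(match.Value);
}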
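If what you are really after is the image URLs rather than the whole tags, the Regex idea can be stretched a little with a capture group. This is only a sketch under some assumptions: ImgSrcExtractor and GetSources are made-up names, and the pattern only handles src values wrapped in single or double quotes, so unquoted values are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ImgSrcExtractor
{
    // Group 1 captures the opening quote and \1 requires the same closing quote.
    // The named group "src" holds the attribute value between the quotes.
    // \s before "src" avoids matching attributes such as data-src.
    static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*([""'])(?<src>.*?)\1[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> GetSources(string html)
    {
        var sources = new List<string>();
        foreach (Match match in ImgSrc.Matches(html))
        {
            sources.Add(match.Groups["src"].Value);
        }
        return sources;
    }
}
```

For anything beyond simple pages, a real HTML parser will be more robust than a regular expression.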