Solved

How to extract <IMG> tags from HTML file?

Posted on 2003-11-23
Last Modified: 2013-11-19
Hello everyone:

Can anyone tell me how to extract <IMG> tags from an HTML file using C# (.NET)?
Maybe by using an XML parsing function; I am not sure. Please help me! Thanks!

brownsbay

Question by:brownsbay
4 Comments
 
LVL 6

Accepted Solution

by:
purpleblob earned 60 total points
ID: 9809332
If the HTML is well formed (i.e. every start tag has a matching end tag or is self-closed), you could load it into an XML DOM and find all the img elements. That is probably not the case, however, so a very simple alternative is to use string class methods such as IndexOf. Do you actually wish to extract (i.e. remove) the <img> tags, or simply find all of them? If you wish to remove them, you will need to find the start of each <img> tag and its end, then Remove (extract) the element. Note that in plain HTML an <img> tag normally ends at its closing > and has no separate </img>; only XML-style markup closes it with /> or </img>.
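The IndexOf approach described above might look roughly like the following. This is only a minimal sketch: the class and method names (ImgTagFinder, FindImgTags) are made up for illustration, it assumes a framework version with generics (List<T>), and it assumes no > character appears inside an attribute value.

```csharp
using System;
using System.Collections.Generic;

public static class ImgTagFinder
{
    // Hypothetical helper: returns each "<img ...>" tag in html, in document order.
    // Assumption: attribute values never contain '>'.
    public static List<string> FindImgTags(string html)
    {
        var tags = new List<string>();
        int pos = 0;
        while (pos < html.Length)
        {
            int start = html.IndexOf("<img", pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;               // no more candidates

            // Make sure the tag name is exactly "img" and not e.g. "imgfoo".
            int after = start + 4;
            bool isImg = after < html.Length &&
                         (char.IsWhiteSpace(html[after]) || html[after] == '>' || html[after] == '/');
            if (!isImg) { pos = after; continue; }

            int end = html.IndexOf('>', start);
            if (end < 0) break;                 // unterminated tag: stop scanning

            tags.Add(html.Substring(start, end - start + 1));
            pos = end + 1;
        }
        return tags;
    }
}
```

Calling ImgTagFinder.FindImgTags(html) on the loaded file contents gives the list of tag strings, which can then be printed or inspected further.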

If you do wish to extract the <img> tags, then unfortunately the string class is not very efficient with operations such as Remove (strings are immutable, so each call creates a new string). You might instead build an ArrayList of the start/end indices of the tags in the string and then copy the bits you want to keep into a StringBuilder. It's a shame StringBuilder has a Remove method but not Find or IndexOf. Ah well, we can't have it all :-)
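As a rough illustration of the copy-into-a-StringBuilder idea, here is a minimal sketch. The names (ImgTagRemover, RemoveImgTags) are hypothetical, and it simplifies the suggestion above by copying the kept text as it goes rather than first collecting the indices in a list. It makes the same assumption that no > appears inside an attribute value, and it treats anything starting with "<img" as an image tag.

```csharp
using System;
using System.Text;

public static class ImgTagRemover
{
    // Hypothetical helper: returns html with every "<img ...>" tag removed.
    // Assumption: attribute values never contain '>'.
    public static string RemoveImgTags(string html)
    {
        var result = new StringBuilder(html.Length);
        int pos = 0;
        while (true)
        {
            int start = html.IndexOf("<img", pos, StringComparison.OrdinalIgnoreCase);
            int end = start < 0 ? -1 : html.IndexOf('>', start);
            if (end < 0)
            {
                // No further complete tag: keep the rest of the text and finish.
                result.Append(html, pos, html.Length - pos);
                break;
            }
            // Keep the text before the tag, then skip past the tag itself.
            result.Append(html, pos, start - pos);
            pos = end + 1;
        }
        return result.ToString();
    }
}
```

Because the source string is only read and the output is appended once per kept segment, this avoids creating a new string for every removed tag.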
 
LVL 10

Assisted Solution

by:ptmcomp
ptmcomp earned 60 total points
ID: 9812004
You can use SGML: http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=b90fddce-e60d-43f8-a5c4-c3bd760564bc

or Regex:

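// Requires: using System; using System.Text.RegularExpressions;
// Match each <img ...> tag case-insensitively; \b avoids matching names such as <imgfoo>.
MatchCollection matches = Regex.Matches(html, @"<img\b[^>]*>", RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
     Console.WriteLine(match.Value);
}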