Solved

How to extract <IMG> tags from HTML file?

Posted on 2003-11-23
Last Modified: 2013-11-19
Hello everyone:

Can anyone tell me how to extract <IMG> tags from an HTML file using C#.NET?
Maybe by using an XML parse function; I am not sure. Please help me! Thanks!

Question by:brownsbay

Accepted Solution

by:
purpleblob earned 20 total points
ID: 9809332
If the HTML is well-formed XML (i.e. every start tag has a matching end tag or is self-closed), you could load it into an XML DOM and find all the img elements. That is probably not the case, however, so a very simple alternative is to use string class methods such as IndexOf. Do you actually wish to extract (i.e. remove) the <img> tags, or simply find all of them? If you wish to remove them, you will need to find where each <img tag starts and where it ends, then Remove (extract) that span. Note that in HTML an <img> tag normally has no separate </img> end tag, so the end is usually just the closing >.
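
For the well-formed case, a minimal sketch of the XML DOM route might look like the following. The class and method names (ImgTagXmlDemo, FindImgElements) are made up for illustration, and it assumes the input really is well-formed XML (for example XHTML without undeclared entities such as &nbsp;); otherwise LoadXml throws an XmlException.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not from the original answer.
public static class ImgTagXmlDemo
{
    // Returns the markup of every <img> element in a well-formed document.
    // Throws XmlException if the input is not well-formed XML.
    public static List<string> FindImgElements(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml);

        List<string> result = new List<string>();
        // XML element names are case-sensitive: this finds "img" but not "IMG".
        foreach (XmlNode node in doc.GetElementsByTagName("img"))
        {
            result.Add(node.OuterXml);
        }
        return result;
    }
}
```

Calling ImgTagXmlDemo.FindImgElements on the text of an XHTML page returns one string per img element. Typical real-world HTML (unclosed tags, uppercase <IMG>, unquoted attributes) will not load this way, which is why the string-based approach is suggested as the fallback.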

If you do wish to remove the <img> tags, note that the string class is unfortunately not very efficient for operations such as Remove, because every call creates a new string. You might instead build an ArrayList of the start/end indices of the tags in the string, then copy the parts you want to keep into a StringBuilder. It's a shame StringBuilder has a Remove method but not Find or IndexOf - ah well, we can't have it all :-)
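
The IndexOf/StringBuilder idea could be sketched roughly as below. This is a variation rather than the exact approach described: instead of first collecting indices in an ArrayList, it copies the kept text as it scans. The names ImgTagScanner and RemoveImgTags are hypothetical, and the scan assumes no > character appears inside an attribute value.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not from the original answer.
public static class ImgTagScanner
{
    // Adds every <img ...> tag found in html (case-insensitive) to 'tags'
    // and returns the input with those tags removed.
    public static string RemoveImgTags(string html, List<string> tags)
    {
        StringBuilder kept = new StringBuilder(html.Length);
        int copyFrom = 0;    // start of the text not yet copied to 'kept'
        int searchFrom = 0;  // where the next search for "<img" begins

        while (true)
        {
            int start = html.IndexOf("<img", searchFrom, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;
            int end = html.IndexOf('>', start);
            if (end < 0) break;                  // unterminated tag: stop scanning

            char next = html[start + 4];         // character right after "<img"
            if (next == '>' || next == '/' || char.IsWhiteSpace(next))
            {
                kept.Append(html, copyFrom, start - copyFrom);   // text before the tag
                tags.Add(html.Substring(start, end - start + 1));
                copyFrom = end + 1;
                searchFrom = end + 1;
            }
            else
            {
                searchFrom = start + 4;          // e.g. "<imgfoo>": not an img tag
            }
        }

        kept.Append(html, copyFrom, html.Length - copyFrom);     // remaining tail
        return kept.ToString();
    }
}
```

After the call, the list passed in holds the extracted tags and the return value is the HTML without them. Appending ranges to a single StringBuilder avoids the repeated string copies that calling Remove on the string in a loop would cause.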

Assisted Solution

by:ptmcomp
ptmcomp earned 20 total points
ID: 9812004
You can use SGML: http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=b90fddce-e60d-43f8-a5c4-c3bd760564bc

or Regex:

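// Requires: using System; using System.Text.RegularExpressions;
// IgnoreCase also matches <IMG ...>; [^>]* allows tags that span several lines.
MatchCollection matches = Regex.Matches(html, @"<img\b[^>]*>", RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}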