rgarimella

Extracting ALT Text from Multiple HTML Pages

Hello,

I am trying to find out how I can:

Extract the ALT text from an IMG tag inside an identified div
Do this across multiple HTML files

For example, I need to extract ONLY from the following DIV, whose id is content. There are other divs and other ALT text on the page.

<div id ="content"><img src="../../images/nameofimage.jpg" alt="Tool Box"></div>

Any tool or technology will work (regex, Dreamweaver, or Perl).
Duy Pham
It might not be related to this topic, but you can easily do that using HtmlAgilityPack in C#:
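using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
// Load() reads a local file; for a URL, use new HtmlWeb().Load(url) instead
htmlDoc.Load("<path_to_html_file_or_url>");

// get all img elements that have an alt attribute inside the div with id="content"
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//div[@id='content']//img[@alt]");
// SelectNodes returns null when nothing matches; otherwise read each alt value
// via node.GetAttributeValue("alt", "")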

Just my 2 cents.
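
For anyone without a C# toolchain, the same selection (img elements with an alt attribute inside the div whose id is content) can be sketched in Python using only the standard library. This is a minimal sketch, not a port of the HtmlAgilityPack code: the names ContentAltParser, extract_alts and extract_from_folder are made up for illustration, and the *.html glob and UTF-8 encoding are assumptions about how the files are stored.

```python
from html.parser import HTMLParser
from pathlib import Path


class ContentAltParser(HTMLParser):
    """Collects alt text of <img> tags nested inside <div id="content">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside the target div (0 = outside)
        self.alts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1          # nested div inside the target div
            elif attrs.get("id") == "content":
                self.depth = 1           # entering the target div
        elif tag == "img" and self.depth > 0 and attrs.get("alt") is not None:
            self.alts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1              # leaving a div inside (or equal to) the target


def extract_alts(html):
    """Returns the alt texts found inside the content div of one HTML string."""
    parser = ContentAltParser()
    parser.feed(html)
    parser.close()
    return parser.alts


def extract_from_folder(folder):
    """Maps each *.html file under folder (assumed layout) to its alt texts."""
    return {
        str(path): extract_alts(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(folder).rglob("*.html"))
    }
```

Calling extract_alts on the example line from the question should give a one-item list with the alt text, and extract_from_folder("some_folder") would repeat that for every HTML file below a (hypothetical) folder. Images outside the content div are ignored because the depth counter is zero there.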
Here's a (somewhat) manual method using Notepad++ (N++)

- Back up your files
- Open your files in N++
- Ctrl-F

Find tab...
Find what: (<div.+alt=")(.+)(".+)
Regular expression
Wrap around

Mark tab...
Bookmark line
Wrap around
Mark all
This marks all relevant lines.

From menu...Search, Bookmark, Remove unmarked lines

1.
Find tab...
Find what: (<div.+alt=")(.+)(".+)
Regular expression
Wrap around

2.
Replace tab...
Replace with: \2
Replace all

3.
If that works satisfactorily, you can use the Macro to record the sequence...
From menu...Macro, Start recording.
Do 1.
Do 2.
Stop recording.
Save current recorded macro to a name

4.
Pick each file.
Run macro: Macro, pick saved name
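
If running the macro file by file gets tedious, the same find-and-keep-the-capture idea can be scripted. The following is a rough Python sketch under several assumptions: it uses a narrowed variant of the pattern above (tied to the div with id content, and using [^"]* so the capture stops at the closing quote), it assumes the div and its img sit on one line as in the question's example, and it only picks up the first image after each matching div tag. The names ALT_IN_CONTENT, alts_in_text and alts_in_folder are made up for illustration. Like any regex approach, it does not understand nesting, so an img that follows the closing div on the same line could be picked up as well.

```python
import re
from pathlib import Path

# Narrowed variant of the Notepad++ pattern (an assumption, not the original):
# group 1 is the alt value of the first img after <div id="content">.
ALT_IN_CONTENT = re.compile(
    r'<div\s+id\s*=\s*"content"[^>]*>.*?<img\b[^>]*?\salt\s*=\s*"([^"]*)"',
    re.IGNORECASE,
)


def alts_in_text(text):
    """Returns the captured alt text for every matching line of text."""
    return [
        match.group(1)
        for line in text.splitlines()
        for match in ALT_IN_CONTENT.finditer(line)
    ]


def alts_in_folder(folder):
    """Maps each *.html file directly in folder (assumed layout) to its alt texts."""
    return {
        str(path): alts_in_text(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(folder).glob("*.html"))
    }
```

Unlike the Replace all step, this leaves the HTML files untouched and only reports what it finds, so no backup is needed before trying it.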
rgarimella

ASKER

Pham, is it possible to enter the code in Dreamweaver, or would I need Visual Studio to test this code?
ASKER CERTIFIED SOLUTION
Lucas Bishop