Extracting ALT Text from Multiple HTML Pages


I am trying to find out how I can:

- extract the ALT text from IMG tags inside a specific div
- across multiple HTML files

For example, I need to extract ALT text ONLY from the following div, whose id is "content". The page contains other divs with their own ALT text.

<div id ="content"><img src="../../images/nameofimage.jpg" alt="Tool Box"></div>

Any tool or technology will work (regex, Dreamweaver, or Perl).
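Since any tool will do, here is a minimal sketch in Python (not one of the tools named above) that uses only the standard library's `html.parser`. It tracks whether the parser is currently inside `<div id="content">` by counting nested `<div>` tags, and records the `alt` of every `<img>` found there. It then applies the same logic to every `*.html` file under a folder. The class and function names and the folder layout are hypothetical. It also assumes the div tags inside the content div are properly closed, because unbalanced `<div>` markup would throw off the depth count.

```python
from html.parser import HTMLParser
from pathlib import Path


class ContentAltExtractor(HTMLParser):
    """Collect the alt text of <img> tags nested anywhere inside <div id="content">."""

    def __init__(self, target_id="content"):
        super().__init__()
        self.target_id = target_id
        self.div_depth = 0   # open <div>s since entering the target div (0 = outside it)
        self.alts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1          # a div nested inside the target
            elif attrs.get("id") == self.target_id:
                self.div_depth = 1           # entering the target div
        elif tag == "img" and self.div_depth and "alt" in attrs:
            self.alts.append(attrs["alt"] or "")  # a bare `alt` attribute has value None

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1              # reaches 0 when the target div closes


def extract_alts(html, target_id="content"):
    """Return the alt texts found inside the target div of one HTML string."""
    parser = ContentAltExtractor(target_id)
    parser.feed(html)
    parser.close()
    return parser.alts


def extract_from_folder(folder, pattern="*.html"):
    """Map every matching file under `folder` (recursively) to its alt texts."""
    return {
        str(path): extract_alts(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(folder).rglob(pattern))
    }
```

Usage, with "site" standing in for a hypothetical folder of pages: `for name, alts in extract_from_folder("site").items(): print(name, alts)`. A real HTML parser handles the `id ="content"` spacing in the example markup without any special-casing, which is the main advantage over a regex here.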

Duy Pham (Freelance IT Consultant) commented:
This might not be related to this topic, but you can easily do it using HtmlAgilityPack in C#:
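    using HtmlAgilityPack;

    var htmlDoc = new HtmlDocument();
    htmlDoc.Load("page.html");  // hypothetical path to one of your HTML files
    // get all IMG elements having an ALT attribute inside the div with id="content"
    HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//div[@id='content']//img[@alt]");
    // SelectNodes returns null (not an empty collection) when nothing matches
    if (nodes != null) foreach (HtmlNode img in nodes) System.Console.WriteLine(img.GetAttributeValue("alt", ""));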

Just my 2 cents.
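If C# is not an option, roughly the same XPath query can be sketched with Python's standard `xml.etree.ElementTree`. This is a stand-in with a strong assumption, not HtmlAgilityPack: ElementTree only understands well-formed XML, so the sketch works on XHTML-style pages (`<img ... />`, no default namespace) and raises `ParseError` on ordinary HTML such as the unclosed `<img>` in the question. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same path expression as the HtmlAgilityPack example, written relative to the root
# element as ElementTree requires (".//" instead of a leading "//").
CONTENT_IMG_PATH = ".//div[@id='content']//img[@alt]"


def alts_from_xhtml(markup):
    """Return the alt text of every <img alt=...> inside <div id="content">.

    Assumes well-formed XML/XHTML without a default namespace; ordinary HTML
    (for example an unclosed <img>) makes ET.fromstring raise ET.ParseError.
    """
    root = ET.fromstring(markup)
    return [img.get("alt") for img in root.iterfind(CONTENT_IMG_PATH)]
```

For real-world HTML pages, a tolerant HTML parser (such as the one HtmlAgilityPack provides) is the safer choice. This sketch mainly shows what the `//div[@id='content']//img[@alt]` expression selects.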
NVIT (End-user support) commented:
Here's a (somewhat) manual method using Notepad++ (N++):

- Back up your files
- Open your files in N++
- Press Ctrl-F

1. Mark the matching lines and delete the rest

Find tab...
Find what: (<div id\s*=\s*"content".*?alt=")([^"]*)(".*)
Regular expression
Wrap around

Mark tab...
Bookmark line
Wrap around
Mark all
This marks all relevant lines. (The pattern assumes the content div and its IMG tag are on one line, as in the example.)

From menu...Search, Bookmark, Remove unmarked lines

2. Reduce each remaining line to its ALT text

Find tab...
Find what: (<div id\s*=\s*"content".*?alt=")([^"]*)(".*)
Regular expression
Wrap around

Replace tab...
Replace with: \2
Replace all

If that works satisfactorily, you can record the sequence as a macro...
From menu...Macro, Start recording.
Do step 1.
Do step 2.
Stop recording.
From menu...Macro, Save current recorded macro, and give it a name.

Pick each file.
Run macro: Macro, pick saved name
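The two Notepad++ passes can be mimicked in Python with the `re` module, which is handy for checking a pattern against sample lines before recording the macro. This sketch uses a variant of the pattern that is anchored to the `id="content"` div (tolerating spaces around `=`, as in the question's markup) and stops at the first closing quote. Like the Notepad++ approach, it is line-based: it assumes the div and its `alt` attribute sit on the same line and picks up at most one ALT value per line. The function names are hypothetical.

```python
import re
from pathlib import Path

# Group 1: from '<div id="content"' up to and including 'alt="'
# Group 2: the ALT value (everything up to the next double quote)
# Group 3: the rest of the line
LINE_PATTERN = re.compile(r'(<div id\s*=\s*"content".*?alt=")([^"]*)(".*)')


def alts_by_regex(text):
    """Mimic 'Mark all' + 'Remove unmarked lines' + 'Replace with: \\2' on one file's text."""
    results = []
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)   # unmatched lines are dropped, like unmarked lines
        if match:
            results.append(match.group(2))  # keep only group 2, like replacing with \2
    return results


def alts_by_regex_in_folder(folder, pattern="*.html"):
    """Run the same steps on every matching file, like replaying the macro per file."""
    return {
        str(path): alts_by_regex(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(folder).glob(pattern))
    }
```

If the markup in your files spans several lines, a line-based regex will miss those matches, and a real HTML parser is the more reliable route.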
rgarimella (Author) commented:
Pham, is it possible to enter this code in Dreamweaver, or would I need Visual Studio to test it?
Lucas Bishop (Click Tracker) commented:
You could build a basic scraper for this specific div tag using Kimono Labs.

It's a fairly straightforward system if you watch this video.

Here is their Chrome extension. You'll want to switch it to the "Data Model" view so you can access the site's source code instead of the rendered view. Here is a basic overview of how to extract HTML elements.

I've used this before for something similar: pulling specific ASINs from Amazon search results to track the search rankings of specific products. It made it very easy to dig into Amazon's rankings without having to visit the site at all.

Duy Pham (Freelance IT Consultant) commented:
@rgarimella: Dreamweaver? I assume you are doing the extraction from inside a website or web application. In that case, you can use jQuery, which is just as simple as the HtmlAgilityPack approach above:
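    // get HTML content from a URL (the page must be same-origin or allow CORS)
    $.ajax({
        method: 'GET',   // use POST if needed
        url: '<url_to_extract_alt_texts>',
        dataType: 'html',   // expected response type (contentType describes the request body)
        success: function (result) {
            // wrap the parsed nodes so find() can also reach a top-level div#content
            $('<div>').append($.parseHTML(result))
                .find('div[id="content"] img[alt]').each(function (idx, obj) {
                    // do something with each extracted alt text
                    console.log($(obj).attr('alt'));
                });
        }
    });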
