How may validation of HTML links be automated?

I have a list of HTML links in a text file, one link per line. The links are of two kinds. One type points to a page, for example <a href="http://wtov9.com/weather/">whatever</a>. The other type executes a program, as in <a href="http://www.MySite.com/cgi-bin/MyProgram.exe?term1=something&term2=or+other&Submit=&">whatever</a>.

Suppose there are a couple of hundred such links, and I do not want to check them by pasting them one at a time into a browser. Suppose further that this is a high-frequency task that should require minimal human involvement. All I need to know is which (if any) of the links are invalid, that is, instances where the result would be a 404 "not found" or other failure message.

Question 1: Is there any way to automate this task using a browser such as Firefox?

Question 2: If "No" to question 1, is there a technique that I could use within a C++ program that I write to accomplish the same goal? What functions would be involved? I am using Windows 7 Professional and Visual Studio 2010. Ideally this would be a simple console program launched on a PC from the command line which reads in the list of links from a text file and reports the results in a new text file.
StMike38 asked:

frankhelk commented:
As a lazy person, I would try WGET for that.

It's a command-line tool from the Unix world, and several Windows ports are available. You can use it to download web or FTP content from within batch files. It can read URLs from a text file one by one and, as far as I remember, even check their availability without downloading anything, or redirect the download to the null device (NUL: or /dev/null).

I think that, with some batch file magic around it to check the return codes or analyze the log, it would be the perfect tool for your task.
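
For illustration only, here is a minimal batch sketch of that idea. It assumes wget.exe is on the PATH and that the link file is called links.txt (both are just placeholders); --spider makes wget check each URL without downloading it, and --force-html lets it pull the URLs straight out of the <a href="..."> lines:

    @echo off
    rem Sketch only: adjust file names to taste.
    rem --spider       check that each URL exists, but do not download it
    rem --force-html   treat links.txt as HTML so the <a href="..."> tags are parsed
    rem -i links.txt   read the URLs from that file
    rem -o wget.log    write the full report to wget.log
    rem -T 30 -t 1     30-second timeout, one try per link
    wget --spider --force-html -i links.txt -o wget.log -T 30 -t 1

    rem wget flags dead links with "broken link" in the log; pull those lines out.
    findstr /i /c:"broken link" wget.log

The exact log wording varies a little between wget builds, so it is worth eyeballing wget.log once before relying on the findstr filter.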

Dave Baldwin (Fixer of Problems) commented:
I use the LinkChecker add-on (http://frayd.us/) for Firefox to check the links on my pages. Of course, you have to put them in a page that Firefox can display for that to work. LinkChecker color-codes the links as it checks them. Note that bad links can take up to 60 seconds, or whatever your network timeout is, before they are marked in red.
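
If you go that route, one simple way to get the list into something Firefox can display (just a sketch, reusing the example link from the question) is to wrap the lines of the text file in a bare-bones HTML page and open it locally:

    <html>
      <body>
        <!-- paste the lines from the text file here, one link per line -->
        <a href="http://wtov9.com/weather/">whatever</a>
        <!-- ... and the rest of the couple of hundred links ... -->
      </body>
    </html>
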
frankhelk commented:
If you insist on writing your own code, you could use
new System.Net.WebClient().DownloadString(url)
for that; in case of problems it throws a WebException, which contains all the necessary information about the error. See the MSDN documentation on System.Net.WebClient, WebClient.DownloadString, and WebException for details on these classes.
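
To make that concrete for a Visual Studio 2010 setup, here is a rough C++/CLI sketch of the idea (compile with /clr). The file names links.txt and results.txt are placeholders, and it assumes the href values have already been extracted to one bare URL per line:

    // Hedged sketch only: reads one URL per line from links.txt,
    // tries to download each one, and logs the outcome to results.txt.
    #using <System.dll>

    using namespace System;
    using namespace System::IO;
    using namespace System::Net;

    int main()
    {
        StreamReader^ reader = gcnew StreamReader("links.txt");   // placeholder name
        StreamWriter^ writer = gcnew StreamWriter("results.txt"); // placeholder name
        WebClient^ client = gcnew WebClient();

        String^ url;
        while ((url = reader->ReadLine()) != nullptr)
        {
            if (url->Trim()->Length == 0)
                continue;
            try
            {
                // Throws a WebException on 404s, DNS failures, timeouts, etc.
                client->DownloadString(url);
                writer->WriteLine("OK      {0}", url);
            }
            catch (WebException^ ex)
            {
                writer->WriteLine("BROKEN  {0}  ({1})", url, ex->Message);
            }
            catch (Exception^ ex)
            {
                // Malformed URLs and similar problems also count as failures here.
                writer->WriteLine("BROKEN  {0}  ({1})", url, ex->Message);
            }
        }

        reader->Close();
        writer->Close();
        return 0;
    }

Note that DownloadString fetches the whole page; for a pure existence check an HTTP HEAD request (e.g. HttpWebRequest with Method set to "HEAD") would be lighter, but for a couple of hundred links it hardly matters.
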
frankhelk commented:
Addendum on WGET: if you want to check the full correctness of the linked content, you can also do a recursive download, down to a given level, and even exclude certain types of data; it is even capable of making a complete mirror of a site.
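
Purely as an illustration (the start URL, depth, and file types below are made up), such a recursive spider run could look like:

    rem Follow links recursively, two levels deep, skipping common image types,
    rem still without downloading anything thanks to --spider.
    wget --spider -r -l 2 --reject "jpg,gif,png" -o deepcheck.log http://www.MySite.com/

    rem Replacing "-r -l 2" with --mirror (and dropping --spider) would fetch a complete copy of the site.
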
StMike38 (Author) commented:
Two excellent solutions, offered very quickly. Thank you.

StMike38