StMike38 (United States of America) asked:
How may validation of HTML links be automated?

I have a list of HTML links in a text file, one link per line. The links are of two kinds. One kind points to a page, for example <a href="http://wtov9.com/weather/">whatever</a>. The other kind executes a program, as in <a href="http://www.MySite.com/cgi-bin/MyProgram.exe?term1=something&term2=or+other&Submit=&">whatever</a>.

Suppose there are a couple of hundred such links, and I do not want to check them by pasting them one at a time into a browser. Suppose further that this is a high-frequency task that should require minimal human involvement. All I need to know is which (if any) of the links are invalid, that is, which would produce a 404 "not found" or other failure response.

Question 1: Is there any way to automate this task using a browser such as Firefox?

Question 2: If the answer to question 1 is "No", is there a technique I could use within a C++ program of my own to accomplish the same goal? What functions would be involved? I am using Windows 7 Professional and Visual Studio 2010. Ideally this would be a simple console program, launched on a PC from the command line, which reads the list of links from a text file and reports the results in a new text file.
ASKER CERTIFIED SOLUTION by Frank Helk (Germany)
[Solution text available to Experts Exchange members only.]
SOLUTION
[Solution text available to Experts Exchange members only.]
If you insist on writing your own code for that, you could use
System.Net.WebClient().DownloadString(url)
for that. In case of problems it throws a WebException, which contains all the necessary information about the error. See the MSDN documentation on System.Net.WebClient, WebClient.DownloadString, and WebException for details on these objects.
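For illustration, here is a minimal sketch of that approach in C++/CLI (compiled with /clr in Visual Studio 2010, so it stays within the asker's C++ toolchain while using the .NET class named above). It assumes the input file holds one bare URL per line (the href would need to be extracted from each anchor tag first); the file names links.txt and badlinks.txt are made-up placeholders.

#using <System.dll>

using namespace System;
using namespace System::IO;
using namespace System::Net;

int main()
{
    WebClient^ client = gcnew WebClient();
    StreamWriter^ report = gcnew StreamWriter("badlinks.txt");

    for each (String^ line in File::ReadAllLines("links.txt"))
    {
        String^ url = line->Trim();
        if (url->Length == 0)
            continue;
        try
        {
            // DownloadString() throws a WebException on 404s and
            // other HTTP or network failures.
            client->DownloadString(url);
        }
        catch (WebException^ ex)
        {
            report->WriteLine(url + "\t" + ex->Message);
        }
    }
    report->Close();
    return 0;
}

Only the failing links end up in badlinks.txt; a clean run produces an empty report.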
Addendum on WGET: If you want to check the full correctness of the linked content, you could do a recursive download, too, down to a given level, and even exclude certain types of data; wget is even capable of mirroring a complete site.
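For example, assuming standard GNU Wget options (--spider checks each URL without saving any content, -i reads the URL list, -o writes the log, and -r, -l, and -R control recursion depth and excluded file types; the file names and the site URL are placeholders taken from the question):

wget --spider --no-verbose -i links.txt -o check.log
wget --spider -r -l 2 -R "jpg,gif,png" http://www.MySite.com/

Failed URLs show up in the log with their error status.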
StMike38 (asker) replied:
Two excellent solutions, offered very quickly. Thank you.

StMike38