StMike38
asked on
How may validation of HTML links be automated?
I have a list of HTML links in a text file, one link per line. The links are of two kinds. One points to a page, for example <a href="http://wtov9.com/weather/">whatever</a>. The other invokes a program, as in <a href="http://www.MySite.com/cgi-bin/MyProgram.exe?term1=something&term2=or+other&Submit=&">whatever</a>.
Suppose there are a couple of hundred such links, and I do not want to check them by pasting them one at a time into a browser. Suppose further that this is a high-frequency task that should require minimal human involvement. All I need to know is which (if any) of the links are invalid, that is, cases where the result would be a 404 "not found" or some other failure message.
Question 1: Is there any way to automate this task using a browser such as Firefox?
Question 2: If "No" to question 1, is there a technique that I could use within a C++ program that I write to accomplish the same goal? What functions would be involved? I am using Windows 7 Professional and Visual Studio 2010. Ideally this would be a simple console program launched on a PC from the command line which reads in the list of links from a text file and reports the results in a new text file.
ASKER CERTIFIED SOLUTION
SOLUTION
If you insist on writing your own code for that, you could use System.Net.WebClient and WebClient.DownloadString; in case of problems it throws a WebException, which contains all the necessary info about the error. See the MSDN documentation on these objects for details.
Addendum on WGET: If you want to check the full correctness of the linked content, you can also do a recursive download, down to a given depth, and even exclude certain types of data; wget is even capable of making a complete mirror of a site.
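Concretely, the flat check of a link list and the recursive variant from the addendum might look like this (the file names, depth, and reject list are only illustrative):

```shell
# Probe every URL listed one per line in links.txt without saving
# anything: --spider only asks the server whether each link answers,
# and failures (404s etc.) are recorded in the report.txt log.
wget --spider --no-verbose --input-file=links.txt --output-file=report.txt

# Recursive variant: follow links up to 2 levels deep from a start
# page, skipping bulky binary types while checking.
wget --spider --recursive --level=2 --reject "jpg,jpeg,png,zip" http://www.example.com/
```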
ASKER
Two excellent solutions, offered very quickly. Thank you.