  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 253

How may validation of HTML links be automated?

I have a list of HTML links in a text file, one link per line. The links are of two kinds. One type points to a page, for example <a href="http://wtov9.com/weather/">whatever</a>. The other type executes a program, as in <a href="http://www.MySite.com/cgi-bin/MyProgram.exe?term1=something&term2=or+other&Submit=&">whatever</a>.

Suppose there are a couple of hundred such links, and I do not want to check them by pasting them one at a time into a browser. Suppose further that this is a high-frequency task that should require minimal human involvement. All I need to know is which (if any) of the links are invalid, that is, instances where the result would be a 404 "not found" or other failure message.

Question 1: Is there any way to automate this task using a browser such as Firefox?

Question 2: If "No" to question 1, is there a technique I could use within a C++ program of my own to accomplish the same goal? What functions would be involved? I am using Windows 7 Professional and Visual Studio 2010. Ideally this would be a simple console program, launched from the command line, that reads the list of links from a text file and reports the results to a new text file.
Asked by StMike38
2 Solutions
 
frankhelk commented:
As a lazy person, I would try WGET for that.

It's a command-line tool from the Unix world that has been ported to Windows. You can use it to download web or FTP content from within batch files. It can read URLs from a text file one by one and, as far as I remember, even check their availability without downloading anything, or redirect the download to the null device (NUL: or /dev/null).

I think that with some batch file magic around it, checking the return codes or analyzing the log, it would be the perfect tool for your task.
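
For example, a minimal sketch of that approach (assuming a Windows build of wget is on the PATH, the text file holds one bare URL per line, and the file names are just placeholders):

rem Probe every URL in links.txt without downloading anything;
rem one attempt per link, 15-second timeout, log everything to check.log.
wget --spider --tries=1 --timeout=15 -i links.txt -o check.log

rem Pull the failures out of the log into a report file
rem (the exact wording in the log varies between wget versions).
findstr /C:"broken link" /C:"ERROR 404" check.log > failures.txt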
 
Dave Baldwin, Fixer of Problems, commented:
I use the LinkChecker add-on (http://frayd.us/) for Firefox to check the links on my pages. Of course, you have to put the links in a page that Firefox can display for that to work. LinkChecker color-codes the links as it checks them. Note that bad links can take up to 60 seconds, or whatever your network timeout is, before they are marked in red.
 
frankhelk commented:
If you insist on writing your own code for that, you could use
new System.Net.WebClient().DownloadString(url)
In case of problems it throws a WebException that contains all the necessary information about the error. See the MSDN documentation on System.Net.WebClient, WebClient.DownloadString and WebException for details on these objects.
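
Since the question mentions C++ and Visual Studio 2010, here is a rough native sketch of the same idea using the WinINet API instead of WebClient. It assumes the text file has been reduced to one bare URL per line (the href values pulled out of the anchors); the file and program names are placeholders. Link it against wininet.lib.

// linkcheck.cpp - report the HTTP status of every URL listed in a text file.
// Build with Visual Studio 2010 and link against wininet.lib.
#include <windows.h>
#include <wininet.h>
#include <stdio.h>
#include <string.h>
#pragma comment(lib, "wininet.lib")

// Returns the HTTP status code for a URL, or 0 if the request itself failed.
static DWORD CheckUrl(const char* url)
{
    DWORD status = 0;
    HINTERNET hSession = InternetOpenA("linkcheck", INTERNET_OPEN_TYPE_PRECONFIG,
                                       NULL, NULL, 0);
    if (hSession)
    {
        HINTERNET hUrl = InternetOpenUrlA(hSession, url, NULL, 0,
                                          INTERNET_FLAG_NO_UI | INTERNET_FLAG_RELOAD, 0);
        if (hUrl)
        {
            DWORD size = sizeof(status);
            // Ask only for the numeric status code (200, 404, ...).
            HttpQueryInfoA(hUrl, HTTP_QUERY_STATUS_CODE | HTTP_QUERY_FLAG_NUMBER,
                           &status, &size, NULL);
            InternetCloseHandle(hUrl);
        }
        InternetCloseHandle(hSession);
    }
    return status;
}

int main(int argc, char* argv[])
{
    if (argc < 2) { printf("usage: linkcheck urls.txt\n"); return 1; }
    FILE* f = fopen(argv[1], "r");
    if (!f) { printf("cannot open %s\n", argv[1]); return 1; }

    char line[2048];
    while (fgets(line, sizeof(line), f))
    {
        line[strcspn(line, "\r\n")] = '\0';   // strip the trailing newline
        if (line[0] == '\0') continue;        // skip blank lines
        DWORD status = CheckUrl(line);
        if (status >= 200 && status < 300)
            printf("OK   %lu  %s\n", (unsigned long)status, line);
        else
            printf("BAD  %lu  %s\n", (unsigned long)status, line);
    }
    fclose(f);
    return 0;
}

Redirecting standard output (linkcheck urls.txt > report.txt) gives the report file asked for in the question; a status of 0 on a BAD line means the request could not be made at all (bad host name, connection refused, and so on).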
 
frankhelk commented:
Addendum on WGET: If you want to check the full correctness of the linked content, you can do a recursive download too, down to a given level, and even exclude certain types of data - it is even capable of making a complete mirror of a site.
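
For instance, a rough illustration (the site URL, depth, and log name are placeholders, and the wording of the broken-link summary depends on the wget version):

rem Crawl two levels deep without saving anything, skip image files,
rem and log the results; recent wget versions print a broken-link summary.
wget --spider -r -l 2 --reject=jpg,png,gif -o sitecheck.log http://www.example.com/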
 
StMike38 (Author) commented:
Two excellent solutions, offered very quickly. Thank you.

StMike38
