Solved

How may validation of HTML links be automated?

Posted on 2014-09-24
Last Modified: 2014-09-24
I have a list of HTML links in a text file, one link per line. The links are of two kinds. One kind points to a page, for example <a href="http://wtov9.com/weather/">whatever</a>. The other kind executes a program, as in <a href="http://www.MySite.com/cgi-bin/MyProgram.exe?term1=something&term2=or+other&Submit=&">whatever</a>.

Suppose there are a couple of hundred such links, and I do not want to check them by pasting them one at a time into a browser. Suppose further that this is a high-frequency task that should require minimal human involvement. All I need to know is which (if any) of the links are invalid, that is, instances where the result would be a 404 "not found" or other failure message.

Question 1: Is there any way to automate this task using a browser such as Firefox?

Question 2: If "No" to question 1, is there a technique that I could use within a C++ program that I write to accomplish the same goal? What functions would be involved? I am using Windows 7 Professional and Visual Studio 2010. Ideally this would be a simple console program launched on a PC from the command line which reads in the list of links from a text file and reports the results in a new text file.
Question by:StMike38
 
LVL 13

Accepted Solution

by:
frankhelk earned 250 total points
As a lazy person, I would try WGET for that.

It's a command-line tool from the Unix world that has been ported to Windows. You can use it to download web or FTP content from within batch files. It can read URLs from a text file one by one (the -i option) and, as far as I remember, can even check availability without downloading anything (the --spider option); alternatively, you can redirect the download to the null device (NUL: or /dev/null).

I think with some batch file magic around it for checking the return codes or analyzing the log that would be the perfect tool for your task.
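For the C++ console program asked about in Question 2, the same idea might be sketched roughly as below: pull the URL out of each line, hand it to wget in spider mode, and record whether wget reported success. This is only a minimal sketch under several assumptions: wget is installed and on the PATH, every line holds at most one href="..." attribute, and the URLs contain no double quotes. The function names (extract_href, build_wget_command, check_links), the report format, and the retry and timeout values are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Pull the URL out of a line such as <a href="http://example.com/">text</a>.
// Returns an empty string when the line has no complete href="..." attribute.
std::string extract_href(const std::string& line) {
    const std::string key = "href=\"";
    std::size_t start = line.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    const std::size_t end = line.find('"', start);
    if (end == std::string::npos) return "";
    return line.substr(start, end - start);
}

// Build the wget call for one URL. --spider checks the URL without saving it,
// -q silences output, and --tries/--timeout (values picked arbitrarily here)
// keep dead hosts from stalling the whole run. The URL is double-quoted so
// that '&' in query strings is not interpreted by the command shell.
std::string build_wget_command(const std::string& url) {
    return "wget --spider -q --tries=1 --timeout=30 \"" + url + "\"";
}

// Hypothetical driver: checks every link found in list_path, writes one line
// per link to report_path, and returns the number of links that failed.
int check_links(const std::string& list_path, const std::string& report_path) {
    std::ifstream in(list_path.c_str());
    std::ofstream out(report_path.c_str());
    std::string line;
    int failures = 0;
    while (std::getline(in, line)) {
        const std::string url = extract_href(line);
        if (url.empty()) continue;  // skip blank lines and lines without a link
        // wget exits with 0 on success and a nonzero code on any failure.
        const bool ok = std::system(build_wget_command(url).c_str()) == 0;
        if (!ok) ++failures;
        out << (ok ? "OK      " : "FAILED  ") << url << '\n';
    }
    return failures;
}
```

Calling check_links("links.txt", "report.txt") from main (both file names are placeholders) would then produce the report file. Starting one wget process per link is slow but simple, and for a couple of hundred links that trade-off seems acceptable.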
 
LVL 82

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
I use the LinkChecker add-on (http://frayd.us/) for Firefox to check the links on my pages. Of course, you have to put the links in a page that Firefox can display for that to work. LinkChecker color-codes the links as it checks them. Note that bad links can take up to 60 seconds, or whatever your network timeout is, before they are marked in red.
 
LVL 13

Expert Comment

by:frankhelk
If you insist on writing your own code for that, you could use
new System.Net.WebClient().DownloadString(url)
In case of problems it throws a WebException, which contains all the necessary information about the error. These are .NET classes, so from a Visual Studio C++ project they would be reached through C++/CLI rather than native C++. See the MSDN documentation on System.Net.WebClient, WebClient.DownloadString and WebException for details on these objects.
 
LVL 13

Expert Comment

by:frankhelk
Addendum on WGET: If you would like to check the full correctness of the linked content, you could also do a recursive download down to a given depth, and even exclude certain types of data. It is even capable of mirroring a complete site.
 

Author Closing Comment

by:StMike38
Two excellent solutions, offered very quickly. Thank you.

StMike38