Solved

How may validation of HTML links be automated?

Posted on 2014-09-24
Last Modified: 2014-09-24
I have a list of HTML links in a text file, one link per line. The links are of two kinds. One kind points to a page, for example <a href="http://wtov9.com/weather/">whatever</a>. The other kind executes a program, as in <a href="http://www.MySite.com/cgi-bin/MyProgram.exe?term1=something&term2=or+other&Submit=&">whatever</a>.

Suppose there are a couple of hundred such links, and I do not want to check them by pasting them one at a time into a browser. Suppose further that this is a high-frequency task that should require minimal human involvement. All I need to know is which (if any) of the links are invalid, that is, instances where the result would be a 404 "not found" or other failure message.

Question 1: Is there any way to automate this task using a browser such as Firefox?

Question 2: If "No" to question 1, is there a technique that I could use within a C++ program that I write to accomplish the same goal? What functions would be involved? I am using Windows 7 Professional and Visual Studio 2010. Ideally this would be a simple console program launched on a PC from the command line which reads in the list of links from a text file and reports the results in a new text file.
Question by:StMike38
5 Comments
 
LVL 14

Accepted Solution

by:
frankhelk earned 250 total points
ID: 40341744
As a lazy person, I would try WGET for that.

It's a command-line tool from the Unix world that has been ported to Windows. You can use it to download web or FTP content from within batch files. It can read URLs from a text file one by one and, as far as I remember, can even check availability without downloading the content, or redirect the download to the null device (NUL: or /dev/null).

I think with some batch file magic around it for checking the return codes or analyzing the log that would be the perfect tool for your task.
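As a rough sketch of the "check the return codes" idea, the wrapper below calls wget once per URL and translates its exit status into a short message. It is written in Python rather than as a batch file, which is a substitution and not something proposed in this thread. The function names are hypothetical, and it assumes wget is installed and on the PATH. The flags used (--spider, --quiet, --tries, --timeout) and the exit statuses are the ones documented in the GNU Wget manual.

```python
import subprocess

# Exit statuses as documented in the GNU Wget manual.
WGET_EXIT_MEANINGS = {
    0: "ok",
    1: "generic error",
    2: "command-line parse error",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "authentication failure",
    7: "protocol error",
    8: "server issued an error response (for example 404)",
}


def build_wget_command(url, timeout_seconds=20):
    """Build a wget call that probes one URL without saving its content."""
    return [
        "wget",
        "--spider",                      # only check that the resource exists
        "--quiet",                       # no progress output
        "--tries=1",                     # do not retry a failing link
        f"--timeout={timeout_seconds}",  # give up on slow servers
        url,
    ]


def describe_exit(code):
    """Translate a wget exit status into a short message."""
    return WGET_EXIT_MEANINGS.get(code, f"unknown exit status {code}")


def check_with_wget(url):
    """Run wget on one URL (assumes wget is on PATH) and describe the result."""
    completed = subprocess.run(build_wget_command(url))
    return describe_exit(completed.returncode)
```

Calling check_with_wget on each URL taken from the link file and logging every result other than "ok" would give the list of bad links. Passing the URL as a list element rather than through a shell also keeps the & characters in the CGI-style links from being interpreted by the command processor.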
 
LVL 83

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
ID: 40341822
I use the LinkChecker add-on (http://frayd.us/) for Firefox to check the links on my pages. Of course, you have to put the links in a page that Firefox can display for that to work. LinkChecker color-codes the links as it checks them. Note that bad links can take up to 60 seconds, or whatever your network timeout is, before they are marked in red.
 
LVL 14

Expert Comment

by:frankhelk
ID: 40341893
If you insist on writing your own code for that, you could use the .NET call
new System.Net.WebClient().DownloadString(url)
In case of problems it throws a WebException, which contains all the necessary information about the error. See the MSDN documentation on System.Net.WebClient, WebClient.DownloadString and WebException for details on these objects.
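The same idea (fetch each URL and treat an exception as a failed link) can be sketched end to end as a small console-style script. The version below uses Python's standard urllib instead of the .NET WebClient named above, so it illustrates the technique and is not the code suggested in this thread. All function names are hypothetical, and it assumes each line of the input file holds one link in the <a href="...">text</a> form shown in the question.

```python
import http.client
import re
import urllib.error
import urllib.request

# Matches the quoted href value in a line such as <a href="...">text</a>.
HREF_PATTERN = re.compile(r'href\s*=\s*"([^"]+)"', re.IGNORECASE)


def extract_url(line):
    """Return the href value found in the line, or None if there is none."""
    match = HREF_PATTERN.search(line)
    return match.group(1) if match else None


def check_url(url, timeout_seconds=20):
    """Fetch one URL and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return False, f"HTTP {error.code}"
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError) as error:
        # DNS failure, refused connection, timeout, malformed URL, and so on.
        return False, f"failed: {error}"


def check_file(links_path, report_path):
    """Write one line per failing link to report_path; return the failure count."""
    failures = 0
    with open(links_path, encoding="utf-8") as links, \
            open(report_path, "w", encoding="utf-8") as report:
        for line in links:
            url = extract_url(line)
            if url is None:
                continue  # skip blank lines and lines without a link
            ok, detail = check_url(url)
            if not ok:
                failures += 1
                report.write(f"{url}\t{detail}\n")
    return failures
```

Calling check_file("links.txt", "report.txt") (both file names are placeholders) leaves report.txt empty when every link responds, so an empty report means nothing needs attention. The links are checked one after another, so a run over a couple of hundred links can take a while if several of them time out.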
 
LVL 14

Expert Comment

by:frankhelk
ID: 40341905
Addendum on WGET: If you want to check the linked content more thoroughly, you could also do a recursive download down to a given depth and even exclude certain types of data. It is even capable of mirroring a complete site.
 

Author Closing Comment

by:StMike38
ID: 40341950
Two excellent solutions, offered very quickly. Thank you.

StMike38
