• Status: Solved
  • Priority: Medium
  • Security: Public

How to verify a URL or email address?

I have two tables in my SQL Server 7 database: one holds a list of links (URLs), the other a list of email addresses. I would like to write scripts that verify that an address is valid, and I'm really not sure which objects could provide a solution. By a valid address I mean one that is not rejected by a server (I do not mean 'syntactically correct'), so I suspect the procedure would have to evaluate the server's response.

An acceptable answer need not include any details on accessing the data; this is not a database question. However, the answer should include either a code sample or a link to a page with a solution.
Asked by: flying_squirrel
 
 
xabi commented:
URL verification is a little harder to do, because you have to keep in mind that a URL can look something like:

http://www.foo.com/~john/t.asp?v1=nn%20pp

xabi
 
mgfranz commented:
Xabi is correct. You also have to remember that the same location can be reached through several forms of a URL, for example:

foo.com
www.foo.com
http://foo.com
http://www.foo.com

In most cases, every one of these will get to the same location.
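Before an entry like foo.com can be requested at all, it needs a scheme. A minimal sketch, assuming you simply want to prepend http:// to anything that lacks one (the function name normalize_url is my own). Since www.foo.com and foo.com are often, but not always, served by the same host, it is safer to check each entry as typed than to rewrite one form into another:

```perl
use strict;
use warnings;

# Turn what a user typed into something an HTTP client can request.
# Only adds "http://" when no scheme is present; full URLs pass through.
sub normalize_url {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;                                     # trim whitespace
    $s = "http://$s" unless $s =~ m{^[a-z][a-z0-9+.\-]*://}i;  # add a scheme
    return $s;
}

print normalize_url($_), "\n"
    for ('foo.com', 'www.foo.com', 'http://foo.com', 'http://www.foo.com');
```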

When I have done this for guestbooks in the past, I left it up to the user to enter their home URL correctly; if a bad link was found, it was deleted. But in 99% of cases, users are pretty happy to enter correct paths.

Mark
 
flying_squirrel (author) commented:
Hmmm, I guess I didn't state my question in a clear enough manner.

I do not want to validate that a URL or email address is in the correct format (I'm already doing that).

What I wish to do is to communicate with the web or email server to -verify- that the URL or email address actually does exist.
 
mgfranz commented:
So you want to 'ping' the URL? There are a few components that will do that:

http://www.hexillion.com/support/docs/HexIcmp/

Or a free component:

http://www.15seconds.com/component/pg002179.htm
 
flying_squirrel (author) commented:
OK, we're on the right track, but pinging (as I understand it) only works with domains and IP addresses.  If I try to ping a particular page at a domain or IP address, ping doesn't know how to resolve the request, because it is only interested in the host.
 
mgfranz commented:
True. I don't know of any way to validate the actual path in a URL.

Mark
 
mayhew commented:
Mark, I'm sure you know more about this than I do, but couldn't you set up a process that acts as if it's going to load the URL in question, checks the return code in the header, and then flushes the buffer and redirects so that the page doesn't display (or something like that)?

That's completely made up, i.e. I don't know if it could be done. But it makes sense to me.

Any thoughts anybody?
 
clockwatcher commented:
URL checking is fairly easy, as mayhew suggested, although depending on the ebb and flow of the Internet you may mark a site as bad when it is simply unreachable at the moment. It isn't necessarily something you'd do in an ASP process, though it easily could be. Check out http://www.serverobjects.com for their ASPHttp component, or tech.dimac.net for their Winsock wrapper component. Unless you have a good reason to use ASP for this (and I don't see one from what you posted), you could simply use the WebBrowser control or the Internet Transfer Control (if you have VB). It could also be done easily with a few lines of Perl.

Mail checking is not easy. Actually, mail checking is pretty much impossible, because you have to wait for an undeliverable response, and depending on how the mail server is set up, it may not send one at all or may send one days later; plus, the format of the bounce is not defined. In other words, it's not an easily solved problem. There's no reliable way to ask a mail server whether an account exists, for good reason: the mail server may be a hub for multiple subdomains, each with its own drop, that simply performs periodic drop-offs. Mail is routed, so its first stop may not be its final destination.
 
mgfranz commented:
The Perl LWP or Win32::Internet module can fetch a URL and return the page as a string. I imagine it would be easy to scan the returned string for a 404, 401, 500, 403, etc., and if one is found, flag the link as bad. Of course you would only want to read the first 50 or so characters. But this becomes a moot point if the server has set up custom error pages.

Mark
 
clockwatcher commented:
Actually, you only need to check the headers. Almost all servers (I actually haven't seen one that doesn't) send the correct response code even if they end up displaying or redirecting to a custom error page.

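use strict;
use warnings;
use LWP::Simple;

# head() sends an HTTP HEAD request; in scalar context it returns true on success
print head("http://www.experts-exchange.com") ? "Okay\n" : "Not Valid\n";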
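LWP::Simple's head only reports success or failure. If you want the actual status code (to tell a 404 apart from a server that is merely down), here is a sketch that swaps in HTTP::Tiny, which has shipped with Perl since 5.14, in place of LWP; the verdict labels and function names are my own. HTTP::Tiny follows redirects for HEAD requests and reports connection-level failures (DNS errors, refused connections, timeouts) as the pseudo-status 599:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Classify an HTTP status code. HTTP::Tiny follows redirects on HEAD by
# itself and uses the pseudo-status 599 for connection-level failures.
sub url_verdict {
    my ($status) = @_;
    return 'ok'          if $status >= 200 && $status < 400;
    return 'unreachable' if $status == 599;   # DNS error, refused, timeout
    return 'bad';                             # 401, 403, 404, 500, ...
}

# Send a HEAD request (headers only, no page body); return status + verdict.
sub check_url {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 15)->head($url);
    return ($res->{status}, url_verdict($res->{status}));
}

# Usage (needs network access):
#   my ($status, $verdict) = check_url('http://www.experts-exchange.com');
```

A few servers answer HEAD with 405 Method Not Allowed; for those you may want to retry with a GET before flagging the link. An 'unreachable' verdict is a reason to try again later rather than delete, since the site may only be down at the moment.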
 
mgfranz commented:
This Perl code could run as a batch job, maybe once a day. Use the DBI module (or Win32::ODBC) to work with the database, and flag the ones that are bad...

Hey, I think we're on to something... :-)
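A rough sketch of that nightly batch job, assuming a hypothetical table named links with columns id (numeric), url and is_bad; the table, the column names and the DSN are placeholders, not anything from the original schema. The URL check is passed in as a code reference, so the same loop works with LWP::Simple's head or any other checker:

```perl
use strict;
use warnings;

# Return the ids (sorted numerically) whose URL fails the supplied check.
# $links: hashref of id => url.  $is_ok: code ref, true if a URL is good.
sub find_bad_links {
    my ($links, $is_ok) = @_;
    return sort { $a <=> $b } grep { !$is_ok->($links->{$_}) } keys %$links;
}

# Hypothetical nightly job: table "links" with columns id, url, is_bad.
# DBI (plus a driver such as DBD::ODBC) is loaded only when this runs.
sub flag_bad_links_in_db {
    my ($dsn, $user, $pass, $is_ok) = @_;
    require DBI;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    # Each row is [id, url]; flatten the rows into an id => url hash.
    my %links = map { @$_ } @{ $dbh->selectall_arrayref('SELECT id, url FROM links') };
    my $sth = $dbh->prepare('UPDATE links SET is_bad = 1 WHERE id = ?');
    $sth->execute($_) for find_bad_links(\%links, $is_ok);
    $dbh->disconnect;
    return;
}
```

For example, after use LWP::Simple; you might call flag_bad_links_in_db('dbi:ODBC:MyDSN', $user, $pass, sub { scalar head($_[0]) }), where MyDSN is a placeholder. Because a site may be only temporarily unreachable, you may prefer to flag a link only after it has failed on several consecutive runs.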
 
mayhew commented:
Sounds like an idea for a product.  ;)
 
flying_squirrel (author) commented:
I have tried the ASPHttp component and it is the solution to the URL side of my problem.  

As for why I'm using ASP: I'm not sure I understand what the disadvantages are, I don't have VB, and I don't have enough experience with Perl to pull it off quickly.

I still don't have an answer to the mail question (anyone?), and I'm not convinced it is as difficult a problem as you seem to perceive it to be.  I say this because I've downloaded shareware from Elcom that verifies email addresses in just the way I've described (in real time), so there definitely is a way (see http://www.elcomsoft.com/amv.html).  I would just like to have the checking done on the server side, automatically.

Anyway, thank you all for the input.

Mike
 
clockwatcher commented:
I think they're highly exaggerating their success rate.

  'AMV can find about 90% of dead addresses - some mail systems receive all messages and only then see their addresses and if the address is dead send the message back with remark about it.'

In my experience, most (not some) mail systems actually do the above, for the reason I stated. If you want to do what AMV does, simply connect on port 25 and speak SMTP; you could use the tech.dimac.net socket component for that. Again, Perl would be much easier.
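At the socket level, speaking SMTP means sending CRLF-terminated command lines and reading numbered replies, where a multi-line reply marks every line except the last with a hyphen after the code (250-... and finally 250 ...). A minimal sketch of those two pieces using Perl's IO::Socket::INET; the helper names read_reply and send_cmd and the host names in the comments are my own placeholders:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Read one SMTP reply. Every line starts with a 3-digit code; a "-" after
# the code means more lines follow, a space (or nothing) marks the last line.
# Returns (code, text), with multi-line text joined by newlines.
sub read_reply {
    my ($fh) = @_;
    my @text;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;                       # strip CRLF
        my ($code, $sep, $rest) = $line =~ /\A(\d{3})([ -]?)(.*)\z/
            or last;                                # not an SMTP reply line
        push @text, $rest;
        return ($code, join("\n", @text)) if $sep ne '-';
    }
    return (undef, join("\n", @text));              # closed or garbled
}

# Send one command (SMTP lines must end in CRLF) and read the reply.
sub send_cmd {
    my ($sock, $cmd) = @_;
    print {$sock} "$cmd\r\n";
    return read_reply($sock);
}

# Hypothetical session; mx.somewhere.com and mydomain.example are placeholders:
#   my $sock = IO::Socket::INET->new(PeerAddr => 'mx.somewhere.com',
#                                    PeerPort => 25, Timeout => 30) or die $!;
#   read_reply($sock);                               # 220 greeting
#   send_cmd($sock, 'HELO mydomain.example');
#   my ($code, $text) = send_cmd($sock, 'VRFY someone@somewhere.com');
#   send_cmd($sock, 'QUIT');
```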

Algorithmically:

Figure out the domain of the email address.
Query a DNS server for the MX record of that domain to determine the SMTP server
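Connect to that SMTP server on port 25 and identify yourself with HELO
Send VRFY someone@somewhere.com (if it's supported; many servers have it disabled for security reasons)
If VRFY is not supported, send MAIL FROM:<you@yourdomain.com> and then RCPT TO:<someone@somewhere.com> (SMTP requires MAIL FROM before RCPT TO)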
Evaluate Response
  250:  okay user exists
  251:  user not local; will forward to <this-server>
  252:  Cannot VRFY, but will try delivery anyway
  550:  no matching user
  551:  user not local; please try <this-server>

If you get a 251, you may try connecting to <this-server> yourself, but it may only accept connections from known servers; the same goes for a 551. If you get a 550, you can remove the user.
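Those rules could be written as a small lookup, sketched here with action names of my own choosing. It takes the numeric code and reply text (for example, from a VRFY or RCPT TO) and, for 251 and 551, pulls out whatever the server put between angle brackets:

```perl
use strict;
use warnings;

# Map an SMTP reply to an action, following the reply codes listed above.
# Returns a list: the action, plus the forward path for 251/551 replies.
sub smtp_verdict {
    my ($code, $text) = @_;
    $code //= 0;
    my ($forward) = ($text // '') =~ /<([^<>]+)>/;   # e.g. "<user@host>"
    return ('keep')             if $code == 250;      # user exists
    return ('follow', $forward) if $code == 251 || $code == 551;
    return ('remove')           if $code == 550;      # no matching user
    return ('unknown');         # 252, temporary 4xx failures, anything else
}
```

Only 'remove' is a definite signal to delete a row; addresses that come back 'unknown' are best left in place.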

For full details on SMTP, check out RFC 821 (since obsoleted by RFC 2821 and then RFC 5321):

http://www.cis.ohio-state.edu/htbin/rfc/rfc821.html
 
flying_squirrel (author) commented:
WOW, detailed and specific.  Normally I consider myself reasonably expert, until I see answers like this.

Thanks, clockwatcher, for the great answer.  OK, I'm convinced that it's slightly more complicated than I thought.

;)

Mike