SED Email Validation

Hi,

I have a large list of emails and I need a script that will validate the syntax and check the MX and DNS records. I run Ubuntu in VirtualBox and ideally need a script I can execute on the command line.

Thank you in advance
Asked by: faithless1
apresence commented:
This would be a bit of a pain to do in sed. Here's how to do it in Perl:
perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'

The command exits with status 0 if the e-mail address is valid, or 1 if it is not.

Testing:
root@beta:~/exex $ echo foo | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
1
root@beta:~/exex $ echo foo@bar | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
1
root@beta:~/exex $ echo foo@bar.com | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
0
root@beta:~/exex $ echo 0oo@bar.com | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
0
root@beta:~/exex $
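
To run the same format check over a whole file instead of one address at a time, the pattern can be wrapped in a small script. This is only a sketch: the file name check_syntax.pl and the sub name syntax_ok are made up for illustration, and it checks the format only, not DNS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the one-liner above, wrapped in a sub.
# syntax_ok is a hypothetical name used only for this sketch.
sub syntax_ok {
    my ($addr) = @_;
    return $addr =~ /^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/ ? 1 : 0;
}

# When file names are given on the command line, check every line in them.
if (@ARGV) {
    while (my $line = <>) {
        $line =~ s/\s+$//;      # strip the newline and trailing whitespace
        next if $line eq '';    # skip blank lines
        print "$line ", (syntax_ok($line) ? 'valid' : 'invalid'), "\n";
    }
}
```

Assuming it is saved as check_syntax.pl, it could be run as "perl check_syntax.pl emails.txt".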


If you want to go a step further and only accept certain domain suffixes (.com, .net, .org, etc.), do this. Make sure EVERY domain suffix you want to allow is listed (.biz, .tv, and so on):
perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*(\.com|\.net|\.org)$/)'

Testing:
root@beta:~/exex $ echo foo@bar.com | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*(\.com|\.net|\.org)$/)'; echo $?
0
root@beta:~/exex $ echo foo@bar.foo | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*(\.com|\.net|\.org)$/)'; echo $?
1
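
Since the list of allowed suffixes tends to grow, one option is to build the alternation from an array instead of editing the regex by hand. The following is a sketch under that assumption; the sub name tld_ok and the contents of @allowed_tlds are illustrative, not part of the one-liner above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Suffixes to accept; extend this list as needed (illustrative values).
my @allowed_tlds = qw(com net org biz tv);

# Build the alternation (com|net|org|...) once from the list.
my $tld_alt = join '|', map { quotemeta($_) } @allowed_tlds;
my $pattern = qr/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*\.(?:$tld_alt)$/;

# tld_ok is a hypothetical helper name for this sketch.
sub tld_ok {
    my ($addr) = @_;
    return $addr =~ $pattern ? 1 : 0;
}

# Small demonstration on three addresses.
for my $addr (qw(foo@bar.com foo@bar.foo foo@bar.tv)) {
    print "$addr ", (tld_ok($addr) ? 'valid' : 'invalid'), "\n";
}
```

Note that the match is case-sensitive, so addresses would need to be lower-cased first if the list may contain upper-case domains.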
apresence commented:
The above just validates the format of the e-mail address, not the MX/DNS records... please ignore.
apresence commented:
See this article for validating the MX/DNS records (again, Perl rather than sed):
http://www.usenix.org/publications/perl/perl17.html

sed is really only suited to editing and validating text, not to things like querying DNS servers. Perl is a much better option for this.
        sub valid_address {
        	my($addr) = @_;
        	my($domain, $valid);
        	return(0) unless ($addr =~ /^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$/);
        	$domain = (split(/@/, $addr))[1];
        	$valid = 0; open(DNS, "nslookup -q=any $domain |") || return(-1);
        	while (<DNS>) {
        		$valid = 1 if (/^$domain.*\s(mail exchanger|internet address)\s=/);
        	}
        	return($valid);
        }
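
The part of this function that decides the result is the line match inside the while loop. To try that step without touching the network, it can be pulled into its own sub and fed lines by hand. This is a sketch: record_found is a hypothetical helper name, the sample lines are hand-written rather than captured from a real lookup, and \Q...\E is added so the dots in the domain match literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# record_found is a hypothetical helper: it applies the same test as the
# while loop above to lines that have already been read from nslookup.
sub record_found {
    my ($domain, @lines) = @_;
    for my $line (@lines) {
        # \Q...\E quotes the domain so its dots match literally.
        return 1 if $line =~ /^\Q$domain\E.*\s(mail exchanger|internet address)\s=/;
    }
    return 0;
}

# Hand-written sample lines, not captured from a real lookup.
my @sample = (
    "Server:\t\t192.0.2.53\n",
    "example.com\tmail exchanger = 10 mail.example.com.\n",
);
print record_found('example.com', @sample) ? "found\n" : "not found\n";
```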

faithless1 (author) commented:
Thanks. How would I run these scripts on the command line in GNOME Terminal? I have a file with 100k emails named emails.txt. Thanks
apresence commented:
Save the attached code to a file called validate_emails.pl, then run "chmod 700 validate_emails.pl" to mark it as executable.

To process your text file and see the output, run this from the directory containing the script:
./validate_emails.pl < emails.txt

To process your text file and save the output, run:
./validate_emails.pl < emails.txt > output.txt

Sample testing output:
root@beta:~/exex/test13 $ cat emails.txt
foo
foo@bar
foo@bar.baz
support@microsoft.com
root@beta:~/exex/test13 $ ./validate_emails.pl <emails.txt
foo invalid
foo@bar invalid
foo@bar.baz invalid
support@microsoft.com valid
root@beta:~/exex/test13 $
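
validate_emails.pl:
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if the address is well-formed and its domain has a mail
# exchanger or address record, 0 if not, and -1 if nslookup could not
# be started.
sub valid_address {
  my($addr) = @_;
  my($domain, $valid);
  # Reject anything that is not local@domain.tld (TLD of 2 to 4 letters)
  return(0) unless ($addr =~ /^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$/);
  $domain = (split(/@/, $addr))[1];
  $valid = 0;
  open(my $dns, "nslookup -q=any $domain |") || return(-1);
  while (<$dns>) {
    # \Q...\E makes the dots in the domain match literally
    $valid = 1 if (/^\Q$domain\E.*\s(mail exchanger|internet address)\s=/);
  }
  close($dns);
  return($valid);
}

while (<>) {
  my $addy = $_;
  $addy =~ s/\s+$//;   # strip the newline and trailing whitespace
  if ($addy)
  {
    # Only a result of 1 counts as valid; -1 (lookup failed) prints invalid
    print "$addy " . (valid_address($addy) == 1 ? 'valid' : 'invalid') . "\n";
  }
}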
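
With 100k addresses, a quick count of the results can be handy once output.txt exists. The following is a sketch only: tally is a hypothetical helper name, and it assumes every result line ends in " valid" or " invalid" as in the sample output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tally is a hypothetical helper: it counts lines that end in
# " valid" or " invalid" and ignores everything else.
sub tally {
    my (@lines) = @_;
    my %count = (valid => 0, invalid => 0);
    for my $line (@lines) {
        $count{$1}++ if $line =~ /\s(valid|invalid)\s*$/;
    }
    return %count;
}

# When file names are given on the command line, summarize their contents.
if (@ARGV) {
    my %count = tally(<>);
    print "valid: $count{valid}\ninvalid: $count{invalid}\n";
}
```

Assuming it is saved as tally.pl, it could be run as "perl tally.pl output.txt".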

apresence commented:
Since you are going to be checking 100k e-mails, many of them will almost certainly share a domain. To avoid repeating lookups, I've attached a new version of the script that caches the result for each domain it has already looked up. It should make your script run faster.

Uncomment the following line to get the cache hit information:
#print "[cached_result] ";

Testing with that line uncommented:
root@beta:~/exex/test13 $ cat emails.txt
foo
foo@bar
bar@bar
foo@barific.baz
bar@barific.baz
support@microsoft.com
abuse@microsoft.com
root@beta:~/exex/test13 $ ./validate_emails.pl <emails.txt
foo invalid
foo@bar invalid
bar@bar invalid
foo@barific.baz invalid
[cached_result] bar@barific.baz invalid
support@microsoft.com valid
[cached_result] abuse@microsoft.com valid
root@beta:~/exex/test13 $

validate_emails.pl (with caching):
#!/usr/bin/perl
use strict;
use warnings;

# Maps each domain already looked up to its result (1 or 0)
my %lookup_cache = ();

sub valid_address {
  my($addr) = @_;
  my($domain, $valid);

  # Lower-case address
  $addr = lc($addr);

  # Validate format of address
  return(0) unless ($addr =~ /^[^@]+@([-\w]+\.)+[a-z]{2,4}$/);

  # Grab domain
  $domain = (split(/@/, $addr))[1];

  # Lookup and return cached result if it exists
  if (exists $lookup_cache{$domain})
  {
    #print "[cached_result] ";
    return $lookup_cache{$domain};
  }

  # Do domain lookup
  $valid = 0;
  if (open(my $dns, "nslookup -q=any $domain |"))
  {
    while (<$dns>) {
      # \Q...\E makes the dots in the domain match literally
      $valid = 1 if (/^\Q$domain\E.*\s(mail exchanger|internet address)\s=/i);
    }
    close($dns);
  }

  # Store cached result for later
  $lookup_cache{$domain} = $valid;

  return $valid;
}

while (<>) {
  my $addy = $_;
  $addy =~ s/\s+$//;
  if ($addy)
  {
    print "$addy " . (valid_address($addy) ? 'valid' : 'invalid') . "\n";
  }
}
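
To see the effect of the cache without running any real lookups, the caching step can be tried on its own with a stand-in for the nslookup call. This is a sketch under stated assumptions: make_cached and the fake lookup sub are hypothetical names, and the fake lookup simply treats one hard-coded domain as valid.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_cached is a hypothetical helper: it wraps any per-domain lookup
# sub so that each domain is only looked up once.
sub make_cached {
    my ($lookup) = @_;
    my %cache;
    return sub {
        my ($domain) = @_;
        $cache{$domain} = $lookup->($domain) unless exists $cache{$domain};
        return $cache{$domain};
    };
}

# Stand-in for the nslookup call: counts how often it runs.
my $calls = 0;
my $fake_lookup = sub { $calls++; return $_[0] eq 'microsoft.com' ? 1 : 0 };

my $check = make_cached($fake_lookup);
$check->($_) for qw(barific.baz barific.baz microsoft.com microsoft.com);
print "4 checks, $calls lookups\n";
```

Using exists rather than comparing against an empty string means a cached result of 0 is still treated as a hit.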

faithless1 (author) commented:
Superb, thanks a million! I appreciate it.