Solved

SED Email Validation

Posted on 2010-09-05
Last Modified: 2013-12-26
Hi,

I have a large list of emails and I need a script that will validate the syntax, MX and DNS records. I run Ubuntu through a virtual box and ideally need a script I can execute on the command line.

Thank you in advance
Question by:faithless1
 
LVL 6

Expert Comment

by:apresence
ID: 33609005
This would be a bit of a pain to do in SED.  Here's how to do it in perl:
perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'

The exit status is 0 if the e-mail address is valid, or 1 if it's not. If you feed it more than one line, it exits with 1 at the first invalid address.

Testing:
root@beta:~/exex $ echo foo | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
1
root@beta:~/exex $ echo foo@bar | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
1
root@beta:~/exex $ echo foo@bar.com | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
0
root@beta:~/exex $ echo 0oo@bar.com | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)+$/)'; echo $?
0
root@beta:~/exex $


If you want to go a step further and accept only specific suffixes such as .com, .net and .org, use this instead. Make sure EVERY domain suffix you want to allow is listed (.biz, .tv, etc.)!
perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*(\.com|\.net|\.org)$/)'

Testing:
root@beta:~/exex $ echo foo@bar.com | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*(\.com|\.net|\.org)$/)'; echo $?
0
root@beta:~/exex $ echo foo@bar.foo | perl -ne 'exit 1 if (!/^\w[\w\-\_\.]*\@\w[\w\-\_]*(\.\w[\w\-\_]*)*(\.com|\.net|\.org)$/)'; echo $?
1
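
The one-liners above stop at the first line that fails, so against a multi-line file they only tell you whether every address passed. A minimal sketch of a per-address syntax check follows, using the same pattern as the first one-liner with the redundant escapes dropped. The names $EMAIL_RE and syntax_ok are made up for illustration, and the sample addresses are hard-coded; in practice you would loop over the lines of your file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the first one-liner above; \w already covers "_",
# so the extra escapes inside the character classes are not needed.
my $EMAIL_RE = qr/^\w[\w\-.]*\@\w[\w\-]*(?:\.\w[\w\-]*)+$/;

# Hypothetical helper: returns 1 if the address matches the pattern, else 0.
sub syntax_ok {
    my ($addr) = @_;
    return $addr =~ $EMAIL_RE ? 1 : 0;
}

# Hard-coded sample input (assumption); replace with lines read from a file.
for my $addr (qw(foo foo@bar foo@bar.com 0oo@bar.com)) {
    print "$addr ", (syntax_ok($addr) ? 'ok' : 'bad'), "\n";
}
```

This checks syntax only. It says nothing about whether the domain exists or accepts mail.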
 
LVL 6

Expert Comment

by:apresence
ID: 33609010
The above just validates the format of the e-mail address, not the MX/DNS records... please ignore.
 
LVL 6

Expert Comment

by:apresence
ID: 33609026
Check this out for validating the MX/DNS records (again, perl not SED):
http://www.usenix.org/publications/perl/perl17.html

SED is really only suited for editing/validating text, not for doing things like querying domain servers.  Perl is a much better option for this.
sub valid_address {
  my($addr) = @_;
  my($domain, $valid);
  return(0) unless ($addr =~ /^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$/);
  $domain = (split(/@/, $addr))[1];
  $valid = 0; open(DNS, "nslookup -q=any $domain |") || return(-1);
  while (<DNS>) {
    $valid = 1 if (/^$domain.*\s(mail exchanger|internet address)\s=/);
  }
  return($valid);
}
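
To see what the pattern inside valid_address is looking for, here is a rough sketch that applies it to canned text instead of a live lookup. The function name has_mail_or_address_record and the sample text are assumptions made up for illustration: the sample only imitates the "mail exchanger" / "internet address" line style the pattern expects, and the output of your own nslookup may well be formatted differently. The sketch also wraps the domain in \Q...\E so its dots match literally, which the original pattern does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the given nslookup-style text contains a
# "mail exchanger" (MX) or "internet address" (A) line for $domain, else 0.
sub has_mail_or_address_record {
    my ($domain, $output) = @_;
    for my $line (split /\n/, $output) {
        # \Q...\E quotes the domain so "." only matches a literal dot.
        return 1
            if $line =~ /^\Q$domain\E.*\s(?:mail exchanger|internet address)\s=/i;
    }
    return 0;
}

# Made-up sample text (not captured from a real lookup).
my $sample = join "\n",
    'Non-authoritative answer:',
    "example.com\tmail exchanger = 10 mail.example.com.",
    "example.com\tinternet address = 192.0.2.1",
    '';

for my $domain (qw(example.com example.org)) {
    my $ok = has_mail_or_address_record($domain, $sample);
    print "$domain ", ($ok ? 'valid' : 'invalid'), "\n";
}
```

Keeping the parsing separate from the lookup like this makes it easy to check the pattern against whatever your own nslookup actually prints.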


Author Comment

by:faithless1
ID: 33614266
Thanks. How would I run these scripts on the command line in Gnome terminal? I have a file with 100k emails named (emails.txt). Thanks
 
LVL 6

Expert Comment

by:apresence
ID: 33614345
Drop the attached code into a file called validate_emails.pl, then run "chmod 700 validate_emails.pl" to mark it as executable.

To process your text file and see the output, run this from the directory containing the script:
./validate_emails.pl < emails.txt

To process your text file and save the output:
./validate_emails.pl < emails.txt > output.txt

Sample testing output:
root@beta:~/exex/test13 $ cat emails.txt
foo
foo@bar
foo@bar.baz
support@microsoft.com
root@beta:~/exex/test13 $ ./validate_emails.pl <emails.txt
foo invalid
foo@bar invalid
foo@bar.baz invalid
support@microsoft.com valid
root@beta:~/exex/test13 $
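#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if the address is well-formed and its domain has an MX or A record, else 0.
sub valid_address {
  my ($addr) = @_;
  my $valid = 0;

  # Validate format of address
  return 0 unless $addr =~ /^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$/;

  # Grab domain
  my $domain = (split(/@/, $addr))[1];

  # A failed open counts as invalid (returning -1 here would be reported as 'valid' below).
  # The list form of open runs nslookup directly, without going through a shell.
  open(my $dns, '-|', 'nslookup', '-q=any', $domain) or return 0;
  while (<$dns>) {
    # \Q...\E makes the dots in the domain match literally
    $valid = 1 if /^\Q$domain\E.*\s(mail exchanger|internet address)\s=/;
  }
  close($dns);

  return $valid;
}

while (my $addy = <>) {
  $addy =~ s/\s+$//;
  next unless $addy;
  print "$addy " . (valid_address($addy) ? 'valid' : 'invalid') . "\n";
}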

 
LVL 6

Accepted Solution

by:
apresence earned 500 total points
ID: 33614506
Since you are going to be checking 100k e-mails, many of them will almost certainly share a domain, which means duplicate lookups. To cut those out, I've attached a new version of the script that caches the result of each domain lookup and reuses it whenever the same domain comes up again. That should make the script run faster.

Uncomment the following line to get the cache hit information:
#print "[cached_result] ";

Testing with that line uncommented:
root@beta:~/exex/test13 $ cat emails.txt
foo
foo@bar
bar@bar
foo@barific.baz
bar@barific.baz
support@microsoft.com
abuse@microsoft.com
root@beta:~/exex/test13 $ ./validate_emails.pl <emails.txt
foo invalid
foo@bar invalid
bar@bar invalid
foo@barific.baz invalid
[cached_result] bar@barific.baz invalid
support@microsoft.com valid
[cached_result] abuse@microsoft.com valid
root@beta:~/exex/test13 $
#!/usr/bin/perl
use strict;
use warnings;

# Results of earlier lookups, keyed by domain (1 = valid, 0 = invalid)
my %lookup_cache;

sub valid_address {
  my ($addr) = @_;

  # Lower-case address
  $addr = lc($addr);

  # Validate format of address
  return 0 unless $addr =~ /^[^@]+@([-\w]+\.)+[a-z]{2,4}$/;

  # Grab domain
  my $domain = (split(/@/, $addr))[1];

  # Lookup and return cached result if it exists
  if (exists $lookup_cache{$domain})
  {
    #print "[cached_result] ";
    return $lookup_cache{$domain};
  }

  # Do domain lookup (list form of open runs nslookup without a shell)
  my $valid = 0;
  if (open(my $dns, '-|', 'nslookup', '-q=any', $domain))
  {
    while (<$dns>) {
      # \Q...\E makes the dots in the domain match literally
      $valid = 1 if /^\Q$domain\E.*\s(mail exchanger|internet address)\s=/i;
    }
    close($dns);
  }

  # Store cached result for later
  $lookup_cache{$domain} = $valid;

  return $valid;
}

while (my $addy = <>) {
  $addy =~ s/\s+$//;
  next unless $addy;
  print "$addy " . (valid_address($addy) ? 'valid' : 'invalid') . "\n";
}
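
If you want to convince yourself the cache is doing its job without hitting DNS, here is a small offline sketch of the same idea. slow_lookup is a hypothetical stand-in for the real nslookup call and simply pretends that only example.com resolves; cached_lookup and $lookups_done are likewise names made up for this example.

```perl
use strict;
use warnings;

my %lookup_cache;      # domain => 0 or 1
my $lookups_done = 0;  # counts how often the (stubbed) lookup really runs

# Hypothetical stand-in for the DNS query so the sketch runs offline.
sub slow_lookup {
    my ($domain) = @_;
    $lookups_done++;
    return $domain eq 'example.com' ? 1 : 0;
}

# Look each domain up at most once; repeats are answered from the hash.
# "exists" is used so that a cached 0 (invalid) also counts as a hit.
sub cached_lookup {
    my ($domain) = @_;
    $lookup_cache{$domain} = slow_lookup($domain)
        unless exists $lookup_cache{$domain};
    return $lookup_cache{$domain};
}

my @results = map { cached_lookup($_) }
    qw(example.com example.com example.net example.net example.com);
print "results: @results, lookups performed: $lookups_done\n";
```

Five requests over two distinct domains should cost only two lookups. On a real list the saving depends on how many addresses share a domain.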

 

Author Comment

by:faithless1
ID: 33621650
Superb, thanks a million! I appreciate it.