Solved

Perl or shell script to sift through thousands of mail files & extract addresses & failure reasons from failed mails only

Posted on 2011-03-10
Last Modified: 2012-05-11
I have thousands of mail files. Some of the mails in these files
failed to be delivered & some were OK
(i.e. they reached the destination).

The attached file is one of the thousands of mail files & below are
the lines of text which we need a Perl or shell script to search for,
extract & output into a file in the format below:

xia0h3i-@hotmail.com;Subject: Delivery Status Notification (Failure):
tec_kid@hotmail.com;Subject: Delivery Status Notification (Failure)
life.breathe@hotmail.com;Subject: Mail delivery failed: returning message to sender


Briefly, the search algorithm is as follows:

a) search each file for a line containing both of the search strings
   "he following" & "failed"; 2 lines below it is the problem email
   address - extract this email address & output it to a file as the first
   column, followed by ; (semicolon as the column separator)

b) then search backwards a few lines (a variable number of lines)
   for the string "Subject:", extract this line & add it as the 2nd
   column of the file

c) then repeat step (a), searching forward for the next line with the
   search strings "he following" & "failed" for subsequent extractions until
   the end of the file, & then do the same for the next mail file
   (all mail files have either 4 or 5 digits as their filenames & all of them
   are in one directory)
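
For readers who want the shell route, steps (a) to (c) could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the script from the answers: the function name extract_failures is made up for illustration, and it assumes the address is the first non-blank line after the trigger line, so a notification that lists several addresses would yield only the first one.

```shell
#!/bin/sh
# Sketch only: extract_failures is a hypothetical helper name, not from the thread.
extract_failures() {
    awk '
        FNR == 1                   { subj = ""; pending = 0 }  # reset state for each file
        /^Subject:/                { subj = $0; next }         # step (b): remember latest Subject:
        /he following/ && /failed/ { pending = 1; next }       # step (a): trigger line
        pending && /^[ \t]*$/      { next }                    # skip blank lines after the trigger
        pending {
            if (/@/) {
                gsub(/^[ \t]+|[ \t]+$/, "")                    # trim surrounding whitespace
                print $0 ";" subj
            }
            pending = 0                                        # step (c): wait for the next trigger
        }
    ' "$@"
}
```

It would be called with the mail files as arguments, for example extract_failures 1234 56789 > /tmp/failures.txt. Remembering the most recent "Subject:" line while reading forward avoids having to search backwards at all.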


======== key search strings / text: extracted from the attachment =========

Subject: Delivery Status Notification (Failure):
. . . . .

Delivery to the following recipients failed.

       xia0h3i-@hotmail.com

.......


Subject: Delivery Status Notification (Failure)
. . . . .
Delivery to the following recipients failed.

       tec_kid@hotmail.com
..........

Subject: Mail delivery failed: returning message to sender
. . . . .
recipients. This is a permanent error. The following address(es) failed:

  life.breathe@hotmail.com

..........


j.txt
Question by:sunhux
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 35096683
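This should do what you want.  To call the script:

script.pl file1 file2 file3 ... > output_file

#!/usr/bin/perl

use strict;
use warnings;

foreach my $fil (@ARGV) {
    # Reset per-file state so one mail file cannot leak into the next.
    my ($subj, $in_err) = ('', 0);
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject: line.
            $subj = $_;
        } elsif (/he\s+following\b.*\s+failed\b/) {
            # Trigger line: the failed address(es) follow.
            $in_err = 1;
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                # Requested format: address;Subject line
                print "$1;$subj\n";
            } elsif (not /^\s*$/) {
                # First non-blank, non-address line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}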

 

Author Comment

by:sunhux
ID: 35096691

3 corrections / additional requirements to what I posted above:

1) Remove "failed" from the search strings, as I came across the line
   below, which does not have the string "failed" in it:
   "The following addresses had permanent delivery errors"

2) I came across the example below, where the search string
   "he following" & the email address to be extracted are on
   the same line. If this 2nd point can't be handled by the
   same script, feel free to write a separate script.

3) After extraction, I would like to sort the output by the 2nd
   column as the primary key & the 1st column as the
   secondary sort key (remember, the columns are separated by
   ; / semicolon).

Subject: Delivery Status Notification (Failure)
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="onJ2O.4dPFG2WYz.1+z937.67NgUs8"
. . . . .

The following message to <kayn73@starhub.net.sg> was undeliverable.
The reason for the problem:
......
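
Requirement 3 could also be met outside the extraction script with the standard sort utility. A minimal sketch, assuming the output is already in address;Subject form (the function name sort_failures is invented here for illustration):

```shell
#!/bin/sh
# Sketch only: sort_failures is a hypothetical helper name.
# -t';'  : fields are separated by a semicolon
# -k2    : primary key is everything from the 2nd field (the Subject line) onwards
# -k1,1  : secondary key is the 1st field (the address)
# LC_ALL=C gives plain byte ordering regardless of the locale.
sort_failures() {
    LC_ALL=C sort -t';' -k2 -k1,1 "$@"
}
```

For example, sort_failures output_file > sorted_file, or pipe the extractor's output straight into sort_failures. Using -k2 rather than -k2,2 keeps a Subject line that itself contains a semicolon together as one key.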
 

Author Comment

by:sunhux
ID: 35096726

Wow that's fast Wilcoxon.  However, I've made some corrections to my
earlier requirements, sorry about that, hope you can amend the script.


Lastly, can I run the script by passing the * wildcard character to represent
all files in that directory,  ie :
    script.pl * > output_file
 
LVL 31

Expert Comment

by:farzanj
ID: 35096729
I looked through your example.  There is no "he following"  &  "failed".  I searched the attached file.  Could you give a file that contains the terms you need?

Second, your algorithm is a little confusing.  A little simpler would be appreciated.
 
LVL 26

Accepted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 35096794
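To take into account your changes....

This modified version should handle everything...

#!/usr/bin/perl

use strict;
use warnings;

my %bad;    # $bad{subject}{address} = number of times seen
foreach my $fil (@ARGV) {
    # Reset per-file state so one mail file cannot leak into the next.
    my ($subj, $in_err) = ('', 0);
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject: line.
            $subj = $_;
        } elsif (/he\s+following\b/) {
            if (/\b(\w\S+@[\w.-]+)\b/) {
                # Address is on the trigger line itself.
                $bad{$subj}{$1}++;
            } else {
                # Address(es) follow on later lines.
                $in_err = 1;
            }
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                $bad{$subj}{$1}++;
            } elsif (not /^\s*$/) {
                # First non-blank, non-address line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}

# Sort by subject (2nd column) first, then by address (1st column).
foreach my $subj (sort keys %bad) {
    foreach my $addr (sort keys %{$bad{$subj}}) {
        print "$addr;$subj\n";
    }
}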

 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 35096807
Yes, you can run it as script.pl * > output_file.
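
One caveat: * expands to every name in the directory, so if output_file already exists there from an earlier run it will be scanned as well. Since the question says all mail files have 4 or 5 digit names, a narrower glob may be safer. A minimal sketch (list_mail_files is a made-up helper name, and the /tmp output path is only an example):

```shell
#!/bin/sh
# Sketch only: list_mail_files is a hypothetical helper name.
# Prints the regular files in a directory whose names are exactly 4 or 5 digits.
list_mail_files() {
    dir=${1:-.}
    for f in "$dir"/[0-9][0-9][0-9][0-9] "$dir"/[0-9][0-9][0-9][0-9][0-9]; do
        # An unmatched glob stays as a literal pattern, so check the file exists.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Possible usage (file names here contain no spaces, so word splitting is safe):
#   ./script.pl $(list_mail_files .) > /tmp/output_file
```

The same two glob patterns can also be typed directly on the command line in place of *.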
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35096820
farzanj, not sure why you're not finding it - "he following" and "failed" occur 3 times in j.txt (as expected since there are 3 failures in the log).
 

Author Comment

by:sunhux
ID: 35096842


The attachment is the actual file & to re-illustrate, I repost those
lines with the search strings (those underlined by ^^^) below:

Delivery to the following recipients failed
             ^^^^^^^^^^^^

recipients. This is a permanent error. The following address(es) failed:
                                        ^^^^^^^^^^^^

The following message to <kayn73@starhub.net.sg> was undeliverable.
 ^^^^^^^^^^^^


"he" is a substring of both "The" & "the" (but since it can be sometimes capital
T & sometimes small t,  I indicated the search string as "he following" )
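
A quick way to check which lines a pattern like this picks up before running the full extraction is grep. The sketch below is only an illustration (count_trigger_lines is an invented name); it spells out both cases with [Tt] instead of relying on the bare substring:

```shell
#!/bin/sh
# Sketch only: count_trigger_lines is a hypothetical helper name.
# Counts lines containing "The following" or "the following".
# Reads the named file, or standard input when no file is given.
count_trigger_lines() {
    grep -c '[Tt]he following' "$@"
}
```

For example, count_trigger_lines j.txt prints the number of matching lines in that file. Note that grep -c returns a non-zero exit status when the count is 0.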
 

Author Comment

by:sunhux
ID: 35096856

I'll test it out tomorrow : it's now 1am my time
 

Author Closing Comment

by:sunhux
ID: 35108046
Marvellous