Perl or shell script to sift through thousands of mail files & extract addresses & failure reasons from failed mails only

Posted on 2011-03-10
Last Modified: 2012-05-11
I have thousands of mail files. Some of the mails in these files
failed to be sent & some were OK (i.e. they reached the destination).

The attached file is one of these mail files. Below are the lines of
text that a Perl or shell script needs to search for, extract &
write to an output file in the format below:

xia0h3i-@hotmail.com;Subject: Delivery Status Notification (Failure):
tec_kid@hotmail.com;Subject: Delivery Status Notification (Failure)
life.breathe@hotmail.com;Subject: Mail delivery failed: returning message to sender


Briefly, the search algorithm is as follows:

a) Search each file for a line containing both search strings,
   "he following" & "failed". The problem email address is 2 lines
   below it. Extract this address & write it to the output file as the
   first column, followed by ; (semicolon as the column separator).

b) Then search backwards a few lines (the number of lines varies)
   for the string "Subject:". Extract this line & write it as the 2nd
   column of the output file.

c) Then repeat step (a), searching forward for the next line containing
   "he following" & "failed", & keep extracting until the end of the
   file. Then do the same for the next mail file (all mail files have
   4- or 5-digit filenames & all of them are in one directory).


======== key search strings / text: extracted from the attachment =========

Subject: Delivery Status Notification (Failure):
. . . . .

Delivery to the following recipients failed.

       xia0h3i-@hotmail.com

.......


Subject: Delivery Status Notification (Failure)
. . . . .
Delivery to the following recipients failed.

       tec_kid@hotmail.com
..........

Subject: Mail delivery failed: returning message to sender
. . . . .
recipients. This is a permanent error. The following address(es) failed:

  life.breathe@hotmail.com

..........


j.txt
Question by:sunhux

Assisted Solution

by:wilcoxon
wilcoxon earned 2000 total points
ID: 35096683
This should do what you want.  To call the script:

script.pl file1 file2 file3 ... > output_file

The script:

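#!/usr/bin/perl

use strict;
use warnings;

foreach my $fil (@ARGV) {
    # Reset per-file state so nothing carries over from the previous file.
    my ($subj, $in_err) = ('', 0);
    # Three-argument open with a lexical handle is safe for any file name.
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject: line.
            $subj = $_;
        } elsif (/he\s+following\b.*\s+failed\b/) {
            # Failure marker: the addresses follow on the next lines.
            $in_err = 1;
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                # A line holding only an address: print address;subject
                # (no space after the semicolon, as in the requested format).
                print "$1;$subj\n";
            } elsif (not /^\s*$/) {
                # Any other non-blank line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}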


Author Comment

by:sunhux
ID: 35096691

3 corrections / additional requirements to what I posted above:

1) Remove "failed" from the search strings, as I came across the line
    below, which does not contain the string "failed":
   "The following addresses had permanent delivery errors"

2) I came across an example (below) where the search string
     "he following" & the email address to be extracted are on
     the same line. If this 2nd point can't be handled by the
     same script, feel free to write a separate script.

3) After extraction, I would like to sort the output by the 2nd
    column as the primary key & the 1st column as the
    secondary key (remember the columns are separated by
    ; / semicolon).

Example for point 2:
Subject: Delivery Status Notification (Failure)
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="onJ2O.4dPFG2WYz.1+z937.67NgUs8"
. . . . .

The following message to <kayn73@starhub.net.sg> was undeliverable.
The reason for the problem:
......

Author Comment

by:sunhux
ID: 35096726

Wow, that's fast, wilcoxon.  However, I've made some corrections to my
earlier requirements (sorry about that); I hope you can amend the script.


Lastly, can I run the script by passing the * wildcard character to represent
all files in that directory, i.e.:
    script.pl * > output_file

Expert Comment

by:farzanj
ID: 35096729
I looked through your example.  I searched the attached file & there is no "he following" & "failed" in it.  Could you provide a file that contains the terms you need?

Second, your algorithm is a little confusing.  A simpler description would be appreciated.

Accepted Solution

by:wilcoxon
wilcoxon earned 2000 total points
ID: 35096794
To take into account your changes, this modified version should handle everything:

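#!/usr/bin/perl

use strict;
use warnings;

my %bad;    # $bad{subject}{address} = number of times seen
foreach my $fil (@ARGV) {
    # Reset per-file state so nothing carries over from the previous file.
    my ($subj, $in_err) = ('', 0);
    # Three-argument open with a lexical handle is safe for any file name.
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject: line.
            $subj = $_;
        } elsif (/he\s+following\b/) {
            if (/\b(\w\S+@[\w.-]+)\b/) {
                # Address on the same line as the marker
                # (the character class also allows hyphens in the domain).
                $bad{$subj}{$1}++;
            } else {
                # Otherwise the addresses follow on the next lines.
                $in_err = 1;
            }
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                $bad{$subj}{$1}++;
            } elsif (not /^\s*$/) {
                # Any other non-blank line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}

# Sort by subject (2nd column), then by address (1st column).
foreach my $subj (sort keys %bad) {
    foreach my $addr (sort keys %{$bad{$subj}}) {
        print "$addr;$subj\n";
    }
}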
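
If an unsorted output file already exists (for example from the first script in this thread), a separate pass could sort it instead. This is only a minimal sketch, assuming one address;subject record per line; the helper name sort_records is made up for illustration and is not part of the scripts above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort "address;subject" lines by subject (2nd column),
# then by address (1st column). Only the first semicolon is treated as the
# separator, so a subject that itself contains ';' stays intact.
sub sort_records {
    my @lines = @_;
    return map  { $_->[2] }
           sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
           map  { my ($addr, $subj) = split /;/, $_, 2;
                  [ $addr // '', $subj // '', $_ ] }
           @lines;
}

# Only read input when files are named on the command line.
if (@ARGV) {
    chomp(my @records = <>);
    print "$_\n" for sort_records(@records);
}
```

Assuming the sketch is saved as sort_output.pl (a hypothetical name), it could be run as: sort_output.pl output_file > sorted_file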


Assisted Solution

by:wilcoxon
wilcoxon earned 2000 total points
ID: 35096807
Yes, you can run it as script.pl * > output_file.
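
Since the question says all mail files have 4- or 5-digit filenames, the argument list could also be filtered, in case the directory holds anything else (such as an output file left over from an earlier run). A minimal sketch under that assumption; is_mail_file_name is a hypothetical helper, not part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the last path component consists of
# exactly 4 or 5 digits, e.g. "1234" or "maildir/12345".
sub is_mail_file_name {
    my ($path) = @_;
    return $path =~ m!(?:\A|/)\d{4,5}\z! ? 1 : 0;
}

# Keep only arguments that look like mail files and are regular files,
# then list them one per line.
my @mail_files = grep { is_mail_file_name($_) && -f $_ } @ARGV;
print "$_\n" for @mail_files;
```

In the extraction script, the same check could be applied by looping over grep { is_mail_file_name($_) } @ARGV instead of @ARGV.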

Expert Comment

by:wilcoxon
ID: 35096820
farzanj, not sure why you're not finding it - "he following" and "failed" occur 3 times in j.txt (as expected since there are 3 failures in the log).

Author Comment

by:sunhux
ID: 35096842


The attachment is the actual file. To re-illustrate, I repost the
lines containing the search string below (it is underlined with ^^^):

Delivery to the following recipients failed
             ^^^^^^^^^^^^

recipients. This is a permanent error. The following address(es) failed:
                                        ^^^^^^^^^^^^

The following message to <kayn73@starhub.net.sg> was undeliverable.
 ^^^^^^^^^^^^


"he" is a substring of both "The" & "the" (since the T is sometimes capital
& sometimes small, I gave the search string as "he following").
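
The substring "he following" also matches unrelated text such as "She following". An alternative, if case-insensitive matching is acceptable, is to match the whole word "the" with the /i flag. This is only a sketch; has_marker is a hypothetical helper name, and the scripts in this thread use the "he following" form instead:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the line contains "the following" as whole
# words, in any letter case ("The following", "the following", ...).
sub has_marker {
    my ($line) = @_;
    return $line =~ /\bthe\s+following\b/i ? 1 : 0;
}

# Print the marker lines of any files named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        print $line if has_marker($line);
    }
}
```

All three marker lines reposted above contain "the following" or "The following", so this pattern would match each of them.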

Author Comment

by:sunhux
ID: 35096856

I'll test it out tomorrow : it's now 1am my time

Author Closing Comment

by:sunhux
ID: 35108046
Marvellous