Perl or Shell script to sift through thousands of mail files & extract addresses & failure reasons from failed mails only

Posted on 2011-03-10
Last Modified: 2012-05-11
I have thousands of mail files. Some of the mails in them failed to be
delivered & some are OK (ie they reached the destination).

The attachment is one of these mail files. Below are the lines of text
that we need a Perl or Shell script to search for, extract & write to a
file in the following format:

xia0h3i-@hotmail.com;Subject: Delivery Status Notification (Failure):
tec_kid@hotmail.com;Subject: Delivery Status Notification (Failure)
life.breathe@hotmail.com;Subject: Mail delivery failed: returning message to sender


Briefly, the search algorithm is as follows:

a) search each file for a line containing both of the search strings
   "he following" & "failed"; the problem email address is 2 lines below
   it. Extract this email address & write it to the output file as the
   first column, followed by ; (semicolon as the column separator)

b) then search backwards (a variable number of lines) for the string
   "Subject:", extract that line & add it as the 2nd column of the
   output file

c) then repeat step (a), searching forward for the next line with the
   search strings "he following" & "failed", & so on till the end of the
   file; then do the same for the next mail file (all mail files have
   either 4 or 5 digits as their filenames & all of them are in one
   directory)
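
Steps (a) to (c) can be sketched literally in Perl as below. This is only a minimal sketch under the assumptions stated in the question (the address sits exactly two lines below the marker line, and the nearest earlier "Subject:" line is the one wanted). The helper name extract_failures is made up for illustration and is not part of any solution posted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the chomped lines of one mail file and
# returns a list of "address;Subject line" strings.
sub extract_failures {
    my @lines = @_;
    my @out;
    for my $i (0 .. $#lines) {
        # Step (a): the marker line contains both search strings.
        next unless $lines[$i] =~ /he following/ && $lines[$i] =~ /failed/;

        # The address is assumed to be two lines below the marker line.
        my $addr_line = $i + 2 <= $#lines ? $lines[$i + 2] : '';
        my ($addr) = $addr_line =~ /(\S+@\S+)/ or next;

        # Step (b): walk backwards to the nearest "Subject:" line.
        my $subj = '';
        for (my $j = $i; $j >= 0; $j--) {
            if ($lines[$j] =~ /^Subject:/) {
                $subj = $lines[$j];
                last;
            }
        }
        push @out, "$addr;$subj";
        # Step (c): the outer loop carries on to the next marker line.
    }
    return @out;
}

# Process every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "could not open $file: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    print "$_\n" for extract_failures(@lines);
}
```

Reading the whole file into an array keeps the "2 lines below" and "search backwards" steps simple; for very large files a line-by-line approach that remembers the last Subject line would use less memory.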


======== key search strings / text: extracted from the attachment =========

Subject: Delivery Status Notification (Failure):
. . . . .

Delivery to the following recipients failed.

       xia0h3i-@hotmail.com

.......


Subject: Delivery Status Notification (Failure)
. . . . .
Delivery to the following recipients failed.

       tec_kid@hotmail.com
..........

Subject: Mail delivery failed: returning message to sender
. . . . .
recipients. This is a permanent error. The following address(es) failed:

  life.breathe@hotmail.com

..........


j.txt
Question by:sunhux
10 Comments
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 2000 total points
ID: 35096683
This should do what you want.  To call the script:

script.pl file1 file2 file3 ... > output_file
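#!/usr/bin/perl

use strict;
use warnings;

foreach my $fil (@ARGV) {
    # Reset state for each file so nothing carries over from the previous one.
    my ($subj, $in_err) = ('', 0);
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject line.
            $subj = $_;
        } elsif (/he\s+following\b.*\s+failed\b/) {
            # Marker line: the failed addresses follow.
            $in_err = 1;
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                print "$1; $subj\n";
            } elsif (not /^\s*$/) {
                # First non-blank, non-address line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}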

 

Author Comment

by:sunhux
ID: 35096691

3 corrections / additional requirements to what I posted above:

1) Remove "failed" from the search strings, as I came across the line
   below which does not have the string "failed" in it:
   "The following addresses had permanent delivery errors"

2) I came across one example (shown after point 3) where the search
   string "he following" & the email address to be extracted are on
   the same line. If this can't be handled by the same script,
   feel free to write a separate script.

3) After extraction, I would like the output sorted by the 2nd
   column as the primary key & the 1st column as the secondary
   key (remember the columns are separated by ;/semicolon).

Subject: Delivery Status Notification (Failure)
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="onJ2O.4dPFG2WYz.1+z937.67NgUs8"
. . . . .

The following message to <kayn73@starhub.net.sg> was undeliverable.
The reason for the problem:
......
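
Requirement 3 can be sketched on its own as a small sort over lines that have already been extracted. This is a minimal sketch, assuming every line has the form address;subject with the first semicolon as the separator. The helper name sort_by_subject_then_address is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sorts "address;subject" lines by subject first,
# then by address, both as plain string comparisons.
sub sort_by_subject_then_address {
    my @lines = @_;
    return map  { $_->[0] }
           sort { $a->[2] cmp $b->[2] or $a->[1] cmp $b->[1] }
           map  {
               # Split on the first semicolon only; subjects may contain more.
               my ($addr, $subj) = split /;/, $_, 2;
               [ $_, $addr, defined $subj ? $subj : '' ];
           }
           @lines;
}

# Sort the contents of any files named on the command line.
if (@ARGV) {
    chomp(my @lines = <>);
    print "$_\n" for sort_by_subject_then_address(@lines);
}
```

Splitting each line once up front and sorting the pre-split records avoids re-splitting the same line on every comparison.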
 

Author Comment

by:sunhux
ID: 35096726

Wow that's fast Wilcoxon.  However, I've made some corrections to my
earlier requirements, sorry about that, hope you can amend the script.


Lastly, can I run the script by passing the * wildcard character to represent
all files in that directory,  ie :
    script.pl * > output_file
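
One caveat with *: it matches every file in the directory, including the script itself or an output_file left over from an earlier run, if they live there. Since the mail files are said to have 4 or 5 digit names, a filter along these lines could be applied to @ARGV first. This is a sketch only, and mail_files_only is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: keeps only paths whose file name is exactly
# four or five digits, as described for the mail files in this question.
sub mail_files_only {
    return grep { basename($_) =~ /^\d{4,5}\z/ } @_;
}

# Drop anything the shell's * matched that is not a mail file,
# then list what is left.
my @mail_files = mail_files_only(@ARGV);
print "$_\n" for @mail_files;
```

In a real script the filtered list would replace @ARGV in the main loop instead of being printed.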

 
LVL 31

Expert Comment

by:farzanj
ID: 35096729
I looked through your example.  I searched the attached file and could not find "he following"  &  "failed".  Could you give a file that contains the terms you need?

Second, your algorithm is a little confusing.  A simpler description would be appreciated.
 
LVL 26

Accepted Solution

by:wilcoxon
wilcoxon earned 2000 total points
ID: 35096794
To take into account your changes....

This modified version should handle everything...
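#!/usr/bin/perl

use strict;
use warnings;

my %bad;    # $bad{subject}{address} = number of times seen
foreach my $fil (@ARGV) {
    # Reset state for each file so nothing carries over from the previous one.
    my ($subj, $in_err) = ('', 0);
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject line.
            $subj = $_;
        } elsif (/he\s+following\b/) {
            if (/\b(\w\S+@[\w.-]+)\b/) {
                # Address is on the marker line itself.
                $bad{$subj}{$1}++;
            } else {
                # Addresses follow on later lines.
                $in_err = 1;
            }
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                $bad{$subj}{$1}++;
            } elsif (not /^\s*$/) {
                # First non-blank, non-address line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}

# Output sorted by subject (2nd column), then by address (1st column).
foreach my $subj (sort keys %bad) {
    foreach my $addr (sort keys %{$bad{$subj}}) {
        print "$addr; $subj\n";
    }
}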

 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 2000 total points
ID: 35096807
Yes, you can run it as script.pl * > output_file.
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35096820
farzanj, not sure why you're not finding it - "he following" and "failure" occur 3 times in j.txt (as expected since there are 3 failures in the log).
 

Author Comment

by:sunhux
ID: 35096842


The attachment is the actual file. To illustrate again, I repost below the
lines containing the search strings (underlined with ^^^):

Delivery to the following recipients failed
             ^^^^^^^^^^^^

recipients. This is a permanent error. The following address(es) failed:
                                        ^^^^^^^^^^^^

The following message to <kayn73@starhub.net.sg> was undeliverable.
 ^^^^^^^^^^^^


"he" is a substring of both "The" & "the" (but since it can be sometimes capital
T & sometimes small t,  I indicated the search string as "he following" )
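
An alternative to the substring "he following" is to match the whole phrase case-insensitively, which also avoids hits on unrelated words that merely end in "he". A small sketch, with has_marker as a hypothetical helper name; it is not what the scripts posted in this thread do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the line contains the phrase
# "the following" in any letter case, 0 otherwise.
sub has_marker {
    my ($line) = @_;
    return $line =~ /\bthe\s+following\b/i ? 1 : 0;
}

# Try it on two of the marker lines quoted in this thread.
for my $line ('Delivery to the following recipients failed',
              'The following address(es) failed:') {
    my $verdict = has_marker($line) ? 'match' : 'no match';
    print "$verdict: $line\n";
}
```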
 

Author Comment

by:sunhux
ID: 35096856

I'll test it out tomorrow : it's now 1am my time
 

Author Closing Comment

by:sunhux
ID: 35108046
Marvellous