Solved

Perl or shell script to sift through thousands of mail files & extract addresses & failure reasons from failed mails only

Posted on 2011-03-10
Last Modified: 2012-05-11
I have thousands of mail files. Some of the mails in them failed
to be sent & some were OK (i.e. they reached the destination).

The attachment is one of those mail files. Below are the lines of
text that I need a Perl or shell script to search for, extract &
write to a file in the following format:

xia0h3i-@hotmail.com;Subject: Delivery Status Notification (Failure):
tec_kid@hotmail.com;Subject: Delivery Status Notification (Failure)
life.breathe@hotmail.com;Subject: Mail delivery failed: returning message to sender


Briefly, the search algorithm is as follows:

a) Search each file for a line containing both of the search strings
   "he following" & "failed". The problem email address is 2 lines
   below it: extract this address & write it to the output file as the
   first column, followed by ; (semicolon as the column separator).

b) Then search backwards a few lines (the number of lines varies)
   for the string "Subject:", extract that line & write it as the 2nd
   column of the output file.

c) Then repeat step (a), searching forward for the next line containing
   "he following" & "failed", & continue extracting until the end of
   the file. Then do the same for the next mail file (all mail files
   have 4- or 5-digit filenames & they are all in one directory).


======== key search strings / text: extracted from the attachment =========

Subject: Delivery Status Notification (Failure):
. . . . .

Delivery to the following recipients failed.

       xia0h3i-@hotmail.com

.......


Subject: Delivery Status Notification (Failure)
. . . . .
Delivery to the following recipients failed.

       tec_kid@hotmail.com
..........

Subject: Mail delivery failed: returning message to sender
. . . . .
recipients. This is a permanent error. The following address(es) failed:

  life.breathe@hotmail.com

..........


j.txt
Question by:sunhux

Assisted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 35096683
This should do what you want.  To call the script:

script.pl file1 file2 file3 ... > output_file
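#!/usr/bin/perl

use strict;
use warnings;

foreach my $fil (@ARGV) {
    # Reset state per file so nothing carries over from the previous mail file.
    my ($subj, $in_err) = ('', 0);
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            # Remember the most recent Subject: line for the next failure.
            $subj = $_;
        } elsif (/he\s+following\b.*\s+failed\b/) {
            # Marker line: the failed address(es) follow on later lines.
            $in_err = 1;
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                print "$1;$subj\n";
            } elsif (not /^\s*$/) {
                # Any other non-blank line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}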


Author Comment

by:sunhux
ID: 35096691

3 corrections / additional requirements to what I posted above:

1) Remove "failed" from the search strings, as I came across the line
   below, which does not contain "failed":
   "The following addresses had permanent delivery errors"

2) I came across the example below, where the search string
   "he following" & the email address to be extracted are on
   the same line. If this can't be handled by the same script,
   feel free to write a separate script.

3) After extraction, I would like the output sorted by the 2nd
   column as the primary key & the 1st column as the secondary
   key (remember the columns are separated by ; semicolons).

Example for point 2:
Subject: Delivery Status Notification (Failure)
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="onJ2O.4dPFG2WYz.1+z937.67NgUs8"
. . . . .

The following message to <kayn73@starhub.net.sg> was undeliverable.
The reason for the problem:
......
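
The sort order asked for in point 3 could be sketched in Perl roughly as below. This is only an illustration, assuming each output line has the form address;subject; sort_records is a hypothetical helper name and the sample addresses other than those quoted in this thread are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort "address;subject" lines by subject (column 2), then address (column 1).
# sort_records is a hypothetical helper name used only for this sketch.
sub sort_records {
    my @lines = @_;
    return map  { $_->[0] }
           sort { $a->[2] cmp $b->[2] or $a->[1] cmp $b->[1] }
           map  {
               # Split on the first semicolon only; subjects may contain more.
               my ($addr, $subj) = split /;/, $_, 2;
               [ $_, $addr, defined $subj ? $subj : '' ];
           } @lines;
}

# Sample lines in the requested output format (single quotes avoid
# interpolating the @ in the addresses).
my @sample = (
    'tec_kid@hotmail.com;Subject: Delivery Status Notification (Failure)',
    'life.breathe@hotmail.com;Subject: Mail delivery failed: returning message to sender',
    'abc@example.com;Subject: Delivery Status Notification (Failure)',
);
print "$_\n" for sort_records(@sample);
```

Splitting with a limit of 2 keeps any later semicolons inside the subject, so the whole subject line is compared as one key.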

Author Comment

by:sunhux
ID: 35096726

Wow, that's fast, wilcoxon. However, I've made some corrections to my
earlier requirements; sorry about that, I hope you can amend the script.


Lastly, can I run the script by passing the * wildcard character to represent
all files in that directory, i.e.:
    script.pl * > output_file

Expert Comment

by:farzanj
ID: 35096729
I looked through your example and searched the attached file, but found no "he following" & "failed". Could you provide a file that contains the terms you need?

Second, your algorithm is a little confusing. A simpler description would be appreciated.

Accepted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 35096794
To take into account your changes....

This modified version should handle everything...
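#!/usr/bin/perl

use strict;
use warnings;

my %bad;    # $bad{subject}{address} = number of times seen
foreach my $fil (@ARGV) {
    # Reset state per file so nothing carries over from the previous mail file.
    my ($subj, $in_err) = ('', 0);
    open my $in, '<', $fil or die "could not open $fil: $!";
    while (<$in>) {
        chomp;
        if (/^Subject:/) {
            $subj = $_;
        } elsif (/he\s+following\b/) {
            if (/\b(\w\S+@[\w.-]+)\b/) {
                # Address is on the marker line itself.
                $bad{$subj}{$1}++;
            } else {
                # Address(es) follow on later lines.
                $in_err = 1;
            }
        } elsif ($in_err) {
            if (/^\s*(\S+@\S+)\s*$/) {
                $bad{$subj}{$1}++;
            } elsif (not /^\s*$/) {
                # Any other non-blank line ends the address list.
                $in_err = 0;
            }
        }
    }
    close $in;
}

# Print sorted by subject (2nd column), then by address (1st column).
foreach my $subj (sort keys %bad) {
    foreach my $addr (sort keys %{$bad{$subj}}) {
        print "$addr;$subj\n";
    }
}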


Assisted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 35096807
Yes, you can run it as script.pl * > output_file.
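
Note that the shell expands * to every name in the current directory before the script starts, so any non-mail files there are passed in as well. If that matters, the file list could be built inside Perl instead. A minimal sketch, assuming (as stated in the question) that mail files have names of exactly 4 or 5 digits; mail_files is a hypothetical helper name, not part of the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the paths of plain files in $dir whose names are exactly 4 or 5 digits.
# mail_files is a hypothetical helper name used only for this sketch.
sub mail_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "could not open directory $dir: $!";
    my @names = sort grep { /^\d{4,5}\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } @names;
}

# Use the first argument as the directory, defaulting to the current one.
my $dir = @ARGV ? $ARGV[0] : '.';
print "$_\n" for mail_files($dir);
```

The returned list could be looped over in place of @ARGV in the extraction script.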

Expert Comment

by:wilcoxon
ID: 35096820
farzanj, not sure why you're not finding it - "he following" and "failure" occur 3 times in j.txt (as expected since there are 3 failures in the log).

Author Comment

by:sunhux
ID: 35096842


The attachment is the actual file. To re-illustrate, I repost below the
lines containing the search strings (underlined with ^^^):

Delivery to the following recipients failed
            ^^^^^^^^^^^^^

recipients. This is a permanent error. The following address(es) failed:
                                       ^^^^^^^^^^^^^

The following message to <kayn73@starhub.net.sg> was undeliverable.
^^^^^^^^^^^^^


"he" is a substring of both "The" & "the" (since the first letter is sometimes
a capital T & sometimes a small t, I gave the search string as "he following").
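
An alternative to dropping the first letter is a case-insensitive match on the whole word, which also avoids matching words that merely end in "he". This is only a sketch and not what the posted scripts use; is_marker_line is a hypothetical helper name and the third sample line is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True (1) when the line contains the words "the following" in any letter case.
# is_marker_line is a hypothetical helper name used only for this sketch.
sub is_marker_line {
    my ($line) = @_;
    return $line =~ /\bthe\s+following\b/i ? 1 : 0;
}

for my $line (
    'Delivery to the following recipients failed',
    'The following message to <kayn73@starhub.net.sg> was undeliverable.',
    'Breathe following each step',    # contains "he following" but not the word "the"
) {
    printf "%-5s %s\n", (is_marker_line($line) ? 'match' : 'skip'), $line;
}
```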

Author Comment

by:sunhux
ID: 35096856

I'll test it out tomorrow; it's now 1am my time.

Author Closing Comment

by:sunhux
ID: 35108046
Marvellous