Perl solution required for splitting of emails and renaming files

I've already posted this, but unfortunately I awarded points too early.

I am currently using this small perl script to help split a concatenated file full of emails into multiple files - at present named 1 through n.

#!/usr/bin/perl

$i=0;
while (<> ) {
  if(/^Subject:/) {
    close(FILE);
    $i++;
    open(FILE, ">$i");
    select FILE;
  }
  print;
}

Example:

I have a file named 191102ve.txz, which contains many emails all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. A grep of Subject would produce

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

Easy enough, and with minimal load on the system compared to shell scripts. However, the naming scheme is far too simple. I'm wondering if I could enhance the naming of the files produced to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Here the XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice), and the 191102 comes from the name of the input file (which was 191102ve.txz).

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in format DDMMYY??.txz where in this example, ?? is ve.
Currently I am running the above script (the first one above). I would like a solution that
1. reads all DDMMYY??.txz files
2. splits each one into the various emails with Subject: AA/123456 Invoice
3. outputs each email into AA.123456.DDMON (with DD = day of the file being split, MON = month of the file being split)

I'll reward points accordingly - with possible increase.
A complete solution will get the points.

Glenn Stewart
jmcg commented:
Something like this?

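#!/usr/bin/env perl

use strict;
use warnings;

# Index 0 is a placeholder so that month number 11 maps to NOV.
my @monthmap = qw( 0 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC );

while (<>) {
  if (/^Subject: (\w\w)\/(\d+)/) {
    my $f = "$1.$2."; # new filename starts with components of invoice
    # $ARGV holds the name of the file that <> is currently reading;
    # extract day and month from its leading DDMM digits
    my ($day, $month) = $ARGV =~ /^(\d{2})(\d{2})/
      or die "cannot get date from input filename $ARGV";
    $f .= $day . $monthmap[$month];
    open(STDOUT, '>', $f) or die "open failed for $f -- $!"; # old STDOUT closed automatically
  }
  print; # goes to whichever file STDOUT currently points at
}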
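
For the requirement of reading all DDMMYY??.txz files, the simplest route is probably to let the shell expand the pattern and pass the names on the command line. If the list ever gets too long for that, something like the following sketch could build the list inside Perl instead. The helper name invoice_batches is made up for illustration, and it assumes the script is run from the directory that holds the files, since the date is taken from the start of the bare filename.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bare names in $dir that look like
# DDMMYY??.txz (six digits, any two characters, then ".txz").
sub invoice_batches {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot read $dir -- $!";
    my @names = sort grep { /^\d{6}..\.txz$/ } readdir($dh);
    closedir($dh);
    return @names;
}

# Only preload @ARGV when no files were named on the command line,
# so that a following while (<>) loop reads every matching file in turn.
@ARGV = invoice_batches('.') unless @ARGV;
```

The sort only makes the processing order repeatable; because the names start with the day, it is not chronological order.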

glennstewart (author) commented:
Btw - it would be nice if you could explain the code. I am a complete Perl novice and would like to tweak.
I am normally a shell scripter but can't find a solution that doesn't corrupt the pdf attached emails.
jmcg commented:
Let me know if you need more explanation.

I've asked a moderator to take a look at your earlier question. You may get a chance to revise the grade you handed out there by mistake.

Meanwhile, it would be good to visit Experts-Exchange's Help pages to familiarize yourself with how to use the site. Be sure to check the section on closing questions.

http://www.experts-exchange.com/help/

Meanwhile, don't be hasty about closing this one. Give it a day or two. Someone may well come along who will give you a more pleasing solution or clearer explanation than I am able to.

Oh, and one more thing — does it matter that the Subject: line might not be the first line in the mail message's header?
 
glennstewart (author) commented:
Hi jmcg,

Thanks very much for your help.
I have tried your script out on the client site. It works like a charm.

The software we develop writes the headers in the first place, so we know that the Subject is the first line.
The only thing I can't guarantee is that a file of the same name doesn't already exist. Given that tens of thousands of AA.123456.XXMON files will be written, I can't guarantee that one won't overwrite another.

If I were to ask for an addition to your script above, it would be to put a .0 or .1 or .2 on the end if the previous name exists.

E.g.

If the file CH.095813.02NOV exists and another needs to be produced, it is called CH.095813.02NOV.0
If the files CH.095813.02NOV and CH.095813.02NOV.0 exist and another needs to be produced, it is called CH.095813.02NOV.1
.. and so on.


Thanks heaps - I will reward after I get a few more solutions.
Btw - after I split points, is the problem solved?

Glenn
 
jmcg commented:
In my naive vision of the world, invoice numbering is unique. In the real world, you could do something to prevent overwriting like the following:

(this would be inserted before the 'open' statement)

if( -e $f ) { my $jot = 0; while( -e "$f.$jot") { ++$jot}; $f = "$f.$jot"; }

This could get to be tediously inefficient if duplicate invoice number/date combinations are common.

The -e operator tests for file existence. There are two tests, since there are two different forms for the filename: bare, and with a disambiguating suffix.
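
If the one-liner is hard to tweak, the same logic could be pulled out into a small subroutine. This is only a sketch of the idea described above; the name unique_name is invented here and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return $f unchanged if no file of that name exists,
# otherwise the first free name among "$f.0", "$f.1", "$f.2", ...
sub unique_name {
    my ($f) = @_;
    return $f unless -e $f;      # bare name is free, use it as is
    my $jot = 0;
    ++$jot while -e "$f.$jot";   # skip suffixes that are already taken
    return "$f.$jot";
}
```

It would be called just before the 'open' statement, as in $f = unique_name($f); note that another process could still create the same name between the test and the open, so this is only safe when one copy of the script is running at a time.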

======

Yes, once you've decided on a point split, the question is closed, just as if you had accepted one of the answers. You can't (at least, not without intervention by a moderator) go back and add more points and award them to new answers that may show up later.

Don't be too hasty to close questions. I'd recommend giving them at least 24 hours, so more experts can see the question and consider whether to contribute. But do try to close them after a few days.
 
glennstewart (author) commented:
Hi jmcg,

Excellent solution. Exactly what I wanted.
Good work

Glenn
Question has a verified solution.