Perl solution required for splitting emails and renaming files

I've already posted this but unfortunately awarded points too early.

I am currently using this small perl script to help split a concatenated file full of emails into multiple files - at present named 1 through n.

#!/usr/bin/perl

$i=0;
while (<> ) {
  if(/^Subject:/) {
    close(FILE);
    $i++;
    open(FILE, ">$i");
    select FILE;
  }
  print;
}

Example:

I have a file named 191102ve.txz, which contains many emails all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. A grep of Subject would produce

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

Easy enough, with minimal load on the system compared to shell scripts, but the naming scheme is far too simple. I am wondering if I could enhance the naming of the files produced to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Where the XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice), and the 191102 comes from the input file name (which was 191102ve.txz).

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in the format DDMMYY??.txz, where in this example ?? is ve.
Currently I am running the first script above. I would like a solution that
1. reads all DDMMYY??.txz files
2. splits each one into the various emails with Subject: AA/123456 Invoice
3. outputs each email into AA.123456.DDMON (with DD = day of the file being split, MON = month of the file being split)

I'll reward points accordingly - with possible increase.
A complete solution will get the points.

Glenn Stewart

glennstewart (Author) commented:
Btw - it would be nice if you could explain the code. I am a complete Perl novice and would like to tweak it.
I am normally a shell scripter but can't find a shell solution that doesn't corrupt emails with PDF attachments.
jmcg (Owner) commented:
Something like this?

#! /usr/bin/env perl

use strict;
use warnings;

my @monthmap = qw( 0 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

while (<>) {
 if(/^Subject: (\w\w)\/(\d+)/) {
   my $f = "$1.$2."; # new filename starts with components of invoice
   $ARGV =~ /^(\d{2})(\d{2})/ # extract day, month from current input filename
     or die "cannot find DDMM at start of input filename $ARGV"; # otherwise $1, $2 would still hold the Subject captures
   $f = $f . $1 . $monthmap[$2];
   open(STDOUT, '>', $f) or die "open failed for $f -- $!"; # old STDOUT closed automatically
   }
 print;
}
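
Since you asked for an explanation so you can tweak it, here is a minimal sketch of just the naming step, pulled out into a function so it can be tried on its own. The name output_name() is made up for illustration and is not part of the script above. It also assumes you may want to pass input files with a directory in front, so it strips the path with File::Basename before looking for the DDMM digits.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename qw(basename);

# Index 0 is a placeholder so that month numbers 1..12 index directly.
my @monthmap = qw(0 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# output_name() is a hypothetical helper, not part of the original script.
# It returns the new file name, or nothing if either pattern fails to match.
sub output_name {
    my ($subject, $input_file) = @_;

    # Capture the two-character prefix and the digits after the slash.
    my ($prefix, $number) = $subject =~ m{^Subject: (\w\w)/(\d+)}
        or return;

    # Drop any directory part, then take DD and MM from the front of the name.
    my ($day, $month) = basename($input_file) =~ /^(\d{2})(\d{2})/
        or return;
    return if $month < 1 || $month > 12;

    return "$prefix.$number.$day$monthmap[$month]";
}

print output_name("Subject: AA/067869 Invoice", "191102ve.txz"), "\n";
# prints AA.067869.19NOV
```

The parentheses in each pattern are the "captures"; assigning the match to a list puts them straight into named variables instead of $1 and $2, which avoids accidentally reusing captures left over from an earlier match.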

jmcg (Owner) commented:
Let me know if you need more explanation.

I've asked a moderator to take a look at your earlier question. You may get a chance to revise the grade you handed out there by mistake.

Meanwhile, it would be good to visit Experts-Exchange's Help pages to familiarize yourself with how to use the site. Be sure to check the section on closing questions.

http://www.experts-exchange.com/help/

Meanwhile, don't be hasty about closing this one. Give it a day or two. Someone may well come along who will give you a more pleasing solution or clearer explanation than I am able to.

Oh, and one more thing — does it matter that the Subject: line might not be the first line in the mail message's header?
glennstewart (Author) commented:
Hi jmcg,

Thanks very much for your help.
I have tried your script out on the client site. It works like a charm.

The software we develop writes the headers in the first place, so we know that the Subject is the first line.
The only thing I can't guarantee is that a file doesn't already exist. Given that tens of thousands of AA.123456.XXMON files will be written, I can't guarantee that one won't overwrite another.

If I were to ask for an addition to your script above, it would be to put a .0 or .1 or .2 on the end if the previous file exists.

E.g.

If the file CH.095813.02NOV exists, and another needs to be produced, it is called CH.095813.02NOV.0
If the files CH.095813.02NOV and CH.095813.02NOV.0 both exist, and another needs to be produced, it is called CH.095813.02NOV.1
.. and so on.


Thanks heaps - I will reward after I get a few more solutions.
Btw - after I split points, is the problem solved?

Glenn
jmcg (Owner) commented:
In my naive vision of the world, invoice numbering is unique. In the real world, you could do something to prevent overwriting like the following:

(this would be inserted before the 'open' statement)

if( -e $f ) { my $jot = 0; while( -e "$f.$jot") { ++$jot}; $f = "$f.$jot"; }

This could get to be tediously inefficient if duplicate invoice number/date combinations are common.

The -e operator tests for file existence. There are two tests, since there are two different forms for the filename: bare, and with a disambiguating suffix.
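
If the one-liner is hard to read, the same check can be sketched as a small function. The name unique_name() is invented here for illustration; the demo loop writes into a scratch directory from File::Temp so it does not touch real invoice files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# unique_name() is a hypothetical helper wrapping the existence checks above.
sub unique_name {
    my ($name) = @_;
    return $name unless -e $name;    # bare name is free, use it as-is

    my $jot = 0;
    $jot++ while -e "$name.$jot";    # step to the first unused numeric suffix
    return "$name.$jot";
}

# Demo: ask for the same name three times in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/CH.095813.02NOV";
for my $attempt (1 .. 3) {
    my $f = unique_name($base);
    open(my $fh, '>', $f) or die "open failed for $f -- $!";
    close($fh);
    print "$f\n";
}
```

The three names printed end in CH.095813.02NOV, CH.095813.02NOV.0 and CH.095813.02NOV.1, matching the scheme requested. In the splitting script you would call it as $f = unique_name($f); just before the open.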

======

Yes, once you've decided on a point split, the question is closed, just as if you had accepted one of the answers. You can't (at least, not without intervention by a moderator) go back and add more points and award them to new answers that may show up later.

Don't be too hasty to close questions. I'd recommend giving them at least 24 hours, so more experts can see the question and consider whether to contribute. But do try to close them after a few days.
glennstewart (Author) commented:
Hi jmcg,

Excellent solution. Exactly what I wanted.
Good work

Glenn