

Perl solution required for splitting emails and renaming files

Posted on 2003-11-07
Medium Priority
Last Modified: 2010-03-04
I've already posted this, but unfortunately I awarded the points too early.

I am currently using this small perl script to help split a concatenated file full of emails into multiple files - at present named 1 through n.


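while (<>) {
  if (/^Subject:/) {   # a Subject: line starts a new email
    $i++;              # output files are named 1 through n
    open(FILE, ">$i") or die "open failed for $i -- $!";
    select FILE;       # later prints go to this file
  }
  print;               # copy the current line to the selected file
}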


I have a file named 191102ve.txz, which contains many emails, all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. Running grep for Subject on those files would produce:

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

This is easy enough and puts minimal load on the system compared to shell scripts, but the naming scheme is far too simple. I am wondering if I could enhance the naming of the files produced to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Here the XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice), and the 191102 comes from the name of the input file (which was 191102ve.txz).

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in format DDMMYY??.txz where in this example, ?? is ve.
Currently I am running the above script (the first one above). I would like a solution that
1.  reads all DDMMYY??.txz files
2. splits each one into the various emails with Subject: AA/123456 Invoice
3. Outputs each email into AA.123456.DDMON (with DD = Day of file split, MON = Month of file split)
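
The naming rule in the three steps above can be sketched as a small function. This is only an illustration: target_name() and @MONTHS are hypothetical names, and the sketch assumes the input filename is given without a directory prefix, the Subject line has the form "Subject: XX/digits", and the MM part of the filename is a month number from 01 to 12.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Month abbreviations; index 0 is a placeholder so that 1..12 line up with JAN..DEC.
my @MONTHS = ('', qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC));

# target_name() is a hypothetical helper, not part of the original script.
# It returns the output name, or undef if either argument does not match
# the expected layout.
sub target_name {
    my ($subject_line, $input_file) = @_;

    # "Subject: AA/067869 Invoice" -> ("AA", "067869")
    my ($code, $number) = $subject_line =~ m{^Subject: (\w\w)/(\d+)} or return;

    # "191102ve.txz" -> day "19", month "11" (the year digits are ignored)
    my ($day, $month) = $input_file =~ /^(\d{2})(\d{2})\d{2}/ or return;
    return if $month < 1 || $month > 12;

    return "$code.$number.$day$MONTHS[$month]";
}

print target_name('Subject: AA/067869 Invoice', '191102ve.txz'), "\n";
# prints AA.067869.19NOV
```

Keeping the name calculation separate from the file handling makes it possible to check the names against a few sample subjects before any files are written.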

I'll reward points accordingly - with possible increase.
A complete solution will get the points.

Glenn Stewart
Question by:glennstewart

Author Comment

ID: 9699576
Btw - it would be nice if you could explain the code. I am a complete Perl novice and would like to tweak it.
I am normally a shell scripter, but I can't find a shell solution that doesn't corrupt the emails with PDF attachments.
LVL 20

Accepted Solution

jmcg earned 940 total points
ID: 9699820
Something like this?

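#! /usr/bin/env perl

use strict;

# assumed month table: index 0 is unused so that 01..12 map to JAN..DEC
my @monthmap = ('', qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC));

while (<>) {
 if (/^Subject: (\w\w)\/(\d+)/) {
   my $f = "$1.$2."; # new filename starts with components of invoice
   $ARGV =~ /^(\d{2})(\d{2})/ # extract day, month from current input filename
     or die "unexpected input filename $ARGV";
   $f = $f . $1 . $monthmap[$2];
   open(STDOUT, ">$f") or die "open failed for $f -- $!"; # old STDOUT closed automatically
 }
 print; # copy the current line to whichever output file is open
}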

LVL 20

Expert Comment

ID: 9699855
Let me know if you need more explanation.

I've asked a moderator to take a look at your earlier question. You may get a chance to revise the grade you handed out there by mistake.

Meanwhile, it would be good to visit Experts-Exchange's Help pages to familiarize yourself with how to use the site. Be sure to check the section on closing questions.


Meanwhile, don't be hasty about closing this one. Give it a day or two. Someone may well come along who will give you a more pleasing solution or clearer explanation than I am able to.

Oh, and one more thing — does it matter that the Subject: line might not be the first line in the mail message's header?


Author Comment

ID: 9700169
Hi jmcg,

Thanks very much for your help.
I have tried your script out on the client site. It works like a charm.

The software we develop writes the headers in the first place, so we know that the Subject is the first line.
The only thing I can't guarantee is that a file with the same name doesn't already exist. Given that tens of thousands of AA.123456.XXMON files will be written, I can't guarantee that one won't overwrite another.

If I were to ask for one addition to your script above, it would be to put a .0 or .1 or .2 on the end of the name if the previous one exists.


If the file CH.095813.02NOV exists, and another needs to be produced, it is called CH.095813.02NOV.0
If the files CH.095813.02NOV and CH.095813.02NOV.0 exist, and another needs to be produced, it is called CH.095813.02NOV.1
.. and so on.

Thanks heaps - I will reward after I get a few more solutions.
Btw - after I split points, is the problem solved?

LVL 20

Assisted Solution

jmcg earned 940 total points
ID: 9705717
In my naive vision of the world, invoice numbering is unique. In the real world, you could do something to prevent overwriting like the following:

(this would be inserted before the 'open' statement)

if( -e $f ) { my $jot = 0; while( -e "$f.$jot") { ++$jot}; $f = "$f.$jot"; }

This could get to be tediously inefficient if duplicate invoice number/date combinations are common.

The -e operator tests for file existence. There are two tests, since there are two different forms for the filename: bare, and with a disambiguating suffix.
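
One possible variation, shown here only as a sketch: between the -e test and the later open there is a short window in which another process could create the same file. Opening with sysopen and the O_EXCL flag asks the operating system to refuse to open a file that already exists, so the existence check and the creation happen in one step. open_unique() is a hypothetical helper name, and the sketch assumes the same suffix scheme as above (bare name first, then .0, .1, and so on).

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# open_unique() is a hypothetical helper, not part of the original script.
# It tries "$base", then "$base.0", "$base.1", ... and returns the filehandle
# and the name of the first file it manages to create.
sub open_unique {
    my ($base) = @_;
    my ($fh, $name, $jot) = (undef, $base, 0);

    # O_EXCL makes sysopen fail with EEXIST if the file is already there.
    until (sysopen($fh, $name, O_WRONLY | O_CREAT | O_EXCL)) {
        die "open failed for $name -- $!" unless $!{EEXIST};
        $name = "$base." . $jot++;
    }
    return ($fh, $name);
}
```

Inside the splitting loop this could be used as `my ($fh, $name) = open_unique($f); select $fh;` in place of reopening STDOUT, with the handle variable declared outside the loop so that it stays open until the next Subject line. Like the -e loop, it still probes the suffixes one at a time, so it is no faster when duplicates are common.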


Yes, once you've decided on a point split, the question is closed, just as if you had accepted one of the answers. You can't (at least, not without intervention by a moderator) go back and add more points and award them to new answers that may show up later.

Don't be too hasty to close questions. I'd recommend giving them at least 24 hours, so more experts can see the question and consider whether to contribute. But do try to close them after a few days.

Author Comment

ID: 9711442
Hi jmcg,

Excellent solution. Exactly what I wanted.
Good work

