Solved

Perl solution required for spliting of emails and renaming files

Posted on 2003-11-07
6
220 Views
Last Modified: 2010-03-04
I've already posted this but unfortuantely awarded points too early....

I am currently using this small perl script to help split a concatenated file full of emails into multiple files - at present named 1 through n.

#!/usr/bin/perl

$i=0;
while (<> ) {
  if(/^Subject:/) {
    close(FILE);
    $i++;
    open(FILE, ">$i");
    select FILE;
  }
  print;
}

Example:

I have a file named 191102ve.txz, which contains many emails all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. A grep of Subject would produce

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

Easy enough, with minimal load on system compared to shell scripts. The naming scheme is far too simple. Wondering if I could enhance the naming of the files produces to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Where the XX.?????? Comes straight from Subject heading (which was XX/?????? Invoice) And the 191102 comes from the input file (which was 191102ve.txz).

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in format DDMMYY??.txz where in this example, ?? is ve.
Currently I am running the above script (the first one above). I would like a solution that
1.  reads all DDMMYY??.txz files
2. splits each one into the various emails with Subject: AA/123456 Invoice
3. Outputs each email into AA.123456.DDMON (with DD = Day of file split, MON = Month of file split)

I'll reward points accordingly - with possible increase.
A complete solution will get the points.

Glenn Stewart
0
Comment
Question by:glennstewart
  • 3
  • 3
6 Comments
 
LVL 1

Author Comment

by:glennstewart
Comment Utility
Btw - it would be nice if you could explain the code. I am a complete Perl novice and would like to tweak.
I am normally a shell scripter but can't find a solution that doesn't corrupt the pdf attached emails.
0
 
LVL 20

Accepted Solution

by:
jmcg earned 235 total points
Comment Utility
Something like this?

#! /usr/bin/env perl

use strict;

my @monthmap = qw( 0 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

while (<> ) {
 if(/^Subject: (\w\w)\/(\d+)/) {
   my $f = "$1.$2."; # new filename starts with components of invoice
   $ARGV =~ /^(\d{2})(\d{2})/; # extract day, month from current input filename
   $f = $f . $1 . $monthmap[$2];
   open(STDOUT, ">$f") or die "open failed for $f -- $!"; # old STDOUT closed automatically
   }
 print;
}

0
 
LVL 20

Expert Comment

by:jmcg
Comment Utility
Let me know if you need more explanation.

I've asked a moderator to take a look at your earlier question. You may get a chance to revise the grade you handed out there by mistake.

Meanwhile, it would be good to visit Experts-Exchange's Help pages to familiarize yourself with how to use the site. Be sure to check the section on closing questions.

http://www.experts-exchange.com/help/

Meanwhile, don't be hasty about closing this one. Give it a day or two. Someone may well come along who will give you a more pleasing solution or clearer explanation than I am able to.

Oh, and one more thing — does it matter that the Subject: line might not be the first line in the mail message's header?
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 1

Author Comment

by:glennstewart
Comment Utility
Hi jmcg,

Thanks very much for your help.
I have tried your script out on the client site. It works like a charm.

The software we develop writes the headers in the first place, so we know that the Subject is the first line.
The only thing I can't guarantee is the existance of a file already. Given 10's of 1000's of AA.123456.XXMON files will be written, I can't guarantee that one won't overwrite another.

If I was going to ask to an addition to your script above, it would be to put a .0 or .1 or .2 on the end if the previous exists.

E.g.

If the file CH.095813.02NOV exists, and another needs to be produced, it is called CH.095813.02NOV.0
If the file CH.095813.02NOV and CH.095813.02NOV.0 exists, and another needs to be produced, it is called CH.095813.02NOV.1
.. and so on.


Thanks heaps - I will reward after I get a few more solutions.
Btw - after I split points, is the problem solved?

Glenn
0
 
LVL 20

Assisted Solution

by:jmcg
jmcg earned 235 total points
Comment Utility
In my naive vision of the world, invoice numbering is unique. In the real world, you could do something to prevent overwriting like the following:

(this would be inserted before the 'open' statement)

if( -e $f ) { my $jot = 0; while( -e "$f.$jot") { ++$jot}; $f = "$f.$jot"; }

This could get to be tediously inefficient if duplicate invoice number/date combinations are common.

The -e operator tests for file existence. There are two tests, since there are two different forms for the filename: bare, and with a disambiguating suffix.

======

Yes, once you've decided on a point split, the question is closed, just as if you had accepted one of the answers. You can't (at least, not without intervention by a moderator) go back and add more points and award them to new answers that may show up later.

Don't be too hasty to close questions. I'd recommend giving them at least 24 hours, so more experts can see the question and consider whether to contribute. But do try to close them after a few days.
0
 
LVL 1

Author Comment

by:glennstewart
Comment Utility
Hi jmcg,

Excellent solution. Exactly what I wanted.
Good work

Glenn
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Many time we need to work with multiple files all together. If its windows system then we can use some GUI based editor to accomplish our task. But what if you are on putty or have only CLI(Command Line Interface) as an option to  edit your files. I…
There are many situations when we need to display the data in sorted order. For example: Student details by name or by rank or by total marks etc. If you are working on data driven based projects then you will use sorting techniques very frequently.…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Sending a Secure fax is easy with eFax Corporate (http://www.enterprise.efax.com). First, Just open a new email message.  In the To field, type your recipient's fax number @efaxsend.com. You can even send a secure international fax — just include t…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now