Solved

Perl solution required for splitting of emails and renaming files

Posted on 2003-11-07
Medium Priority
229 Views
Last Modified: 2010-03-04
I've already posted this but unfortunately awarded points too early....

I am currently using this small Perl script to help split a concatenated file full of emails into multiple files - at present named 1 through n.

#!/usr/bin/perl

$i=0;
while (<> ) {
  if(/^Subject:/) {
    close(FILE);
    $i++;
    open(FILE, ">$i");
    select FILE;
  }
  print;
}

Example:

I have a file named 191102ve.txz, which contains many emails all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. A grep of Subject would produce

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

Easy enough, with minimal load on the system compared to shell scripts. But the naming scheme is far too simple. I am wondering if I could enhance the naming of the files produced to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Where the XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice), and the 191102 comes from the input file name (which was 191102ve.txz).

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in format DDMMYY??.txz where in this example, ?? is ve.
Currently I am running the above script (the first one above). I would like a solution that
1. Reads all DDMMYY??.txz files
2. Splits each one into the various emails with Subject: AA/123456 Invoice
3. Outputs each email into AA.123456.DDMON (with DD = Day of file split, MON = Month of file split)

I'll reward points accordingly - with possible increase.
A complete solution will get the points.

Glenn Stewart
Question by:glennstewart
6 Comments
 
LVL 1

Author Comment

by:glennstewart
ID: 9699576
Btw - it would be nice if you could explain the code. I am a complete Perl novice and would like to tweak.
I am normally a shell scripter but can't find a shell solution that doesn't corrupt the emails with PDF attachments.
 
LVL 20

Accepted Solution

by:
jmcg earned 940 total points
ID: 9699820
Something like this?

#! /usr/bin/env perl

use strict;
use warnings;

# Index 0 is a placeholder so that month numbers 01..12 map directly to names.
my @monthmap = qw( 0 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

while (<>) {
  if (/^Subject: (\w\w)\/(\d+)/) {
    my $f = "$1.$2."; # new filename starts with components of invoice
    # extract day, month from current input filename; stop if it does not start with DDMM
    $ARGV =~ /^(\d{2})(\d{2})/ or die "unexpected input filename $ARGV";
    $f = $f . $1 . $monthmap[$2];
    open(STDOUT, '>', $f) or die "open failed for $f -- $!"; # old STDOUT closed automatically
  }
  print;
}
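
Since an explanation was asked for, the naming logic in the script above can be pulled out into a small function and tried on its own. This is only a sketch: the name output_name is made up for illustration, and it assumes the input file name begins with DDMM, as in 191102ve.txz.

```perl
use strict;
use warnings;

# Index 0 is a placeholder so month numbers 1..12 map straight to names.
my @monthmap = qw( 0 JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC );

# output_name is a hypothetical helper, not part of the script above.
# Returns e.g. "AA.067869.19NOV", or undef if either argument does not
# have the expected shape.
sub output_name {
    my ($subject_line, $input_file) = @_;

    # Two word characters, a slash, then the invoice digits.
    my ($prefix, $number) = $subject_line =~ /^Subject: (\w\w)\/(\d+)/
        or return;

    # The first four characters of the file name are day and month.
    my ($day, $month) = $input_file =~ /^(\d{2})(\d{2})/
        or return;
    return if $month < 1 || $month > 12;

    return "$prefix.$number.$day$monthmap[$month]";
}

# prints AA.067869.19NOV
print output_name('Subject: AA/067869 Invoice', '191102ve.txz'), "\n";
```

The parentheses in each pattern capture the pieces that end up in the name; the month digits are used as an index into @monthmap to get the three-letter abbreviation.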

 
LVL 20

Expert Comment

by:jmcg
ID: 9699855
Let me know if you need more explanation.

I've asked a moderator to take a look at your earlier question. You may get a chance to revise the grade you handed out there by mistake.

Meanwhile, it would be good to visit Experts-Exchange's Help pages to familiarize yourself with how to use the site. Be sure to check the section on closing questions.

http://www.experts-exchange.com/help/

Meanwhile, don't be hasty about closing this one. Give it a day or two. Someone may well come along who will give you a more pleasing solution or clearer explanation than I am able to.

Oh, and one more thing — does it matter that the Subject: line might not be the first line in the mail message's header?
 
LVL 1

Author Comment

by:glennstewart
ID: 9700169
Hi jmcg,

Thanks very much for your help.
I have tried your script out on the client site. It works like a charm.

The software we develop writes the headers in the first place, so we know that the Subject is the first line.
The only thing I can't guarantee is that a file of the same name does not already exist. Given that tens of thousands of AA.123456.XXMON files will be written, I can't guarantee that one won't overwrite another.

If I was going to ask for an addition to your script above, it would be to put a .0 or .1 or .2 on the end if the previous file exists.

E.g.

If the file CH.095813.02NOV exists, and another needs to be produced, it is called CH.095813.02NOV.0
If the files CH.095813.02NOV and CH.095813.02NOV.0 exist, and another needs to be produced, it is called CH.095813.02NOV.1
.. and so on.


Thanks heaps - I will reward after I get a few more solutions.
Btw - after I split points, is the problem solved?

Glenn
 
LVL 20

Assisted Solution

by:jmcg
jmcg earned 940 total points
ID: 9705717
In my naive vision of the world, invoice numbering is unique. In the real world, you could do something to prevent overwriting like the following:

(this would be inserted before the 'open' statement)

if( -e $f ) {
  my $jot = 0;
  while( -e "$f.$jot" ) { ++$jot }
  $f = "$f.$jot";
}

This could get to be tediously inefficient if duplicate invoice number/date combinations are common.

The -e operator tests for file existence. There are two tests, since there are two different forms for the filename: bare, and with a disambiguating suffix.
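
Wrapped up as a function, the suffix search might look like the sketch below. The name unique_name is invented here for illustration; the logic is the same pair of -e tests as in the snippet above. The demonstration writes into a scratch directory from File::Temp so that it can be run without touching real invoice files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# unique_name is a hypothetical helper name used only for this sketch.
# Returns $f if no such file exists yet, otherwise the first of
# "$f.0", "$f.1", ... that does not exist.
sub unique_name {
    my ($f) = @_;
    return $f unless -e $f;       # bare name is still free

    my $jot = 0;
    $jot++ while -e "$f.$jot";    # skip suffixes that are already taken
    return "$f.$jot";
}

# Demonstration: ask for the same base name three times in a scratch
# directory, creating the file each time so the next call sees it.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/CH.095813.02NOV";
for (1 .. 3) {
    my $name = unique_name($base);
    open(my $fh, '>', $name) or die "open failed for $name -- $!";
    close($fh);
    print "$name\n";
}
```

The three names printed end in CH.095813.02NOV, CH.095813.02NOV.0 and CH.095813.02NOV.1. In the splitting script, the call would go just before the 'open' statement. Checking with -e and then opening is not atomic, so this assumes only one copy of the script writes to the directory at a time.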

======

Yes, once you've decided on a point split, the question is closed, just as if you had accepted one of the answers. You can't (at least, not without intervention by a moderator) go back and add more points and award them to new answers that may show up later.

Don't be too hasty to close questions. I'd recommend giving them at least 24 hours, so more experts can see the question and consider whether to contribute. But do try to close them after a few days.
 
LVL 1

Author Comment

by:glennstewart
ID: 9711442
Hi jmcg,

Excellent solution. Exactly what I wanted.
Good work

Glenn