Solved

Perl script required to split concatenated emails and rename the files

Posted on 2003-11-06
Last Modified: 2007-12-19
I am currently using this small Perl script to split a concatenated file full of emails into multiple files, which at present are named 1 through n.

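#!/usr/bin/perl
use strict;
use warnings;

my $i = 0;
while (<>) {
  # A "Subject:" line marks the start of the next email:
  # close the current output file and open the next numbered one.
  if (/^Subject:/) {
    close(FILE) if $i;
    $i++;
    open(FILE, '>', $i) or die "Cannot create $i: $!";
    select FILE;   # plain print calls now go to this file
  }
  print;
}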

Example:

I have a file named 191102ve.txz, which contains many emails, all with similar subjects. Let's say it has 9 emails and the above script produces files 1 through 9. A grep for Subject would produce:

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

That is easy enough, with minimal load on the system compared to shell scripts. But the naming scheme is far too simple. I'm wondering if I could change the names of the files produced to:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

where XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice) and 191102 comes from the input file name (which was 191102ve.txz).
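A minimal sketch of the renaming rule described above, assuming every Subject line has exactly the form `Subject: XX/nnnnnn Invoice` and every input file name (without a directory part) begins with the six digits DDMMYY. The helper `output_name` is hypothetical, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build "AA.067869.191102" from a Subject line and the
# name of the input file it came from. Returns undef if either does not match.
sub output_name {
    my ($subject_line, $input_file) = @_;

    # "Subject: AA/067869 Invoice" -> ("AA", "067869")
    my ($prefix, $number) = $subject_line =~ m{^Subject:\s+(\w+)/(\d+)\s+Invoice}
        or return;

    # "191102ve.txz" -> "191102" (assumes the name starts with DDMMYY)
    my ($ddmmyy) = $input_file =~ /^(\d{6})/
        or return;

    return join '.', $prefix, $number, $ddmmyy;
}

print output_name("Subject: AA/067869 Invoice\n", "191102ve.txz"), "\n";   # prints AA.067869.191102
```

The slash is simply dropped by capturing the two halves separately, which avoids a second substitution step.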

Is this an easy task? Does anyone have any suggestions?
If this is easy, an even better naming scheme would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Thanks in advance,

Glenn Stewart
Question by:glennstewart
15 Comments
 
LVL 5

Expert Comment

by:fantasy1001
ID: 9699474
Something like this:

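my $filename = "191102ve.txz";
my ($extract) = $filename =~ /^(\d+)/;    # leading digits: "191102"
open my $in, '<', $filename or die "Error: $!";
my @ary = <$in>;
close $in;

foreach (@ary) {
   next unless /^Subject: /;              # only Subject lines name a file
   chomp;                                 # keep the newline out of the file name
   s/^Subject: | Invoice$//g;             # "AA/067869" (/g strips both parts)
   s/\//./;                               # "AA.067869"
   open my $out, '>', "$_.$extract" or do { print "Error: $!\n"; next };
   # manipulate here
   close $out;
}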

~ fantasy ~

 
LVL 1

Assisted Solution

by:jt401
jt401 earned 90 total points
ID: 9699517
This should give you the first part of the file name you want, assuming the subject line is always in the same format; if not, you can adjust the regex accordingly. To get the second part, you'll need to do a regex match on the original file name, wherever that is in the script. Post that portion of the code if you need help with it. As for converting 11 to NOV, it would probably be easiest to create an array of month names in your script. You could also use the Date::Calc module, but it's not really necessary.

if(/^Subject: (\w{2}\/\d{6}) Invoice/) {
    $outputFileName = $1;
    $outputFileName =~ tr/\//./;
    close(FILE);
    open(FILE, ">$outputFileName");
    select FILE;
}
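A minimal sketch of the month-name array approach suggested above, assuming the input file name starts with DDMMYY (as in 191102ve.txz). The helper `ddmon` and the array `@MONTHS` are hypothetical names, not code from the thread.

```perl
use strict;
use warnings;

# Month abbreviations indexed 0..11, so no Date::Calc is needed.
my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# Hypothetical helper: "191102ve.txz" -> "19NOV".
# Assumes the file name starts with DDMMYY; returns undef otherwise.
sub ddmon {
    my ($input_file) = @_;
    my ($dd, $mm) = $input_file =~ /^(\d\d)(\d\d)\d\d/ or return;
    return if $mm < 1 || $mm > 12;          # reject impossible months
    return $dd . $MONTHS[$mm - 1];          # month "11" -> index 10 -> "NOV"
}

print ddmon("191102ve.txz"), "\n";   # prints 19NOV
```

Appending this to the subject-derived part (e.g. "AA.067869") gives the AA.067869.19NOV form from the original question.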
 
LVL 1

Author Comment

by:glennstewart
ID: 9699547
Thanks for the additions so far.
To help reach the perfect solution, I should give you all the full picture:

Although I mentioned that I have the file 191102ve.txz, I will actually have up to about 1000 files named in the format DDMMYY??.txz, where in this example ?? is ve.
Currently I am running the first script above. I would like a solution that:
1. reads all DDMMYY??.txz files
2. splits each one into the individual emails with Subject: AA/123456 Invoice
3. outputs each email to AA.123456.DDMON (where DD and MON are the day and month of the file being split)

I'll award points accordingly, possibly with an increase.

Regards,

Glenn Stewart
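A minimal sketch covering the three requirements above, under several assumptions: the .txz files are plain text despite the extension (the first script in this thread reads them directly), DD and MON come from the input file name, output files go to the current directory, and an existing file with the same name is overwritten. Like the original script, it starts a new email at each "Subject:" line. The sub `split_file` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Month abbreviations, indexed 0..11.
my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# Hypothetical helper: split one DDMMYY??.txz file into AA.123456.DDMON files
# in the current directory. Returns the names of the files it wrote.
sub split_file {
    my ($path) = @_;

    # DD and MM come from the start of the input file's base name.
    my ($dd, $mm) = basename($path) =~ /^(\d\d)(\d\d)\d\d/
        or do { warn "Skipping $path: name does not start with DDMMYY\n"; return };
    if ($mm < 1 || $mm > 12) {
        warn "Skipping $path: month $mm is out of range\n";
        return;
    }
    my $ddmon = $dd . $MONTHS[$mm - 1];

    open my $in, '<', $path or die "Cannot read $path: $!";
    my ($out, @written);
    while (my $line = <$in>) {
        # As in the original script, every "Subject:" line starts a new email.
        if ($line =~ /^Subject:/) {
            close $out if $out;
            undef $out;
            if ($line =~ m{^Subject:\s+(\w+)/(\d+)\s+Invoice}) {
                my $name = "$1.$2.$ddmon";
                open $out, '>', $name or die "Cannot create $name: $!";
                push @written, $name;
            }
            else {
                warn "Unrecognised subject in $path: $line";
            }
        }
        # Lines before the first Subject, or after an unrecognised one, are skipped.
        print {$out} $line if $out;
    }
    close $out if $out;
    close $in;
    return @written;
}

# Process every .txz file in the current directory.
split_file($_) for glob '*.txz';
```

Reading line by line, rather than slurping each file into an array, keeps memory use flat even across roughly 1000 input files.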

 
LVL 1

Expert Comment

by:jt401
ID: 9699556
You beat me to it, fantasy... sorry about that, I should have refreshed before posting!
 
LVL 1

Author Comment

by:glennstewart
ID: 9699558
Hmm... this is the first time I have used this forum.
I accepted an answer but require additional responses.
 
LVL 5

Accepted Solution

by:fantasy1001
fantasy1001 earned 150 total points
ID: 9699616
# I assume all the .txz files are in the directory where your script is located
my @files = glob '*.txz';

foreach (@files) {
   my ($extract) = /^(\d\d\d\d)/ or next;   # DDMM from DDMMYY??.txz
   open my $in, '<', $_ or die "Error: $!";
   my @ary = <$in>;
   close $in;

   foreach (@ary) {
      next unless /^Subject: /;             # only Subject lines name a file
      chomp;                                # keep the newline out of the file name
      s/^Subject: | Invoice$//g;            # "AA/123456" (/g strips both parts)
      s/\//./;                              # "AA.123456"
      open my $out, '>', "$_.$extract" or do { print "Error: $!\n"; next };
      # manipulate here
      close $out;
   }
}
 
LVL 1

Author Comment

by:glennstewart
ID: 9699809
For extra points, see another post I have created.

Thanks
 
LVL 1

Author Comment

by:glennstewart
ID: 9700191
Hi AnnieMod,

This was my very first post.
I apologise for not reading the help.

I was giving this a C on the assumptions that
1. I could allocate partial points
2. I would get more solutions

My intention was to reward submissions along the way. I quickly saw that this isn't how I should be allocating points.

The original post I rewarded was on the way to a solution, but not quite there yet.

Kindest Regards,

Glenn
 
LVL 1

Expert Comment

by:jt401
ID: 9701404
I still think that deserved better than a C; everything in your original question was covered. I can see giving fewer points, but not a low grade.
 
LVL 1

Author Comment

by:glennstewart
ID: 9702995
Sorry jt401. I was unaware of the consequences.
I am used to the HP tech forum, where a first attempt is awarded a C, the next a B, and the final one an A.
Having read the guidelines for this forum, I completely agree with you, but my hands are tied.

My intentions were good. I was hoping to award points to everyone and didn't realise the grade was so important. I thought it was a way of dividing points. I was wrong about that too.

As I said, this was my first post, and I learned quickly.
 
LVL 1

Author Comment

by:glennstewart
ID: 9703019
By the way, this is why I opened a new thread before AnnieMod mentioned anything.
I was hoping to give the solvers of this thread a second chance to earn the points and grade.

Apologies if I have offended anyone, but it shows that someone can't simply jump into this forum without reading the instructions.
To be honest, the points/grade system, although a great idea, was far from easy to grasp for a first-timer.
 
LVL 1

Author Comment

by:glennstewart
ID: 9703071
Oh, okay... thanks.
Could you reopen the grade/points? I would like to award them accordingly.