Perl script required to split concatenated emails and rename the files

I am currently using this small Perl script to split a concatenated file full of emails into multiple files, at present named 1 through n.

#!/usr/bin/perl

$i=0;
while (<> ) {
  if(/^Subject:/) {
    close(FILE);
    $i++;
    open(FILE, ">$i");
    select FILE;
  }
  print;
}

Example:

I have a file named 191102ve.txz, which contains many emails all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. A grep of Subject would produce

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

Easy enough, with minimal load on the system compared to shell scripts. But the naming scheme is far too simple. I'm wondering if I could enhance the naming of the files produced to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Where the XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice), and the 191102 comes from the input file name (which was 191102ve.txz).

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Thanks in advance,

Glenn Stewart
 
fantasy1001 Commented:
Something like this:

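$filename = "191102ve.txz";
$filename =~ /^(\d+)/; $extract = $1;
open FILEIN, $filename or die "Error:$!";
   @ary = <FILEIN>;
close FILEIN;

$opened = 0;
foreach (@ary){
   # Each Subject line starts a new output file, e.g. AA.067869.191102
   if (m{^Subject: (\w+)/(\d+) Invoice}) {
      close FILEOUT if $opened;
      open FILEOUT, ">$1.$2.$extract" or die "Error:$!";
      $opened = 1;
   }
   print FILEOUT $_ if $opened;
}
close FILEOUT if $opened;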

~ fantasy ~


jt401 Commented:
This should give you the first part of the filename you want, assuming the subject line is always in the same format; if not, you can adjust the regex accordingly. To get the second part, you'll need to do a regex match on the original filename, wherever that is in the script. Post that portion of the code if you need help with it.

As for converting 11 to NOV, it would probably be easiest to just create an array of month names in your script. You could also use the Date::Calc module, but it's not really necessary.

if(/^Subject: (\w{2}\/\d{6}) Invoice/) {
    $outputFileName = $1;
    $outputFileName =~ tr/\//./;
    close(FILE);
    open(FILE, ">$outputFileName");
    select FILE;
}
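
The month-name array mentioned above could be sketched roughly as follows. This is only an illustration: the sub name date_tag is hypothetical, and it assumes the input file name always starts with six digits in DDMMYY order, as in 191102ve.txz.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month abbreviations, indexed 0..11
my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# date_tag (hypothetical helper): turn "191102ve.txz" into "19NOV".
# Returns undef if the name does not start with a plausible DDMMYY.
sub date_tag {
    my ($filename) = @_;
    return undef unless $filename =~ /^(\d\d)(\d\d)\d\d/;
    my ($day, $month) = ($1, $2);
    return undef if $month < 1 || $month > 12;
    return $day . $MONTHS[$month - 1];
}

print date_tag("191102ve.txz"), "\n";   # prints 19NOV
```

The returned tag can then be appended to $outputFileName with a dot in between, giving names such as AA.067869.19NOV.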

glennstewart (Author) Commented:
Thanks for additions so far....
To help with the perfect solution I should provide you all with the full input:

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in format DDMMYY??.txz where in this example, ?? is ve.
Currently I am running the first script above. I would like a solution that:
1. Reads all DDMMYY??.txz files
2. Splits each one into its individual emails, each with a subject of the form Subject: AA/123456 Invoice
3. Outputs each email to a file named AA.123456.DDMON (with DD = Day of file split, MON = Month of file split)

I'll reward points accordingly - with possible increase.

Regards,

Glenn Stewart
 
jt401 Commented:
you beat me to it fantasy...  sorry about that, I should have refreshed before posting!

glennstewart (Author) Commented:
Hmm... this is the first time I have used this forum.
I accepted an answer but require additional responses.

fantasy1001 Commented:
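# I assume all the .txz files are in the directory where your .pl is located
open PIPE, "ls *.txz |" or die "Error: $!";
   @files = <PIPE>;
close PIPE;

foreach $file (@files) {
   chomp $file;
   next unless $file =~ /^(\d\d\d\d)/;   # DDMM from the file name
   $extract = $1;
   open FILEIN, $file or die "Error: $!";
      @ary = <FILEIN>;
   close FILEIN;

   $opened = 0;
   foreach (@ary) {
      # Each Subject line starts a new output file, e.g. AA.067869.1911
      if (m{^Subject: (\w+)/(\d+) Invoice}) {
         close FILEOUT if $opened;
         open FILEOUT, ">$1.$2.$extract" or die "Error: $!";
         $opened = 1;
      }
      print FILEOUT $_ if $opened;
   }
   close FILEOUT if $opened;
}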
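
For the AA.123456.DDMON naming asked for earlier in the thread, a fuller sketch might look like the one below. It is an illustration under assumptions, not a tested production script: the sub name split_emails and its $outdir parameter are hypothetical, input names are assumed to start with DDMMYY, and every subject is assumed to match "Subject: XX/NNNNNN Invoice".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# split_emails (hypothetical name): split one DDMMYY??.txz file into
# one output file per email, named like AA.067869.19NOV, written to $outdir.
# Returns the list of file names created.
sub split_emails {
    my ($infile, $outdir) = @_;

    # Take the date tag from the base name of the input file
    my ($base) = $infile =~ m{([^/]+)\z};
    my ($day, $month) = $base =~ /^(\d\d)(\d\d)\d\d/
        or die "$infile: name does not start with DDMMYY\n";
    die "$infile: bad month $month\n" if $month < 1 || $month > 12;
    my $tag = $day . $MONTHS[$month - 1];

    open my $in, '<', $infile or die "Cannot read $infile: $!\n";
    my ($out, @created);
    while (my $line = <$in>) {
        # A Subject line starts the next email, as in the original script
        if ($line =~ m{^Subject: (\w+)/(\d+) Invoice}) {
            close $out if $out;
            my $name = "$1.$2.$tag";
            open $out, '>', "$outdir/$name"
                or die "Cannot write $outdir/$name: $!\n";
            push @created, $name;
        }
        print {$out} $line if $out;   # lines before the first Subject are dropped
    }
    close $out if $out;
    close $in;
    return @created;
}

# Process every .txz file in the current directory
split_emails($_, '.') for glob '*.txz';
```

Note that two emails with the same subject in the same input file would map to the same output name, so the later one would overwrite the earlier one; appending a counter to the name is one way around that.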

glennstewart (Author) Commented:
For extra points see another post I have created.

Thanks

glennstewart (Author) Commented:
Hi AnnieMod,

This was my very first post.
I apologise for not reading the help.

I was giving this a C with the assumption of
1. I could allocate part points
2. I would get more solutions

My intention was to reward along the way for submissions. I have seen quite quickly that this isn't the way I should be allocating points.

The original post I rewarded was on the way to a solution, but not quite there yet.

Kindest Regards,

Glenn

jt401 Commented:
I still think that deserved better than a C.. everything from your original question was covered. I can see giving less points, but not a low grade.

glennstewart (Author) Commented:
Sorry jt401. Was unaware of the consequences.
I am used to the HP tech forum, where a first attempt gets awarded a C, the next a B, and the final one an A.
Reading the guidelines now for this forum I completely agree with you, but my hands are tied.

My intentions were good. I was hoping to award points for all and didn't realise the grade was so important. I thought it was a way of dividing points. Incorrect in this assumption too.

As I said.... my first post. And I learned quickly.

glennstewart (Author) Commented:
Btw.... this is why I opened a new thread prior to AnnieMod mentioning anything.
I was hoping to provide solvers of this thread a second chance to earn the points and grade.

Apologies if I have offended anyone.  - but it shows that someone can't simply jump into this forum without reading instructions.
To be honest, the points/grade system although a great idea, was far from easy to grasp for a first timer.

glennstewart (Author) Commented:
Oh. okay... thanks.
Could you reopen the grade/points because I would like to reward accordingly.