Solved

Perl script required to split concatenated emails and rename the files

Posted on 2003-11-06
Last Modified: 2007-12-19
I am currently using this small Perl script to split a concatenated file full of emails into multiple files, at present named 1 through n.

#!/usr/bin/perl

$i=0;
while (<> ) {
  if(/^Subject:/) {
    close(FILE);
    $i++;
    open(FILE, ">$i");
    select FILE;
  }
  print;
}

Example:

I have a file named 191102ve.txz, which contains many emails all with a similar subject. Let's say it has 9 emails and the above script produces files 1 through 9. A grep of Subject would produce

1:Subject: AA/067869 Invoice
2:Subject: BB/068549 Invoice
3:Subject: CC/068616 Invoice
4:Subject: DD/070432 Invoice
5:Subject: EE/071172 Invoice
6:Subject: FF/072634 Invoice
7:Subject: AA/072658 Invoice
8:Subject: CC/073205 Invoice
9:Subject: DD/075095 Invoice

This is easy enough, with minimal load on the system compared to shell scripts, but the naming scheme is far too simple. I am wondering whether I could enhance the naming of the files produced to be:

AA.067869.191102 instead of 1
BB.068549.191102 instead of 2
CC.068616.191102 instead of 3
<and so forth>

Where the XX.?????? comes straight from the Subject heading (which was XX/?????? Invoice), and the 191102 comes from the input file name (which was 191102ve.txz).
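
The mapping described above can be sketched as a small helper. This is only an illustration: `simple_name` is a hypothetical name, and the patterns assume the subject always looks like `XX/?????? Invoice` and the input file name always begins with six digits.

```perl
use strict;
use warnings;

# Hypothetical helper: build "AA.067869.191102" from a Subject line
# and the name of the input file. Returns undef if either part does
# not match the assumed format.
sub simple_name {
    my ($subject_line, $input_file) = @_;
    my ($prefix, $number) = $subject_line =~ m{^Subject: (\w+)/(\d+) Invoice}
        or return;
    my ($date) = $input_file =~ /^(\d{6})/
        or return;
    return "$prefix.$number.$date";
}

print simple_name("Subject: AA/067869 Invoice", "191102ve.txz"), "\n";
```

With the sample data from the question this prints AA.067869.191102.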

Is this an easy task? Would anyone have any possible suggestion?
If this is easy, a better naming solution would be:

AA.067869.19NOV instead of 1
BB.068549.19NOV instead of 2
CC.068616.19NOV instead of 3

Thanks in advance,

Glenn Stewart
Question by:glennstewart
15 Comments
 
LVL 5

Expert Comment

by:fantasy1001
ID: 9699474
Something like this:

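$filename = "191102ve.txz";
# Leading digits of the input file name (DDMMYY), e.g. 191102
$filename =~ /^(\d+)/ or die "No date in $filename";
$extract = $1;
open FILEIN, $filename or die "Error: $!";

$opened = 0;
while (<FILEIN>) {
   # A Subject line starts a new email, so switch to a new output file
   if (m{^Subject: (\w+)/(\d+) Invoice}) {
      close FILEOUT if $opened;
      open FILEOUT, ">$1.$2.$extract" or die "Error: $!";
      $opened = 1;
   }
   # Lines before the first Subject line are skipped
   print FILEOUT $_ if $opened;
}
close FILEOUT if $opened;
close FILEIN;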

~ fantasy ~

LVL 1

Assisted Solution

by:jt401
jt401 earned 30 total points
ID: 9699517
This should give you the first part of the filename you want, assuming the subject line is always in the same format; if not, you can adjust the regex accordingly. To get the second part, you'll need to do a regex match on the original filename, wherever that is in the script. Post that portion of the code if you need help with that. As for converting 11 to NOV, it would probably be easiest to just create an array of month names in your script. You could also use the Date::Calc module, but it's not really necessary.

if(/^Subject: (\w{2}\/\d{6}) Invoice/) {
    $outputFileName = $1;
    $outputFileName =~ tr/\//./;
    close(FILE);
    open(FILE, ">$outputFileName");
    select FILE;
}
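
The month-name array suggested above might look like the following. It is a minimal sketch, assuming the input file name always starts with DDMMYY; `ddmon` and `@MONTHS` are hypothetical names, not part of the original script.

```perl
use strict;
use warnings;

# Month abbreviations, indexed 0..11.
my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# Hypothetical helper: turn "191102ve.txz" into "19NOV".
# Returns undef if the name does not start with a plausible DDMMYY.
sub ddmon {
    my ($input_file) = @_;
    my ($day, $month) = $input_file =~ /^(\d\d)(\d\d)\d\d/
        or return;
    return if $month < 1 || $month > 12;
    return $day . $MONTHS[$month - 1];
}

print ddmon("191102ve.txz"), "\n";
```

For 191102ve.txz this prints 19NOV, which can then be appended to the name taken from the Subject line.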
LVL 1

Author Comment

by:glennstewart
ID: 9699547
Thanks for additions so far....
To help with the perfect solution I should provide you all with the full input:

Although I mentioned that I have the file 191102ve.txz, I will actually have maybe up to 1000 files in format DDMMYY??.txz where in this example, ?? is ve.
Currently I am running the first script above. I would like a solution that:
1. Reads all DDMMYY??.txz files
2. Splits each one into the various emails with Subject: AA/123456 Invoice
3. Outputs each email into AA.123456.DDMON (with DD = day of file split, MON = month of file split)

I'll reward points accordingly - with possible increase.

Regards,

Glenn Stewart
LVL 1

Expert Comment

by:jt401
ID: 9699556
you beat me to it fantasy...  sorry about that, I should have refreshed before posting!
LVL 1

Author Comment

by:glennstewart
ID: 9699558
Hmm... this is the first time I have used this forum.
I accepted an answer but require additional responses.
LVL 5

Accepted Solution

by:
fantasy1001 earned 50 total points
ID: 9699616
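# I assume all the .txz files are in the directory where your .pl script is located
foreach $file (glob "*.txz") {
   # First four digits of the file name (DDMM), e.g. 1911; skip other names
   next unless $file =~ /^(\d\d\d\d)/;
   $extract = $1;
   open FILEIN, $file or die "Error: $!";
   $opened = 0;

   while (<FILEIN>) {
      # A Subject line starts a new email, so switch to a new output file
      if (m{^Subject: (\w+)/(\d+) Invoice}) {
         close FILEOUT if $opened;
         open FILEOUT, ">$1.$2.$extract" or die "Error: $!";
         $opened = 1;
      }
      # Lines before the first Subject line are skipped
      print FILEOUT $_ if $opened;
   }
   close FILEOUT if $opened;
   close FILEIN;
}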
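
The code above keeps the first four digits of the file name (DDMM) as the suffix. A variant that produces the requested AA.123456.DDMON names could be sketched as follows. It is not the accepted answer's code: `split_emails` and `@MONTHS` are hypothetical names, and it assumes file names of the form DDMMYY??.txz and subjects of the form `XX/?????? Invoice`. If two emails in one file share a subject, the later one overwrites the earlier.

```perl
use strict;
use warnings;

my @MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

# Hypothetical helper: split one DDMMYY??.txz file into one file per
# email, named AA.123456.DDMON, inside $out_dir. Returns the names written.
sub split_emails {
    my ($path, $out_dir) = @_;
    my ($day, $month) = $path =~ m{(?:^|/)(\d\d)(\d\d)\d\d[^/]*\.txz$}
        or die "Unexpected file name: $path\n";
    die "Bad month in $path\n" if $month < 1 || $month > 12;
    my $suffix = $day . $MONTHS[$month - 1];

    open my $in, '<', $path or die "Cannot read $path: $!\n";
    my ($out, @written);
    while (my $line = <$in>) {
        # A Subject line starts the next email.
        if ($line =~ m{^Subject: (\w+)/(\d+) Invoice}) {
            my $name = "$1.$2.$suffix";
            close $out if $out;
            open $out, '>', "$out_dir/$name"
                or die "Cannot write $out_dir/$name: $!\n";
            push @written, $name;
        }
        # Lines before the first Subject line are dropped.
        print {$out} $line if $out;
    }
    close $out if $out;
    close $in;
    return @written;
}
```

Calling `split_emails($_, '.') for glob '*.txz';` would then process every matching file in the current directory.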
LVL 1

Author Comment

by:glennstewart
ID: 9699809
For extra points see another post I have created.

Thanks
LVL 1

Author Comment

by:glennstewart
ID: 9700191
Hi AnnieMod,

This was my very first post.
I apologise for not reading the help.

I gave this a C on the assumption that:
1. I could allocate part points
2. I would get more solutions

My intention was to reward along the way for submissions. I have seen quite quickly that this isn't the way I should be allocating points.

The original post I rewarded was on the way to a solution, but not quite there yet.

Kindest Regards,

Glenn
LVL 1

Expert Comment

by:jt401
ID: 9701404
I still think that deserved better than a C.. everything from your original question was covered. I can see giving less points, but not a low grade.
LVL 1

Author Comment

by:glennstewart
ID: 9702995
Sorry jt401. Was unaware of the consequences.
I am used to the HP tech forum, where a first attempt gets awarded a C.. then next a B and a final, an A.
Reading the guidelines now for this forum I completely agree with you, but my hands are tied.

My intentions were good. I was hoping to award points for all and didn't realise the grade was so important. I thought it was a way of dividing points. Incorrect in this assumption too.

As I said.... my first post. And I learned quickly.
LVL 1

Author Comment

by:glennstewart
ID: 9703019
Btw.... this is why I opened a new thread prior to AnnieMod mentioning anything.
I was hoping to provide solvers of this thread a second chance to earn the points and grade.

Apologies if I have offended anyone.  - but it shows that someone can't simply jump into this forum without reading instructions.
To be honest, the points/grade system although a great idea, was far from easy to grasp for a first timer.
LVL 1

Author Comment

by:glennstewart
ID: 9703071
Oh. okay... thanks.
Could you reopen the grade/points? I would like to reward accordingly.