Unix Shell or Perl script to facilitate splitting a file based on a footer separator


I currently have to use an editor to manually split a file & would like a script to facilitate
the splitting, such that I can just run:
./splitting_script  Inputfile


There's a separator that tells us where to split the Inputfile.
A file with 3 separators will be split into 3 files,
a file with n separators will be split into n files.
Eg of an Inputfile:
record1 .....
record2 .....
...............
recordX
<YYYYMMDDhhmm_abcd>      <== this is a separator
recordX+1
........
recordY
<YYYYMMDDhhmm_abcd>      <== this is another separator
recordY+1
........
recordZ
<YYYYMMDDhhmm_abcd>      <== this is the last separator

where YYYYMMDD is the numeric date, hhmm is the hour and minute,
and abcd is a variable number (it can have 3, 4 or 5 digits).

Since the date, time & variable number are non-constant,
  the  <......>  is the separator to look for.
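
As a rough illustration, a separator line of that shape can be matched with a regular expression: 12 digits, an underscore, then 3 to 5 digits, all inside angle brackets. The function name count_separators is made up for this example, and it assumes a grep that supports -E.

```shell
#!/bin/sh
# Sketch only: count_separators is a hypothetical helper name.
# Prints how many lines of the given file contain a separator such as
# <201001011230_1234> (12 digits, underscore, 3 to 5 digits).
count_separators() {
    grep -c -E '<[0-9]{12}_[0-9]{3,5}>' "$1"
}
```

Calling count_separators Inputfile prints the number of separator lines, which is also the number of files the split should produce.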


So in the above example, the InputFile would be split into the 3 files below :
File1:
====
record1 .....
record2 .....
...............
recordX
<YYYYMMDDhhmm_abcd>

File2:
====
recordX+1
........
recordY
<YYYYMMDDhhmm_abcd>

File3:
====
recordY+1
........
recordZ
<YYYYMMDDhhmm_abcd>


sunhuxAsked:

zlobchoCommented:
Try this:
#!/usr/bin/perl
use strict;
my $file=$ARGV[0];
open (DATA,"$file") or die "can not open the inputfile\n";
my $number=1;
while(<DATA>){
   open (FILE,">>file$number.txt") or die "can not create a new file\n";
   if ($_!~/\<\d{12}_\d{3,5}\>/){
    print FILE $_;
   }
   if ($_=~/\<\d{12}_\d{3,5}\>/){
       print FILE $_;
       close (FILE);
       $number++;
   }
}
close (DATA);

sunhuxAuthor Commented:
Thanks. Does anyone have an equivalent shell script, in case Perl is not present on CentOS Linux?
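
One possible shell equivalent, as a minimal sketch rather than a tested replacement: it uses awk to copy each line into the current output file and moves on to the next file after a separator line. The function name split_on_footer and the fileN.txt output names are assumptions for this example. The separator pattern is spelled out without {n} repetition counts because some awk versions do not support them. As with line-by-line copying in general, any lines that follow the last separator end up in one extra file.

```shell
#!/bin/sh
# Sketch only: split_on_footer and the fileN.txt names are illustrative.
split_on_footer() {
    d='[0-9]'
    d4="$d$d$d$d"
    # 12 digits, an underscore, then 3 to 5 digits, wrapped in < >
    sep="<${d4}${d4}${d4}_${d}${d}${d}${d}?${d}?>"
    awk -v sep="$sep" '
        BEGIN { n = 1 }
        {
            out = "file" n ".txt"
            print > out          # awk truncates the file on its first write
            if ($0 ~ sep) {      # separator line: this piece is complete
                close(out)
                n++
            }
        }
    ' "$1"
}

# Run as: ./splitting_script Inputfile
if [ "$#" -ge 1 ]; then
    split_on_footer "$1"
fi
```

The pieces are written to the current directory as file1.txt, file2.txt and so on.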
sunhuxAuthor Commented:

Hi Zlobcho,

Perl interpreter is present in our CentOS as /usr/local/bin/perl
so should the first line of the Perl script be :
#!/usr/local/bin/perl


Will need help to put in 2 enhancements to your Perl script :

1) if the inputfile contains only one  "<......>", then don't split it but
    echo a message "The inputfile has only 1 day's data, no splitting needed".
    Loosely, in Shell script, my code would be
        separator_count=`grep "<" inputfile | grep -c ">"`
        if [ "$separator_count" -lt 2 ]
        then
           echo "The inputfile has only 1 day's data, no splitting needed"
        else
           .... split the file as per your Perl script ...
        fi

2) I've tested the Perl script & if there are N separators, it produces "N+1" split files,
    with the last (ie the N+1th) file containing a single line with either a <CR>,
    <LF> or <EOF> character.  I think this is because my inputfile's last
    line has these character(s).  Can you enhance your script NOT to produce the
    (N+1)th file?  Perhaps add a check on the last line/record of the input file: if it has
    fewer than 3 characters, then it should not be processed & output to a file.
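
For comparison, the two enhancements could be sketched in shell roughly as follows. This is an illustration under assumptions, not a tested drop-in: the function name split_if_needed and the fileN.txt names are made up, and instead of checking the length of the last line it buffers lines and writes a piece only once its separator has been seen, so anything after the last separator (such as a stray blank line) is never written.

```shell
#!/bin/sh
# Sketch only: split_if_needed and the fileN.txt names are illustrative.
split_if_needed() {
    [ -r "$1" ] || { echo "can not open the inputfile" >&2; return 1; }
    d='[0-9]'
    d4="$d$d$d$d"
    # 12 digits, an underscore, then 3 to 5 digits, wrapped in < >
    sep="<${d4}${d4}${d4}_${d}${d}${d}${d}?${d}?>"

    # Enhancement 1: nothing to do when there are fewer than 2 separators
    count=$(grep -c -E "$sep" "$1")
    if [ "$count" -lt 2 ]; then
        echo "The inputfile has only 1 day's data, no splitting needed"
        return 0
    fi

    # Enhancement 2: buffer lines and flush only at a separator, so
    # leftover lines after the last separator never create an extra file
    awk -v sep="$sep" '
        { buf = buf $0 "\n" }
        $0 ~ sep {
            n++
            out = "file" n ".txt"
            printf "%s", buf > out
            close(out)
            buf = ""
        }
    ' "$1"
}

# Run as: ./splitting_script Inputfile
if [ "$#" -ge 1 ]; then
    split_if_needed "$1"
fi
```

Note that this design silently drops any real records that follow the last separator, which is acceptable only if every batch of records is guaranteed to end with one.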

zlobchoCommented:
Try this here:
#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0];
my $number = 1;
my $separator_count = 0;
my $rec = '';

# First pass: count the separator lines, e.g. <201001011230_1234>
open (my $in, '<', $file) or die "can not open the inputfile\n";
while (<$in>) {
  if (/<\d{12}_\d{3,5}>/) { $separator_count++; }
}
close ($in);

if ($separator_count < 2) {
  print "The inputfile has only 1 day's data, no splitting needed\n";
}
else {
  # Second pass: collect lines and write them out at each separator.
  # Lines after the last separator are collected but never written.
  open ($in, '<', $file) or die "can not open the inputfile\n";
  while (<$in>) {
    $rec .= $_;
    if (/<\d{12}_\d{3,5}>/) {
      # '>>' appends, so remove old output files before re-running
      open (my $out, '>>', "test/file$number.txt") or die "can not create a new file\n";
      print $out $rec;
      close ($out);
      $number++;
      $rec = '';
    }
  }
  close ($in);
}

zlobchoCommented:
In the script above, the output path test/file$number.txt requires a test/ subdirectory that must already exist.
Better to write to file$number.txt in the current directory. The full script:

#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0];
my $number = 1;
my $separator_count = 0;
my $rec = '';

# First pass: count the separator lines, e.g. <201001011230_1234>
open (my $in, '<', $file) or die "can not open the inputfile\n";
while (<$in>) {
  if (/<\d{12}_\d{3,5}>/) { $separator_count++; }
}
close ($in);

if ($separator_count < 2) {
  print "The inputfile has only 1 day's data, no splitting needed\n";
}
else {
  # Second pass: collect lines and write them out at each separator.
  # Lines after the last separator are collected but never written.
  open ($in, '<', $file) or die "can not open the inputfile\n";
  while (<$in>) {
    $rec .= $_;
    if (/<\d{12}_\d{3,5}>/) {
      # '>>' appends, so remove old output files before re-running
      open (my $out, '>>', "file$number.txt") or die "can not create a new file\n";
      print $out $rec;
      close ($out);
      $number++;
      $rec = '';
    }
  }
  close ($in);
}

sunhuxAuthor Commented:

Excellent, the script tested OK.

Thanks very much, zlobcho.