eshapley


Need to cut a file that is 30,000 records long into multiple files with not more than 9000 records in each

I need to take a large file and break it down into smaller files. The first line of the original file must be repeated as the first line of each file that is created, because it is a HEADER record. All of the other lines in the file are DETAIL lines. The original file looks something like this:

HEADER, HDR1, HDR2, HDR3
DETAIL, FIELD1, FIELD2, FIELD3
DETAIL, FIELD1, FIELD2, FIELD3
DETAIL, FIELD1, FIELD2, FIELD3
DETAIL, FIELD1, FIELD2, FIELD3
DETAIL, FIELD1, FIELD2, FIELD3
DETAIL, FIELD1, FIELD2, FIELD3

After the script runs, I should have multiple files in the same format, each starting with the HEADER line.
ozo

split -l 8999 largefile splitname
perl -MTie::File  -e  '$h=<>;shift; tie @a, "Tie::File",$_ and unshift @a,$h for @ARGV' splitname*
eshapley

ASKER

I am looking for a solution that runs on Tru64 UNIX. Thanks for looking at this. The suggestion ozo made is a perl command, and we don't run perl on this server.
#!/bin/bash
split -l 8999 largefile splitname
head=`head -1 largefile`
for p in split* ;  do sed -i.bak "1s/^/$head\\
/" $p ; done
cp splitnameaa.bak splitnameaa
Save the snippet below as split.awk,

then run "awk -f split.awk datafile.txt"

where datafile.txt is your file.

The output will be files named data.NUMBER.txt, each holding at most 9000 records including the header.
BEGIN {
    x = 0
    y = 1
}
{
    if ( x == 0 ) header = $0
    if ( x == 0 ) {
        x++
        next
    }
    if ( x == 1 ) printf("%s\n",header) > "data."y".txt"
    if ( x++ < 8999 ) printf("%s\n",$0) >> "data."y".txt"
    else {
        printf("%s\n",$0) >> "data."y".txt"
        y++
        x = 1
    }
}


I am getting an error with the -i.  Is this a subcommand in sed?

/usr/local/cron/edi>split-cat
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
sed: illegal option -- i
Usage: sed [-n] script [file...]
       sed [-n] {-e script}...[-f script_file]...[file...]
cp: splitnameaa.bak: No such file or directory
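
On the -i question: -i is an edit-in-place option that GNU sed and BSD sed provide as an extension. It is not part of POSIX sed, which is consistent with the usage message above. A minimal sketch of a workaround follows, assuming only printf, cat and mv are available: write the header and the chunk to a temporary file, then move it over the original. The helper name prepend_header and the demo file names are hypothetical, not from this thread. Passing the header as a plain argument also avoids problems when the header contains "/" or "&", which are special in a sed replacement.

```shell
#!/bin/sh
# prepend_header FILE HEADER_LINE
# Hypothetical helper: writes HEADER_LINE followed by the contents of FILE
# to FILE.tmp, then replaces FILE with it. No sed -i is required.
prepend_header() {
    file=$1
    header=$2
    { printf '%s\n' "$header"; cat "$file"; } > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demo on throwaway data (directory and file names are illustrative only).
workdir=${TMPDIR:-/tmp}/prepend_demo.$$
mkdir -p "$workdir"
printf 'DETAIL, A\nDETAIL, B\n' > "$workdir/chunk"
prepend_header "$workdir/chunk" '"HEADR", HDR1, HDR2'
cat "$workdir/chunk"
```

In the split script, the loop body would call the helper once per chunk instead of running sed -i.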
Hi Mikelfritz,

Here is the error I get using your solution:

/usr/local/cron/edi>awk -f split.awk catalog.txt
 syntax error The source line is 11.
 The error context is
                if ( x == 1 ) printf("%s\n",HEADR) > >>>  data.t <<< 
 awk: The statement cannot be correctly parsed.
 The source line is 11.
 syntax error The source line is 12.

I should say that I was not correct about the word "HEADER" in the first line, and tried to adjust for that. The first word of the first line is "HEADR", including the quotes.
"header" is a variable in the script, so it does not matter what the data in the first line (or any other line) is. header is set in line 6. Try changing line 11 back to use "header", or change line 6 to match.

Try running it as I wrote it. I ran it with gawk on Red Hat and it worked.
Line 6 captures the first line, which is your header, and saves it in a variable for later use. The script then loops as many times as needed, each time printing the header line followed by 8999 data lines to a file named "data.SOME_NUMBER.txt".
Hi Mikelfritz,

Same error:
/usr/local/cron/edi>awk -f split.awk catalog.txt
 syntax error The source line is 11.
 The error context is
                if ( x == 1 ) printf("%s\n",header) > >>>  "data."y" <<< 
 awk: The statement cannot be correctly parsed.
 The source line is 11.
 syntax error The source line is 12.
Are you sure you have the syntax exactly as typed? I just copied the code from here into a "split.awk" file on an AIX server, pasted your example from above into a file "test.txt", and it worked perfectly. The quotes and the ">" and ">>" are very important. y is also a variable, so it must not be inside quotes.

Line 11 and 12:

if ( x == 1 ) printf("%s\n",header) > "data."y".txt"
if ( x++ < 8999 ) printf("%s\n",$0) >> "data."y".txt"

You could try to make lines 11 and 12:

if ( x == 1 ) printf("%s\n",header)
if ( x++ < 8999 ) printf("%s\n",$0)


That would not dump the data to file but to your screen.  

Keep me posted
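
One possible cause of the parse error, offered as an assumption rather than a confirmed diagnosis: an unparenthesized string concatenation after ">" is ambiguous in awk, and some implementations reject it even though gawk accepts it. Building the file name in a variable before redirecting sidesteps the issue. The sketch below does that and also closes each finished file. The names outfile, max and catalog_sample.txt are hypothetical, and the chunk size is lowered to 3 detail lines so the result is easy to inspect.

```shell
#!/bin/sh
# Work in a throwaway directory (name is illustrative only).
workdir=${TMPDIR:-/tmp}/awk_split_demo.$$
mkdir -p "$workdir"
cd "$workdir" || exit 1

# Sample input: one header line followed by seven detail lines.
printf '%s\n' '"HEADR", HDR1, HDR2, HDR3' \
    'DETAIL, 1' 'DETAIL, 2' 'DETAIL, 3' 'DETAIL, 4' \
    'DETAIL, 5' 'DETAIL, 6' 'DETAIL, 7' > catalog_sample.txt

awk '
BEGIN { max = 3 }                      # detail lines per output file
NR == 1 { header = $0; next }          # remember the header line
(NR - 2) % max == 0 {                  # first detail line of a new chunk
    if (outfile != "") close(outfile)  # release the previous file
    n++
    outfile = "data." n ".txt"         # name is built before redirecting
    print header > outfile
}
{ print > outfile }                    # detail line goes to current chunk
' catalog_sample.txt
```

With the sample input this writes data.1.txt and data.2.txt with the header plus three detail lines each, and data.3.txt with the header plus the remaining detail line. Setting max to 8999 would give at most 9000 lines per file, as in the original request.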
ASKER CERTIFIED SOLUTION
mikelfritz
This solution is only available to members.
Hi Mikelfritz,

That solution appears to work OK. Let me play around with it a bit. Thanks much, and stand by.
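#!/bin/sh
# Split only the detail lines; splitting the whole file would put the header in the first chunk twice
head="`head -1 largefile`"
sed 1d largefile | split -l 8999 - splitname
# splitname?? matches only the split output, not the script itself or earlier .new files
for p in splitname?? ;  do (echo "$head"; cat "$p")>"$p.new"; done

## but I guess that a single awk solution is much faster ..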