Solved

awk in UNIX to split a file based on the header records. TRU64 IBM UX

Posted on 2012-03-28
Last Modified: 2012-06-27
The file has multiple header records in it. I need to split each HDR and the records up to the next HDR into a separate file. For example, if there are three records which have HDR850 in the first six positions, then I would split the file into three separate files.

Basically the record structure looks like:
HDR850
DET0001
DET0002
DET0003
HDR850
DET0001
DET0002
DET0003
HDR850
DET0001
DET0002
DET0003
DET0004

I currently have an awk script which splits the file up on record count.  Maybe we could just modify this a little:

Calling the awk:

awk -f split.awk /fdmdev/edi-exp/MEIJR00.846

split.awk commands:

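BEGIN {
    x = 0    # record counter within the current output file (0 until the input header is read)
    y = 1    # sequence number used in the output file name
}
{
    filename = "/fdmdev/edi-exp/catalog/MEIJR00." y ".846"
    # The first record of the input is the header; save it and move on.
    if ( x == 0 ) {
        header = $0
        x++
        next
    }
    # Start each output file with the saved header.
    if ( x == 1 ) printf("%s\n", header) > filename
    if ( x++ < 8999 ) printf("%s\n", $0) >> filename
    else {
        # Last record for this file: write it, then switch to the next file.
        printf("%s\n", $0) >> filename
        y++
        x = 1
    }
}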
Question by:eshapley

Expert Comment

by:johnsone
ID: 37779162
This should do it

awk 'BEGIN { i=0 } /^HDR/ {++i} { print > (i ".out") }' <filename>


You will get files called 1.out, 2.out, 3.out, etc.
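If the input contains many HDR records, some awk implementations may run out of open files, because each output file stays open until awk exits. The following is a minimal sketch of the same idea that closes the previous file before starting the next one. The directory `split_demo` and the file name `sample.846` are made up for illustration; the sample data mirrors the layout shown in the question.

```shell
# Build a small sample input (hypothetical names) matching the layout in the question.
mkdir -p split_demo
printf '%s\n' HDR850 DET0001 DET0002 DET0003 \
              HDR850 DET0001 DET0002 DET0003 \
              HDR850 DET0001 DET0002 DET0003 DET0004 > split_demo/sample.846

# Start a new output file at every line beginning with HDR.
awk '
/^HDR/ {
    if (out != "") close(out)          # release the previous file before opening the next
    out = "split_demo/" (++i) ".out"   # build the name in a variable first
}
out != "" { print > out }              # lines before the first HDR (if any) are skipped
' split_demo/sample.846
```

Building the file name in a variable also sidesteps the question of how a given awk parses a concatenation that follows `>`.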

Author Comment

by:eshapley
ID: 37782671
This looks like something I can use. I discovered while developing this solution that I need to change the requirements a little.  I actually need to split to a new file each time this key value changes, beginning at position 1 of the same records:
HDR850                              VPD        C20731589.

For example:

HDR850                              VPD        C20731589
DET0001
DET0002
DET0003
HDR850                              VPD        C20731590
DET0001
DET0002
DET0003
HDR850                              VPD        C20731591
DET0001
DET0002
DET0003

Author Comment

by:eshapley
ID: 37782839
Error with suggested line:
/usr/local/cron/edi>awk 'BEGIN { i=0 } /^HDR/ {++i} { print > "ProEDI.DSH."(i)" }' /fdmdev/edi-imp/ProEDI.DSH
 Syntax Error The source line is 1.
 The error context is
                BEGIN { i=0 } /^HDR/ {++i} { print > "ProEDI.DSH."(i)" >>>  } <<< 
 awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.
        awk: 0602-540 There is a missing } character.

Accepted Solution

by:
johnsone earned 500 total points
ID: 37782948
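You have only 3 double quotes in the command, so the last one is unmatched. Drop the trailing quote. It is also safer to put the whole file name expression in parentheses. Probably more like this:

awk 'BEGIN { i=0 } /^HDR/ {++i} { print > ("ProEDI.DSH." i) }' /fdmdev/edi-imp/ProEDI.DSH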


Author Comment

by:eshapley
ID: 37784061
Great.  I still need help with the revised HDR record criteria.  Can you suggest how to split on that?

For Example:

If /^HDR/ evaluate expression from position 47 through 57.  
     If expression in 47 through 57 is unchanged when compared with last HDR, keep record with the last HDR850 and DET grouping.
     If expression in 47 through 57 changes from last HDR split to a new file.

For example:
(file1)
HDR850                              VPD        C20731589
DET0001
DET0002
DET0003
HDR850                              VPD        C20731589
DET0001
DET0002
DET0003
(file2)
HDR850                              VPD        C20731590
DET0001
DET0002
DET0003
(file3)
HDR850                              VPD        C20731591
DET0001
DET0002
DET0003
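
The rule described in this comment can be sketched in awk roughly as follows. This is an untested-on-AIX illustration, not an answer from the thread: it assumes the key really does sit in columns 47 through 57 of every HDR record, and the directory `keysplit_demo`, the helper functions `hdr` and `det`, and the generated sample data are all hypothetical.

```shell
mkdir -p keysplit_demo

# Helpers that generate sample records; the key starts in column 48 (assumed layout).
hdr() { printf '%-36s%-11s%s\n' HDR850 VPD "$1"; }
det() { printf '%s\n' DET0001 DET0002 DET0003; }
{ hdr C20731589; det; hdr C20731589; det; hdr C20731590; det; hdr C20731591; det; } \
    > keysplit_demo/ProEDI.DSH

awk '
/^HDR/ {
    key = substr($0, 47, 11)            # columns 47 through 57 of the HDR record
    if (n == 0 || key != prev) {        # first HDR, or the key differs from the last HDR
        if (out != "") close(out)       # finish the previous output file
        out = "keysplit_demo/ProEDI.DSH." (++n)
        prev = key
    }
}
out != "" { print > out }               # HDR and DET lines go to the current file
' keysplit_demo/ProEDI.DSH
```

With this sample input, the two consecutive C20731589 groups land in ProEDI.DSH.1 and the other two keys get ProEDI.DSH.2 and ProEDI.DSH.3. Note that the comparison is only against the previous HDR, so a key that reappears later in the file would start another new file.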