Best way to redirect data to new files depending on character position value of large data flat file

Posted on 2009-07-13
Last Modified: 2013-12-26
Below is a modified copy of my flat data file.  Character position 22 is a numerical
value (3rd column on first line, 4 column farther down) and each line with a
same number has same data structure.  I have sorted this file
(cat xxx.dat | sort -k1.22,1.23 > sorted.txt) by character 22,23 and redirected it to a new file.
I now need to grep or use some type of command or process to seperate this data in
multiple files, depending on this number value (1-9) in character position 22,23.  
I would like to use .ksh to do it all (AIX) at which point I will then use SQLLoader
to load data into Oracle.

I am looking for best way to create my multiple files depending on this specific
number.  Any direction appreciated.  Thanks!!!

ABCD  000000202354D1 1   050809    12050605101209061354
ABCD  000000202354D1 2   050809    H06201W00245180    
ABCD  000000202354D1 5   050809    120406    1204060060
ABCD  000000202354D1 5   050809    120806    1208060060
ABCD  000000202354D1 5   050809    120806    1208060120
ABCD  000000202354D1 5   050809    122706    1227062400
ABCD  000000202354D1 5   050809    012307    0123072400
ABCD  000000202354D1 5   050809    032207    0322072410
ABCD  000000202354D1 7   050809    00000{AHR09999409999
ABCD  000000202354D1 8   050809    20061204DOCTOR NUMBE
ABCD  000000202354D0 9   050809    PAPTORFC  20061204 0
ABCD  000001984906 1 1   050809    10300312230217040000
ABCD  000001984906 1 2   050809    H07801461394656    
ABCD  000001984906 1 5   050809    103003    103003009
ABCD  000001984906 1 5   050809    112503    112503241
ABCD  000001984906 1 7   050809    00000{ HR          
ABCD  000001984906 1 8   050809    20031030DOCTOR NUMB
ABCD  000001984906 1 8   050809    20031103MED RECORD
ABCD  000001984906 1 8   050809    20040131MESSAGE PTR
ABCD  000001984906 0 9   050809    PAGUBDTE  20031031
ABCD  000002023924D1 1   050809    1001072300100507111
Question by:diannagibbs
  • 4
  • 3
  • 3
LVL 58

Expert Comment

ID: 24845378
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' < YourFileName

Expert Comment

ID: 24845391
Once the data is sorted as mentioned, you can pipe the data through the following awk script.  You might need to adjust the v=substr($0,22,1) command to make sure you get the right characters out of the the flat file.
 FILE file;
  if ( oldv != v ) {
    if ( oldv > 0 ) {
  print $0 >> file;

Open in new window


Expert Comment

ID: 24845417
For amit_g's command to work, you need to make sure there are two >> after the print $0

In both examples, make sure that the destination files do NOT exist as they simply append to the file with the double >>
Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.


Author Comment

ID: 24845511
These are great comments - thanks so much!  I'll try it first thing tomorrow am.
LVL 58

Expert Comment

ID: 24845566
The awk behaves differently from shell so first > would create/override the existing file while the subsequent > would append to the same file.


print items > output-file
    This type of redirection prints the items into the output file named output-file. The file name output-file can be any expression. Its value is changed to a string and then used as a file name (see Expressions).

    When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes to the same output-file do not erase output-file, but append to it. (This is different from how you use redirections in shell scripts.) If output-file does not exist, it is created

Author Comment

ID: 24850231
I cannot get this to work; my data has alot of odd characters, and I've been told AIX will not parse some of these characters:
 awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.
 Syntax Error The source line is 1.
        awk: 0602-541 There are 1101779 missing } characters.
        awk: 0602-543 There are 31 extra ) characters.
./awk.ksh[4]: BEGIN:  not found.
./awk.ksh[5]: FILE:  not found.
./awk.ksh[7]: 0403-057 Syntax error at line 8 : `}' is not expected.

Expert Comment

ID: 24850308
It looks like ksh is trying to interpret/run the file instead of awk.

Also the script is expecting the file to be piped into it:

cat xxx.dat | sort -k1.22,1.23 | awk -f awk.ksh

I hope this helps.
LVL 58

Accepted Solution

amit_g earned 500 total points
ID: 24851457
Have you tried the direct command line?

cat xxx.dat | sort -k1.22,1.23 | awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}'

or if you already have the output saved, two steps

cat xxx.dat | sort -k1.22,1.23 > sorted.txt
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' sorted.txt > NewFileName.txt

Author Comment

ID: 24853297

My AIX sysadmin helped me - here's what we did, similar to amit_g above:
sort cqbbdf.dat  | awk '{ file=substr($0,22,2)".txt";print >> file  }'

LVL 58

Expert Comment

ID: 24965702
Shouldn't you accept http:#24851457. It does exactly what you asked and the solution that you acknowledge as working is almost the same as one proposed.

Featured Post

Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Separate into files by filename 12 82
how to monitor remote shell execution on linux 9 100
wordcount challenge 11 121
Problem to App source 6 39
Introduction: Hints for the grid button.  Nested classes, templated collections.  Squash that darned bug! Continuing from the sixth article about sudoku.   Open the project in visual studio. First we will finish with the SUD_SETVALUE messa…
Introduction: Database storage, where is the exe actually on the disc? Playing a game selected randomly (how to generate random numbers).  Error trapping with try..catch to help the code run even if something goes wrong. Continuing from the seve…
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit If you want to manage em…

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question