Go Premium for a chance to win a PS4. Enter to Win


Best way to redirect data to new files depending on character position value of large data flat file

Posted on 2009-07-13
Medium Priority
Last Modified: 2013-12-26
Below is a modified copy of my flat data file.  Character position 22 is a numerical
value (3rd column on first line, 4 column farther down) and each line with a
same number has same data structure.  I have sorted this file
(cat xxx.dat | sort -k1.22,1.23 > sorted.txt) by character 22,23 and redirected it to a new file.
I now need to grep or use some type of command or process to seperate this data in
multiple files, depending on this number value (1-9) in character position 22,23.  
I would like to use .ksh to do it all (AIX) at which point I will then use SQLLoader
to load data into Oracle.

I am looking for best way to create my multiple files depending on this specific
number.  Any direction appreciated.  Thanks!!!

ABCD  000000202354D1 1   050809    12050605101209061354
ABCD  000000202354D1 2   050809    H06201W00245180    
ABCD  000000202354D1 5   050809    120406    1204060060
ABCD  000000202354D1 5   050809    120806    1208060060
ABCD  000000202354D1 5   050809    120806    1208060120
ABCD  000000202354D1 5   050809    122706    1227062400
ABCD  000000202354D1 5   050809    012307    0123072400
ABCD  000000202354D1 5   050809    032207    0322072410
ABCD  000000202354D1 7   050809    00000{AHR09999409999
ABCD  000000202354D1 8   050809    20061204DOCTOR NUMBE
ABCD  000000202354D0 9   050809    PAPTORFC  20061204 0
ABCD  000001984906 1 1   050809    10300312230217040000
ABCD  000001984906 1 2   050809    H07801461394656    
ABCD  000001984906 1 5   050809    103003    103003009
ABCD  000001984906 1 5   050809    112503    112503241
ABCD  000001984906 1 7   050809    00000{ HR          
ABCD  000001984906 1 8   050809    20031030DOCTOR NUMB
ABCD  000001984906 1 8   050809    20031103MED RECORD
ABCD  000001984906 1 8   050809    20040131MESSAGE PTR
ABCD  000001984906 0 9   050809    PAGUBDTE  20031031
ABCD  000002023924D1 1   050809    1001072300100507111
Question by:diannagibbs
  • 4
  • 3
  • 3
LVL 58

Expert Comment

ID: 24845378
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' < YourFileName

Expert Comment

ID: 24845391
Once the data is sorted as mentioned, you can pipe the data through the following awk script.  You might need to adjust the v=substr($0,22,1) command to make sure you get the right characters out of the the flat file.
 FILE file;
  if ( oldv != v ) {
    if ( oldv > 0 ) {
  print $0 >> file;

Open in new window


Expert Comment

ID: 24845417
For amit_g's command to work, you need to make sure there are two >> after the print $0

In both examples, make sure that the destination files do NOT exist as they simply append to the file with the double >>
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.


Author Comment

ID: 24845511
These are great comments - thanks so much!  I'll try it first thing tomorrow am.
LVL 58

Expert Comment

ID: 24845566
The awk behaves differently from shell so first > would create/override the existing file while the subsequent > would append to the same file.

From: http://www.gnu.org/manual/gawk/gawk.html#Printing

print items > output-file
    This type of redirection prints the items into the output file named output-file. The file name output-file can be any expression. Its value is changed to a string and then used as a file name (see Expressions).

    When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes to the same output-file do not erase output-file, but append to it. (This is different from how you use redirections in shell scripts.) If output-file does not exist, it is created

Author Comment

ID: 24850231
I cannot get this to work; my data has alot of odd characters, and I've been told AIX will not parse some of these characters:
 awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.
 Syntax Error The source line is 1.
        awk: 0602-541 There are 1101779 missing } characters.
        awk: 0602-543 There are 31 extra ) characters.
./awk.ksh[4]: BEGIN:  not found.
./awk.ksh[5]: FILE:  not found.
./awk.ksh[7]: 0403-057 Syntax error at line 8 : `}' is not expected.

Expert Comment

ID: 24850308
It looks like ksh is trying to interpret/run the file instead of awk.

Also the script is expecting the file to be piped into it:

cat xxx.dat | sort -k1.22,1.23 | awk -f awk.ksh

I hope this helps.
LVL 58

Accepted Solution

amit_g earned 2000 total points
ID: 24851457
Have you tried the direct command line?

cat xxx.dat | sort -k1.22,1.23 | awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}'

or if you already have the output saved, two steps

cat xxx.dat | sort -k1.22,1.23 > sorted.txt
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' sorted.txt > NewFileName.txt

Author Comment

ID: 24853297

My AIX sysadmin helped me - here's what we did, similar to amit_g above:
sort cqbbdf.dat  | awk '{ file=substr($0,22,2)".txt";print >> file  }'

LVL 58

Expert Comment

ID: 24965702
Shouldn't you accept http:#24851457. It does exactly what you asked and the solution that you acknowledge as working is almost the same as one proposed.

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: The undo support, implementing a stack. Continuing from the eigth article about sudoku.   We need a mechanism to keep track of the digits entered so as to implement an undo mechanism.  This should be a ‘Last In First Out’ collec…
In this post we will learn different types of Android Layout and some basics of an Android App.
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
this video summaries big data hadoop online training demo (http://onlineitguru.com/big-data-hadoop-online-training-placement.html) , and covers basics in big data hadoop .

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question