• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 547
  • Last Modified:

Best way to redirect data to new files depending on character position value of large data flat file

Below is a modified copy of my flat data file.  Character position 22 is a numerical
value (3rd column on first line, 4 column farther down) and each line with a
same number has same data structure.  I have sorted this file
(cat xxx.dat | sort -k1.22,1.23 > sorted.txt) by character 22,23 and redirected it to a new file.
I now need to grep or use some type of command or process to seperate this data in
multiple files, depending on this number value (1-9) in character position 22,23.  
I would like to use .ksh to do it all (AIX) at which point I will then use SQLLoader
to load data into Oracle.

I am looking for best way to create my multiple files depending on this specific
number.  Any direction appreciated.  Thanks!!!

ABCD  000000202354D1 1   050809    12050605101209061354
ABCD  000000202354D1 2   050809    H06201W00245180    
ABCD  000000202354D1 5   050809    120406    1204060060
ABCD  000000202354D1 5   050809    120806    1208060060
ABCD  000000202354D1 5   050809    120806    1208060120
ABCD  000000202354D1 5   050809    122706    1227062400
ABCD  000000202354D1 5   050809    012307    0123072400
ABCD  000000202354D1 5   050809    032207    0322072410
ABCD  000000202354D1 7   050809    00000{AHR09999409999
ABCD  000000202354D1 8   050809    20061204DOCTOR NUMBE
ABCD  000000202354D0 9   050809    PAPTORFC  20061204 0
ABCD  000001984906 1 1   050809    10300312230217040000
ABCD  000001984906 1 2   050809    H07801461394656    
ABCD  000001984906 1 5   050809    103003    103003009
ABCD  000001984906 1 5   050809    112503    112503241
ABCD  000001984906 1 7   050809    00000{ HR          
ABCD  000001984906 1 8   050809    20031030DOCTOR NUMB
ABCD  000001984906 1 8   050809    20031103MED RECORD
ABCD  000001984906 1 8   050809    20040131MESSAGE PTR
ABCD  000001984906 0 9   050809    PAGUBDTE  20031031
ABCD  000002023924D1 1   050809    1001072300100507111
  • 4
  • 3
  • 3
1 Solution
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' < YourFileName
Once the data is sorted as mentioned, you can pipe the data through the following awk script.  You might need to adjust the v=substr($0,22,1) command to make sure you get the right characters out of the the flat file.
 FILE file;
  if ( oldv != v ) {
    if ( oldv > 0 ) {
  print $0 >> file;

Open in new window

For amit_g's command to work, you need to make sure there are two >> after the print $0

In both examples, make sure that the destination files do NOT exist as they simply append to the file with the double >>
Cloud Class® Course: Certified Penetration Testing

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

diannagibbsAuthor Commented:
These are great comments - thanks so much!  I'll try it first thing tomorrow am.
The awk behaves differently from shell so first > would create/override the existing file while the subsequent > would append to the same file.

From: http://www.gnu.org/manual/gawk/gawk.html#Printing

print items > output-file
    This type of redirection prints the items into the output file named output-file. The file name output-file can be any expression. Its value is changed to a string and then used as a file name (see Expressions).

    When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes to the same output-file do not erase output-file, but append to it. (This is different from how you use redirections in shell scripts.) If output-file does not exist, it is created
diannagibbsAuthor Commented:
I cannot get this to work; my data has alot of odd characters, and I've been told AIX will not parse some of these characters:
 awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.
 Syntax Error The source line is 1.
        awk: 0602-541 There are 1101779 missing } characters.
        awk: 0602-543 There are 31 extra ) characters.
./awk.ksh[4]: BEGIN:  not found.
./awk.ksh[5]: FILE:  not found.
./awk.ksh[7]: 0403-057 Syntax error at line 8 : `}' is not expected.
It looks like ksh is trying to interpret/run the file instead of awk.

Also the script is expecting the file to be piped into it:

cat xxx.dat | sort -k1.22,1.23 | awk -f awk.ksh

I hope this helps.
Have you tried the direct command line?

cat xxx.dat | sort -k1.22,1.23 | awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}'

or if you already have the output saved, two steps

cat xxx.dat | sort -k1.22,1.23 > sorted.txt
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' sorted.txt > NewFileName.txt
diannagibbsAuthor Commented:

My AIX sysadmin helped me - here's what we did, similar to amit_g above:
sort cqbbdf.dat  | awk '{ file=substr($0,22,2)".txt";print >> file  }'

Shouldn't you accept http:#24851457. It does exactly what you asked and the solution that you acknowledge as working is almost the same as one proposed.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 4
  • 3
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now