Solved

Best way to redirect data to new files depending on character position value of large data flat file

Posted on 2009-07-13
12
524 Views
Last Modified: 2013-12-26
Below is a modified copy of my flat data file.  Character position 22 is a numerical
value (3rd column on first line, 4 column farther down) and each line with a
same number has same data structure.  I have sorted this file
(cat xxx.dat | sort -k1.22,1.23 > sorted.txt) by character 22,23 and redirected it to a new file.
I now need to grep or use some type of command or process to seperate this data in
multiple files, depending on this number value (1-9) in character position 22,23.  
I would like to use .ksh to do it all (AIX) at which point I will then use SQLLoader
to load data into Oracle.

I am looking for best way to create my multiple files depending on this specific
number.  Any direction appreciated.  Thanks!!!

ABCD  000000202354D1 1   050809    12050605101209061354
ABCD  000000202354D1 2   050809    H06201W00245180    
ABCD  000000202354D1 5   050809    120406    1204060060
ABCD  000000202354D1 5   050809    120806    1208060060
ABCD  000000202354D1 5   050809    120806    1208060120
ABCD  000000202354D1 5   050809    122706    1227062400
ABCD  000000202354D1 5   050809    012307    0123072400
ABCD  000000202354D1 5   050809    032207    0322072410
ABCD  000000202354D1 7   050809    00000{AHR09999409999
ABCD  000000202354D1 8   050809    20061204DOCTOR NUMBE
ABCD  000000202354D0 9   050809    PAPTORFC  20061204 0
ABCD  000001984906 1 1   050809    10300312230217040000
ABCD  000001984906 1 2   050809    H07801461394656    
ABCD  000001984906 1 5   050809    103003    103003009
ABCD  000001984906 1 5   050809    112503    112503241
ABCD  000001984906 1 7   050809    00000{ HR          
ABCD  000001984906 1 8   050809    20031030DOCTOR NUMB
ABCD  000001984906 1 8   050809    20031103MED RECORD
ABCD  000001984906 1 8   050809    20040131MESSAGE PTR
ABCD  000001984906 0 9   050809    PAGUBDTE  20031031
ABCD  000002023924D1 1   050809    1001072300100507111
0
Comment
Question by:diannagibbs
  • 4
  • 3
  • 3
12 Comments
 
LVL 58

Expert Comment

by:amit_g
ID: 24845378
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' < YourFileName
0
 
LVL 3

Expert Comment

by:glenthorne
ID: 24845391
Once the data is sorted as mentioned, you can pipe the data through the following awk script.  You might need to adjust the v=substr($0,22,1) command to make sure you get the right characters out of the the flat file.
#!/bin/awk

BEGIN {

 FILE file;

 oldv=0;

}
 

{

  v=substr($0,22,1);

  if ( oldv != v ) {

    if ( oldv > 0 ) {

      close(file);

    }

    f=sprintf("file%s.dat",v);

    file=f;

    oldv=v;

  }

  print $0 >> file;
 

}

Open in new window

0
 
LVL 3

Expert Comment

by:glenthorne
ID: 24845417
For amit_g's command to work, you need to make sure there are two >> after the print $0

In both examples, make sure that the destination files do NOT exist as they simply append to the file with the double >>
0
 

Author Comment

by:diannagibbs
ID: 24845511
These are great comments - thanks so much!  I'll try it first thing tomorrow am.
0
 
LVL 58

Expert Comment

by:amit_g
ID: 24845566
The awk behaves differently from shell so first > would create/override the existing file while the subsequent > would append to the same file.

From: http://www.gnu.org/manual/gawk/gawk.html#Printing

print items > output-file
    This type of redirection prints the items into the output file named output-file. The file name output-file can be any expression. Its value is changed to a string and then used as a file name (see Expressions).

    When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes to the same output-file do not erase output-file, but append to it. (This is different from how you use redirections in shell scripts.) If output-file does not exist, it is created
0
Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

 

Author Comment

by:diannagibbs
ID: 24850231
I cannot get this to work; my data has alot of odd characters, and I've been told AIX will not parse some of these characters:
 awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.
 Syntax Error The source line is 1.
        awk: 0602-541 There are 1101779 missing } characters.
        awk: 0602-543 There are 31 extra ) characters.
./awk.ksh[4]: BEGIN:  not found.
./awk.ksh[5]: FILE:  not found.
./awk.ksh[7]: 0403-057 Syntax error at line 8 : `}' is not expected.
0
 
LVL 3

Expert Comment

by:glenthorne
ID: 24850308
It looks like ksh is trying to interpret/run the file instead of awk.

Also the script is expecting the file to be piped into it:

cat xxx.dat | sort -k1.22,1.23 | awk -f awk.ksh

I hope this helps.
0
 
LVL 58

Accepted Solution

by:
amit_g earned 500 total points
ID: 24851457
Have you tried the direct command line?

cat xxx.dat | sort -k1.22,1.23 | awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}'

or if you already have the output saved, two steps

cat xxx.dat | sort -k1.22,1.23 > sorted.txt
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' sorted.txt > NewFileName.txt
0
 

Author Comment

by:diannagibbs
ID: 24853297

My AIX sysadmin helped me - here's what we did, similar to amit_g above:
sort cqbbdf.dat  | awk '{ file=substr($0,22,2)".txt";print >> file  }'

0
 
LVL 58

Expert Comment

by:amit_g
ID: 24965702
Shouldn't you accept http:#24851457. It does exactly what you asked and the solution that you acknowledge as working is almost the same as one proposed.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
configuration management 2 100
not able to insert into temp table 68 150
format the code in java 6 72
countEvens challenge 2 59
Introduction: Finishing the grid – keyboard support for arrow keys to manoeuvre, entering the numbers.  The PreTranslateMessage function is to be used to intercept and respond to keyboard events. Continuing from the fourth article about sudoku. …
Introduction: Dialogs (1) modal - maintaining the database. Continuing from the ninth article about sudoku.   You might have heard of modal and modeless dialogs.  Here with this Sudoku application will we use one of each type: a modal dialog …
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now