Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Best way to redirect data to new files depending on character position value of large data flat file

Posted on 2009-07-13
12
Medium Priority
?
537 Views
Last Modified: 2013-12-26
Below is a modified copy of my flat data file.  Character position 22 is a numerical
value (3rd column on first line, 4 column farther down) and each line with a
same number has same data structure.  I have sorted this file
(cat xxx.dat | sort -k1.22,1.23 > sorted.txt) by character 22,23 and redirected it to a new file.
I now need to grep or use some type of command or process to seperate this data in
multiple files, depending on this number value (1-9) in character position 22,23.  
I would like to use .ksh to do it all (AIX) at which point I will then use SQLLoader
to load data into Oracle.

I am looking for best way to create my multiple files depending on this specific
number.  Any direction appreciated.  Thanks!!!

ABCD  000000202354D1 1   050809    12050605101209061354
ABCD  000000202354D1 2   050809    H06201W00245180    
ABCD  000000202354D1 5   050809    120406    1204060060
ABCD  000000202354D1 5   050809    120806    1208060060
ABCD  000000202354D1 5   050809    120806    1208060120
ABCD  000000202354D1 5   050809    122706    1227062400
ABCD  000000202354D1 5   050809    012307    0123072400
ABCD  000000202354D1 5   050809    032207    0322072410
ABCD  000000202354D1 7   050809    00000{AHR09999409999
ABCD  000000202354D1 8   050809    20061204DOCTOR NUMBE
ABCD  000000202354D0 9   050809    PAPTORFC  20061204 0
ABCD  000001984906 1 1   050809    10300312230217040000
ABCD  000001984906 1 2   050809    H07801461394656    
ABCD  000001984906 1 5   050809    103003    103003009
ABCD  000001984906 1 5   050809    112503    112503241
ABCD  000001984906 1 7   050809    00000{ HR          
ABCD  000001984906 1 8   050809    20031030DOCTOR NUMB
ABCD  000001984906 1 8   050809    20031103MED RECORD
ABCD  000001984906 1 8   050809    20040131MESSAGE PTR
ABCD  000001984906 0 9   050809    PAGUBDTE  20031031
ABCD  000002023924D1 1   050809    1001072300100507111
0
Comment
Question by:diannagibbs
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 3
12 Comments
 
LVL 58

Expert Comment

by:amit_g
ID: 24845378
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' < YourFileName
0
 
LVL 3

Expert Comment

by:glenthorne
ID: 24845391
Once the data is sorted as mentioned, you can pipe the data through the following awk script.  You might need to adjust the v=substr($0,22,1) command to make sure you get the right characters out of the the flat file.
#!/bin/awk
BEGIN {
 FILE file;
 oldv=0;
}
 
{
  v=substr($0,22,1);
  if ( oldv != v ) {
    if ( oldv > 0 ) {
      close(file);
    }
    f=sprintf("file%s.dat",v);
    file=f;
    oldv=v;
  }
  print $0 >> file;
 
}

Open in new window

0
 
LVL 3

Expert Comment

by:glenthorne
ID: 24845417
For amit_g's command to work, you need to make sure there are two >> after the print $0

In both examples, make sure that the destination files do NOT exist as they simply append to the file with the double >>
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:diannagibbs
ID: 24845511
These are great comments - thanks so much!  I'll try it first thing tomorrow am.
0
 
LVL 58

Expert Comment

by:amit_g
ID: 24845566
The awk behaves differently from shell so first > would create/override the existing file while the subsequent > would append to the same file.

From: http://www.gnu.org/manual/gawk/gawk.html#Printing

print items > output-file
    This type of redirection prints the items into the output file named output-file. The file name output-file can be any expression. Its value is changed to a string and then used as a file name (see Expressions).

    When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes to the same output-file do not erase output-file, but append to it. (This is different from how you use redirections in shell scripts.) If output-file does not exist, it is created
0
 

Author Comment

by:diannagibbs
ID: 24850231
I cannot get this to work; my data has alot of odd characters, and I've been told AIX will not parse some of these characters:
 awk: 0602-502 The statement cannot be correctly parsed. The source line is 1.
 Syntax Error The source line is 1.
        awk: 0602-541 There are 1101779 missing } characters.
        awk: 0602-543 There are 31 extra ) characters.
./awk.ksh[4]: BEGIN:  not found.
./awk.ksh[5]: FILE:  not found.
./awk.ksh[7]: 0403-057 Syntax error at line 8 : `}' is not expected.
0
 
LVL 3

Expert Comment

by:glenthorne
ID: 24850308
It looks like ksh is trying to interpret/run the file instead of awk.

Also the script is expecting the file to be piped into it:

cat xxx.dat | sort -k1.22,1.23 | awk -f awk.ksh

I hope this helps.
0
 
LVL 58

Accepted Solution

by:
amit_g earned 2000 total points
ID: 24851457
Have you tried the direct command line?

cat xxx.dat | sort -k1.22,1.23 | awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}'

or if you already have the output saved, two steps

cat xxx.dat | sort -k1.22,1.23 > sorted.txt
awk '{n = substr($0, 22, 1); print $0 > "YourSplitFileNamePrefix" n;}' sorted.txt > NewFileName.txt
0
 

Author Comment

by:diannagibbs
ID: 24853297

My AIX sysadmin helped me - here's what we did, similar to amit_g above:
sort cqbbdf.dat  | awk '{ file=substr($0,22,2)".txt";print >> file  }'

0
 
LVL 58

Expert Comment

by:amit_g
ID: 24965702
Shouldn't you accept http:#24851457. It does exactly what you asked and the solution that you acknowledge as working is almost the same as one proposed.
0

Featured Post

On Demand Webinar: Networking for the Cloud Era

Ready to improve network connectivity? Watch this webinar to learn how SD-WANs and a one-click instant connect tool can boost provisions, deployment, and management of your cloud connection.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: The undo support, implementing a stack. Continuing from the eigth article about sudoku.   We need a mechanism to keep track of the digits entered so as to implement an undo mechanism.  This should be a ‘Last In First Out’ collec…
In this post we will learn different types of Android Layout and some basics of an Android App.
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
This course is ideal for IT System Administrators working with VMware vSphere and its associated products in their company infrastructure. This course teaches you how to install and maintain this virtualization technology to store data, prevent vuln…

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question