Solved

Split fixed length file into multiple files based on value in a position

Posted on 2014-03-10
8
751 Views
Last Modified: 2014-03-13
Hi
I want to split a fixed width file into multiple files based on a field which starts from position 3-8

file
kip56802
tim45607
skz56802
sam45607

  this has to be split into 2 files ,
file 1 would be
kip56802
skz56802

file 2 would be
tim45607
sam45607

thanks
0
Comment
Question by:uco
  • 4
  • 4
8 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39919316
Do you mean position 4-8, counting from 1?

Are there only two distinct values in that position? Are these values always "56802" or "45607"?
I don't believe so, but if the above is true then try this:

awk '{if(substr($1,4,5)~"56802") print >"file1"; if(substr($1,4,5)~"45607") print >"file2"}' file

We could also create output files named according to that "4-8" value, here e.g. "56802.out" and 45607.out":

awk '{print >substr($1,4,5)".out"}' file
0
 

Author Comment

by:uco
ID: 39919567
Yes that is correct I am counting from 1, there could be more than two values in that place and there could be  more than 2 distinct values
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39919914
So could my second solution above be an option?

awk '{print >substr($1,4,5)".out"}' file

Using the example you posted the above will create a file named "56802.out" containing

kip56802
skz56802

and a file "45607.out" containing

tim45607
sam45607

The more distinct strings in that position the more files will be created.

Basically we just take the string in positions 4-8 of each record plus a suffix ".out" to build output file names and write the appropriate records to the respective file.
0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 

Author Comment

by:uco
ID: 39922469
I am having small issuse here i.e  some times the  length may be 5 but it may have spaces ,
it may have spaces like
kip568
skz568
how do I remove spaces in file name

thanks
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39922998
This replaces the spaces with underscores ("_"):

awk '{A=substr($0,4,5); gsub(" ","_",A); print >A".out"}' file

and this eliminates them:

awk '{A=substr($0,4,5); gsub(" ","",A); print >A".out"}' file
0
 

Author Comment

by:uco
ID: 39925461
Thanks a lot , it did work, can we print to a different directory ? I am trying to print to different directory but it is not allowing , it says
fatal: division by zero attempted

awk '{A=substr($0,4,5); gsub(" ","",A); print > dirc/A".out"}' file

also I want to write the file names created to a file

thanks
0
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 39925891
awk -v D="dirc/" -v S=".out" -v LS=".splitlist.txt" '
    {A=substr($0,4,5); gsub(" ","",A); F[D A S]+=1; print >D A S}
    END {N=FILENAME; sub(/^.*\//, "", N); for(f in F) print f, "\tRecords:", F[f] >D N LS}
' file

I used variables for the output file suffix and the directory path because we need them more than once.

The list containing the filenames and a record count per file is "dirc/file.splitlist.txt".
The suffix ".splitlist.txt" has its own variable for maintainability.
0
 

Author Closing Comment

by:uco
ID: 39926631
Simpy excellent
0

Featured Post

VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It’s 2016. Password authentication should be dead — or at least close to dying. But, unfortunately, it has not traversed Quagga stage yet. Using password authentication is like laundering hotel guest linens with a washboard — it’s Passé.
Join Greg Farro and Ethan Banks from Packet Pushers (http://packetpushers.net/podcast/podcasts/pq-show-93-smart-network-monitoring-paessler-sponsored/) and Greg Ross from Paessler (https://www.paessler.com/prtg) for a discussion about smart network …
Video by: Mark
This lesson goes over how to construct ordered and unordered lists and how to create hyperlinks.
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.

789 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question