knc26
asked on
Bash shell script - Read file and add delimiter.
I have a file (.dat) whose fields are not separated by any delimiter, only by position, and I need to convert it to a csv file. I need a bash script that reads the .dat file, finds the fields based on their positions and writes them to a new csv file with the delimiter.
For example, my dat file contains
11222333344444
99888777766666
and need to convert it to:
11,222,3333,44444
99,888,7777,66666
Thanks in advance.
ASKER
I'm having a problem with the cut command. When a field contains more than one consecutive space, the output includes only one space.
For example
do
F1=`echo $line | cut -c 1-2`
F2=`echo $line | cut -c 3-5`
F3=`echo $line | cut -c 6-9`
F4=`echo $line | cut -c 10-14`
echo "$F1, $F2, $F3, $F4" >> newfile
done
on: 112223  344444
gives me:
11,222,3 3,44444 (one blank) instead of 11,222,3  3,44444 (two blanks)
To use the cut solution, you need to put double quotes around $line; otherwise the shell collapses runs of spaces before echo ever sees them, i.e.:
#!/bin/sh
while read line
do
F1=`echo "$line" | cut -c 1-2`
F2=`echo "$line" | cut -c 3-5`
F3=`echo "$line" | cut -c 6-9`
F4=`echo "$line" | cut -c 10-14`
echo "$F1,$F2,$F3,$F4" >> newfile
done <file
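To see why the quotes matter, here is a minimal sketch using the sample record from the question (two blanks in the third field). The variable names are only illustrative. Unquoted, the shell splits $line into words and echo joins them back with a single space, so every later character shifts left by one.

```shell
#!/bin/sh
# Sample record with two blanks inside the third field (columns 6-9).
line='112223  344444'

# Unquoted: $line is split on whitespace and echo rejoins the words
# with one space, so cut sees a 13-character record.
unquoted=$(echo $line | cut -c 6-9)

# Quoted: the record reaches cut unchanged.
quoted=$(echo "$line" | cut -c 6-9)

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

Note that a plain `read line` also strips leading and trailing blanks from each record; if the first or last field can start or end with spaces, `while IFS= read -r line` keeps them.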
Please note that the while loop and cut solution is extremely slow. The following sed solution is much shorter and many, many times quicker:
sed "s/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/" file >newfile
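A quick way to check the sed command is to feed it a few records on standard input. The sketch below uses the two records from the question plus the one with embedded blanks; the `result` variable is just for illustration.

```shell
#!/bin/sh
# Three sample records; the last one has two blanks in the third field.
result=$(printf '%s\n' '11222333344444' '99888777766666' '112223  344444' |
    sed 's/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/')

# Each captured group is a fixed-width field, so blanks are kept as-is.
printf '%s\n' "$result"
```

Because sed reads each line itself, there is no shell word splitting and the blanks come through untouched.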
Tintin: Check out the first response.
tdiops, I did see your suggestion. Note that extended regexes aren't supported by all sed versions. My solution will work with any sed version.
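For comparison, an extended-regex form of the same substitution might look like the sketch below. The first response is not shown in this thread, so this is not necessarily what tdiops posted; it assumes a sed that accepts the `-E` option (GNU and BSD sed do, some older versions do not).

```shell
#!/bin/sh
# Assumption: this sed supports -E (extended regular expressions).
# {n} counts replace the runs of dots, and groups need no backslashes.
ere_result=$(printf '%s\n' '99888777766666' |
    sed -E 's/^(.{2})(.{3})(.{4})(.{5})/\1,\2,\3,\4/')

printf '%s\n' "$ere_result"
```

The basic-regex version above it in the thread does the same job without relying on `-E`.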
ASKER
I think the cat command is removing the extra consecutive spaces. Prior to performing the cut, I'm echoing each line and can see that multiple spaces within each record are being reduced to only one space.
Is this a behavior of the cat command? I didn't see any cat option that would prevent this. Is there another command that I can use instead of cat?
knc26, I've already given you the reason and a cut solution to deal with multiple spaces. Please re-read my second-to-last post.
How big are the files you are processing? Please take note of my other post regarding how slow a while loop and cut is.
Here's a *very* graphic example as to how much quicker sed is. Using just a small sample file (10000 lines), the while loop/cut solution takes 70 seconds to run compared to 0.02 of a second for the sed solution.
That means the sed solution is 3500 times quicker!!!
$ cat script
#!/bin/sh
while read line
do
F1=`echo "$line" | cut -c 1-2`
F2=`echo "$line" | cut -c 3-5`
F3=`echo "$line" | cut -c 6-9`
F4=`echo "$line" | cut -c 10-14`
echo "$F1,$F2,$F3,$F4" >> newfile
done <file
$ wc -l file
10000 file
$ time ./script
real 1m10.338s
user 0m8.740s
sys 0m41.060s
$ time sed "s/\(..\)\(...\)\(....\)\(.....\)/\1,\2,\3,\4/" file >newfile
real 0m0.022s
user 0m0.021s
sys 0m0.001s
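Most of the loop's time goes into starting new processes: each record runs cut four times. Any tool that handles the whole file in one process avoids that. As an alternative that was not posted in this thread, here is a sketch with awk that keeps the column positions explicit, much like the cut ranges; the `to_csv` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert fixed-width records to CSV in one awk process.
# substr($0, start, length) mirrors cut -c 1-2, 3-5, 6-9 and 10-14.
to_csv() {
    awk '{ print substr($0, 1, 2) "," substr($0, 3, 3) "," substr($0, 6, 4) "," substr($0, 10, 5) }' "$@"
}

# Reads standard input when no file name is given.
sample=$(printf '%s\n' '11222333344444' '112223  344444' | to_csv)
printf '%s\n' "$sample"
```

For the files in this thread it would be called as `to_csv file >newfile`.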