PMGreensted
asked on
Split a file in UNIX into separate files based on a delimiter character.
Hi Experts,
How would you split a file (input.txt) into separate files (output.txtn) based on the delimiter $ only?
input.txt -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}${headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}${headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}
output.txt1 -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}
output.txt2 -
{headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}
output.txt3 -
{headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}
ASKER
Thanks, that works.
Just one variation - If the source file has line breaks there is an output file created for each line as well as each side of the $ delimiter.
To get round this I could remove all line breaks first like this:
tr -d '\n' < input.txt > input2.txt
awk 'BEGIN{FS="$"}
{
for(i=1;i<=NF;i++){
print $i > "output-"i"-"NR".txt"
}
}
' "input2.txt"
But I would rather have the line breaks if possible. I could replace the line breaks in the first place with:
tr '\n' '~~~~~' < input.txt > input2.txt
then put them all back again for each output file:
tr '~~~~~' '\n' < output-1-1.txt > output-1-1.txt2
but this seems a bit long-winded. Is there an easier way?
perl -044pe 'chomp;open STDOUT,">output.txt$."'
perl -044l12pe 'open STDOUT,">output.txt$."'
ASKER
Hi ozo,
Sorry I'm not too familiar with perl. What does perl -044l12pe 'open STDOUT,">output.txt$."' do?
It splits the input into "lines" ending in '$' (octal '\044') and writes each line to a separate file output.txtn, after removing the trailing $ and replacing it with '\n' (octal '\012').
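For anyone more at home with awk, the same record-separator idea can be sketched roughly as follows. This is only an illustrative equivalent, not the perl command itself: the sample input and the scratch directory are made up for the demonstration, and it assumes your awk treats a single-character RS literally.

```shell
# Work in a scratch directory so the sketch does not clobber real files.
cd "$(mktemp -d)"

# Assumed sample input: three records separated by "$", each containing
# embedded line breaks, with no trailing newline.
printf '{a}{id:\none\n;}${b}{id:\ntwo\n;}${c}{id:\nthree\n;}' > input.txt

# RS="$" makes awk read everything up to the next "$" as one record, so the
# line breaks inside a record are kept. print appends a newline (ORS) in
# place of the "$", and NR numbers the output files.
awk 'BEGIN { RS = "$" } { print > ("output.txt" NR) }' input.txt

cat output.txt1
```

Because the line breaks are no longer record boundaries, there should be no need for the tr round trip through a placeholder character.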
Actually, I am not sure what you mean. Please post a sample of input.txt again, including the line breaks, so I can see what's going on. Also state what you expect to see as output.
ASKER
OK, well let's say:
input.txt -
{headertext}{moreheadertext}{id:
somecontent
;}{11{22}{33}}${headertext2}{moreheadertext}{id:
somecontent2
;}{11{22}{33}}${headertext3}{moreheadertext}{id:
somecontent3
;}{11{22}{33}}
I'm trying to get the output like:
output.txt1 -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}
output.txt2 -
{headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}
output.txt3 -
{headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}
Your awk example worked for my single-line input.txt, but it splits input.txt into separate output files for each line when input.txt is a multiline file.
Sorry, I know you got it right in the first place, but I then realised my input.txt may sometimes have multiple lines.
So you want to strip newlines?
perl -044pe 's/[$\n]//g;open STDOUT,">output.txt$."' input.txt
perl -044pe 's/[\$\n]//g;open STDOUT,">output.txt$."' input.txt
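A rough awk counterpart of the newline-stripping variant, for comparison. It is a sketch under assumptions rather than a drop-in replacement: the two-record sample input and the scratch directory are invented here, and it again relies on a single-character RS being taken literally.

```shell
# Work in a scratch directory so the sketch does not clobber real files.
cd "$(mktemp -d)"

# Assumed sample input: two records separated by "$", with line breaks
# inside each record and a trailing newline at the end of the file.
printf '{h1}{id:\nsomecontent\n;}${h2}{id:\nsomecontent2\n;}\n' > input.txt

# Read "$"-terminated records, delete every newline inside the record,
# then write the joined text (plus one final newline from print) to
# output.txt1, output.txt2, ...
awk 'BEGIN { RS = "$" } { gsub(/\n/, ""); print > ("output.txt" NR) }' input.txt

cat output.txt1 output.txt2
```

Note that if the input ended with a "$" followed by a newline, this sketch would also produce one extra, empty output file for the leftover newline.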
ASKER
Thanks ghostdog74 and ozo,
I tested the following and it works fine:
for filename in *.txt
do
# FILENAME is awk's built-in name of the current input file
awk 'BEGIN{RS="$"}{ gsub(/[$]/,""); print > (FILENAME "-" NR)}' "$filename"
rm "$filename"
done
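One caveat when writing one file per record: some awk implementations limit how many output files can be open at the same time, so a file with many "$"-separated records could fail partway through. A possible guard, sketched here with a hypothetical 500-record input in a scratch directory, is to close each output file as soon as it has been written:

```shell
# Work in a scratch directory so the sketch does not clobber real files.
cd "$(mktemp -d)"

# Hypothetical input: 500 short records, each terminated by "$".
i=1
while [ "$i" -le 500 ]; do
    printf 'record%d$' "$i"
    i=$((i + 1))
done > input.txt

awk 'BEGIN { RS = "$" }
{
    out = FILENAME "-" NR   # input.txt-1, input.txt-2, ...
    print > out
    close(out)              # release the file before moving to the next record
}' input.txt

cat input.txt-500
```

Each output name is used only once, so closing it immediately does not risk truncating a file that is written again later.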
Are you saying the earlier suggestions did not work?
ASKER
No, I'm not. All the solutions worked to a certain degree. With a few changes here and there they can give the same result. I was just showing you guys what I ended up with.
awk 'BEGIN{FS="$"}
{
for(i=1;i<=NF;i++){
print $i > "output-"i"-"NR".txt"
}
}
' "file"