Split a file in UNIX into separate files based on a delimiter character.

Hi Experts,
How would you split a file (input.txt) into separate files (output.txt1, output.txt2, ...) using only the $ character as the delimiter?

input.txt -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}${headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}${headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}

output.txt1 -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}

output.txt2 -
{headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}

output.txt3 -
{headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}

ghostdog74

One way:

awk 'BEGIN { FS = "$" }
{
    for (i = 1; i <= NF; i++) {
        # Parenthesise the file name so every awk parses the concatenation the same way
        print $i > ("output-" i "-" NR ".txt")
    }
}' "file"

PMGreensted

ASKER

Thanks, that works.

Just one variation: if the source file has line breaks, an output file is created for each line as well as for each side of the $ delimiter.

To get round this I could remove all line breaks first like this:

tr -d '\n' < input.txt > input2.txt

awk 'BEGIN { FS = "$" }
{
    for (i = 1; i <= NF; i++) {
        # Parenthesise the file name so every awk parses the concatenation the same way
        print $i > ("output-" i "-" NR ".txt")
    }
}' "input2.txt"

But I would rather keep the line breaks if possible. I could replace each line break with a placeholder character first (tr maps single characters, so one ~ is enough):
tr '\n' '~' < input.txt > input2.txt
and then put them all back again in each output file:
tr '~' '\n' < output-1-1.txt > output-1-1.txt2
but this seems a bit long-winded. Is there an easier way?

ozo

perl -044pe 'chomp;open STDOUT,">output.txt$."'

ozo

perl -044l12pe 'open STDOUT,">output.txt$."'

PMGreensted

ASKER

Hi ozo,
Sorry, I'm not too familiar with Perl. What does perl -044l12pe 'open STDOUT,">output.txt$."' do?

ozo

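It splits the input into "lines" (records) ending in '$' (octal 044, set by -044) and writes each one to a separate file, output.txt1, output.txt2, and so on. The -l12 switch removes the trailing '$' from each record and replaces it with a newline (octal 012) on output.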
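
For anyone who wants to try the switches described above, here is a minimal sketch. The scratch directory and the sample records are made up for illustration, and the three-argument form of open is a substitution for the two-argument form in the original one-liner; the behaviour should be the same.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: three records separated by '$', no trailing newline.
printf '%s' '{a}{1}${b}{2}${c}{3}' > input.txt

# -044 : the input record separator is octal 044, i.e. '$'
# -l12 : strip the separator on input, append octal 012 (newline) on output
# -p   : loop over the records and print each one after the code has run
# $.   : the current record number, used here as the file name suffix
perl -044 -l12 -pe 'open STDOUT, ">", "output.txt$."' input.txt

ls output.txt*
```

With this input, three files should appear, each holding one record followed by a newline. If input.txt itself ends with a newline, the last output file would end with two.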

ghostdog74

Actually, I am not sure what you mean. Please post a sample of input.txt again, including the line breaks, so I can see what's going on. Then also state what output you expect to see.

PMGreensted

ASKER

OK, well let's say:

input.txt -
{headertext}{moreheadertext}{id:
somecontent
;}{11{22}{33}}${headertext2}{moreheadertext}{id:
somecontent2
;}{11{22}{33}}${headertext3}{moreheadertext}{id:
somecontent3
;}{11{22}{33}}

I'm trying to get the output like:

output.txt1 -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}

output.txt2 -
{headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}

output.txt3 -
{headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}

Your awk example worked for my single-line input.txt, but if input.txt is a multi-line file it also creates a separate output file for each line.

Sorry, I know you got it right in the first place, but I then realised my input.txt may sometimes have multiple lines.

ozo

So you want to strip newlines?

perl -044pe 's/[$\n]//g;open STDOUT,">output.txt$."' input.txt

ozo

perl -044pe 's/[\$\n]//g;open STDOUT,">output.txt$."' input.txt
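
A small sketch of the newline-stripping variant above, run against made-up multi-line data in a scratch directory (both are assumptions for illustration; the three-argument open is again a substitution for the two-argument form):

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical multi-line input: two records separated by '$',
# each record spread over three lines.
printf '%s\n' '{h1}{id:' 'one' ';}${h2}{id:' 'two' ';}' > input.txt

# -044        : records end at '$' (octal 044)
# s/[\$\n]//g : delete the '$' separator and every newline inside the record
# -p          : print the cleaned record to the freshly opened output.txt<N>
perl -044 -pe 's/[\$\n]//g; open STDOUT, ">", "output.txt$."' input.txt
```

Because every newline is deleted and -l is not used, each output file holds its record on a single line with no trailing newline.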

ASKER CERTIFIED SOLUTION

ghostdog74

SOLUTION

ozo

PMGreensted

ASKER

Thanks ghostdog74 and ozo,

I tested the following and it works fine:

for filename in *.txt
do
    # FILENAME is awk's built-in name of the current input file
    awk 'BEGIN{RS="$"}{ gsub(/[$]/,""); print > (FILENAME "-" NR)}' "$filename"
    rm "$filename"
done
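
For reference, a more defensive sketch of the same loop, run on made-up data in a scratch directory. It is a variation rather than the exact commands above: it closes each output file as it goes (which may matter when an input has many records) and leaves the original file in place instead of deleting it.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: two '$'-separated records, each containing a line break.
printf '%s' '{h1}{id:
one;}${h2}{id:
two;}' > sample.txt

for filename in *.txt
do
    # RS = "$" makes awk read one '$'-terminated record at a time,
    # so line breaks inside a record are kept as they are.
    awk 'BEGIN { RS = "$" }
         {
             out = FILENAME "-" NR   # e.g. sample.txt-1
             print > out
             close(out)              # avoid running out of open files
         }' "$filename"
done
```

The output names (sample.txt-1, sample.txt-2) do not match *.txt, so the loop does not pick up its own results.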

ozo

Are you saying the earlier suggestions did not work?

PMGreensted

ASKER

No, I'm not. All the solutions worked to a certain degree. With a few changes here and there they can give the same result. I was just showing you guys what I ended up with.