PMGreensted

Split a file in UNIX into separate files based on a delimiter character.

Hi Experts,
How would you split a file (input.txt) into separate files (output.txtn) based only on the delimiter $?

input.txt -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}${headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}${headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}

output.txt1 -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}

output.txt2 -
{headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}

output.txt3 -
{headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}
ghostdog74

One way:

awk 'BEGIN{FS="$"}
 {
   for(i=1;i<=NF;i++){
        # build the file name first so the concatenation is unambiguous
        out = "output-" i "-" NR ".txt"
        print $i > out
   }
 }

' "file"
PMGreensted (ASKER)

Thanks, that works.

Just one variation: if the source file has line breaks, an output file is created for each line as well as for each side of the $ delimiter.

To get around this, I could remove all line breaks first, like this:

tr -d '\n' < input.txt > input2.txt

awk 'BEGIN{FS="$"}
 {
   for(i=1;i<=NF;i++){
        print $i > ("output-" i "-" NR ".txt")
   }
 }

' "input2.txt"

But I would rather keep the line breaks if possible. I could replace each line break with a placeholder character first:
   tr '\n' '~' < input.txt > input2.txt
then put them all back again in each output file:
   tr '~' '\n' < output-1-1.txt > output-1-1.txt2
but this seems a bit long-winded. Is there an easier way?
ozo
perl -044pe 'chomp;open STDOUT,">output.txt$."'
perl -044l12pe 'open STDOUT,">output.txt$."'
Hi ozo,
Sorry, I'm not too familiar with Perl. What does perl -044l12pe 'open STDOUT,">output.txt$."' do?
It splits the input into "lines" ending in '$' ('\044') and writes each one to a separate output.txtn file, after removing the trailing $ and replacing it with '\n' ('\012').
Actually, I am not sure what you mean. Please post a sample of that input.txt again, including the line breaks, so I can see what's going on. Then also state what you expect to see as output.
OK, well let's say:

input.txt -
{headertext}{moreheadertext}{id:
somecontent
;}{11{22}{33}}${headertext2}{moreheadertext}{id:
somecontent2
;}{11{22}{33}}${headertext3}{moreheadertext}{id:
somecontent3
;}{11{22}{33}}

I'm trying to get the output like:

output.txt1 -
{headertext}{moreheadertext}{id:somecontent;}{11{22}{33}}

output.txt2 -
{headertext2}{moreheadertext}{id:somecontent2;}{11{22}{33}}

output.txt3 -
{headertext3}{moreheadertext}{id:somecontent3;}{11{22}{33}}

Your awk example worked for my single-line input.txt, but it breaks input.txt up into separate output files for each line if input.txt is a multiline file.

Sorry, I know you got it right in the first place, but I then realised my input.txt may sometimes have multiple lines.
So you want to strip newlines?

perl -044pe 's/[\$\n]//g;open STDOUT,">output.txt$."'  input.txt
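
For comparison, much the same result can be sketched in awk by making $ the record separator rather than the field separator, so that line breaks stay inside each record and can be removed there. This is an illustration, not a tested replacement for the command above: the scratch directory and sample input are invented, and the output names simply follow the output.txtN pattern from the question.

```shell
# Work in a scratch directory so the demo does not touch real files (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Multi-line sample in the shape of the input from the question (illustrative data only).
printf '{h1}{id:\nc1\n;}${h2}{id:\nc2\n;}\n' > input.txt

awk 'BEGIN { RS = "$" }          # one record per "$"-delimited chunk
{
    gsub(/\n/, "")               # remove line breaks inside the record
    out = "output.txt" NR        # NR is the record number: 1, 2, ...
    print > out
    close(out)                   # avoid running out of open files on long inputs
}' input.txt

cat output.txt1 output.txt2
```

Note that if the input ends with a $ followed by a newline, this sketch would also write one final empty file.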
ASKER CERTIFIED SOLUTION
ghostdog74

Thanks ghostdog74 and ozo,

I tested the following and it works fine:

for filename in *.txt
 do
  # FILENAME is awk's built-in name of the current input file
  awk 'BEGIN{RS="$"}{ gsub(/[$]/,""); print > (FILENAME "-" NR)}' "$filename"
  rm "$filename"
 done
Are you saying the earlier suggestions did not work?
No, I'm not. All the solutions worked to a certain degree. With a few changes here and there they can give the same result. I was just showing you guys what I ended up with.