  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 342

remove all cr and lf from xml file in bsh/ksh

Greetings, I need to remove all CRs and LFs from an incoming text file and save it back in place.
Is this possible in shell?
Thank you.
0
Evan Cutler
2 Solutions
 
nemws1Commented:
I'm sure there are many ways.  Here's mine:

perl -pe 's/[\n\r]//g' < input.txt > output.txt


0
 
Jan SpringerCommented:
It is.  I usually grab a sample file and open it with vi to see what kind of characters are present (sometimes it's a <ctrl><m>).   You can feed the file through sed in a single command to remove those characters.
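A non-interactive way to do the same inspection is to dump the bytes instead of opening the file in vi. This is only a minimal sketch, assuming a shell with `mktemp`, `od` and `tr` available; the sample file and its contents are made up for illustration.

```shell
# Create a throwaway sample with DOS-style (CR LF) line endings.
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# od -c prints every byte; CR shows up as \r and LF as \n.
od -c "$sample"

# Count each character: delete everything except CR (or LF), then count bytes.
# The $(( ... )) wrapper strips the padding some wc implementations add.
cr_count=$(( $(tr -cd '\r' < "$sample" | wc -c) ))
lf_count=$(( $(tr -cd '\n' < "$sample" | wc -c) ))
echo "CR: $cr_count, LF: $lf_count"

rm -f "$sample"
```

Knowing which of the two characters is actually present makes it easier to pick the right removal command afterwards.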
0
 
ozoCommented:
to save it back in place, I'd prefer
perl -i -pe 's/[\n\r]//g'  input.txt
0
tel2Commented:
Hi _jesper_,
> You can feed the file through sed in a single command to remove those characters.
Care to share your sed solution with the class?

arcee123, here's a "tr" solution,
    tr -d '\n\r' <input.txt >input.tmp
    mv input.tmp input.txt
Not exactly in-place, eh.  Too bad "tr" takes input from STDIN only, and can't do in-place edits.
I prefer Perl for this, and ozo's solution is what I would usually use, but I'd guess that the above "tr" solution might run slightly faster, if speed is important.
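
The two-step "tr" approach can be wrapped in a function so that the temporary file gets a unique name and the original is only replaced if "tr" succeeds. This is a sketch rather than a finished tool: it assumes `mktemp` is available, and the function name `strip_crlf` is made up here.

```shell
# strip_crlf FILE: remove every CR and LF from FILE, replacing it via a temp file.
strip_crlf() {
    file=$1
    # Create the temp file next to the target so mv is a rename, not a copy.
    tmp=$(mktemp "$(dirname "$file")/strip_crlf.XXXXXX") || return 1
    if tr -d '\r\n' < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a throwaway sample file.
demo=$(mktemp)
printf 'line1\r\nline2\r\n' > "$demo"
strip_crlf "$demo"
result=$(cat "$demo")
echo "$result"
rm -f "$demo"
```

One caveat with this design: the replacement file carries the permissions `mktemp` gave it (typically owner-only), so the mode needs restoring if it matters.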
0
 
Jan SpringerCommented:
This should work cross-platform (and does on linux):

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile > outputfile
0
 
tel2Commented:
Hi _jesper_,

That's not working for me on Linux:

# cat -vet inputfile
line1^M$
line2^M$
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile
 line2

Have I done something wrong?
0
 
simon3270Commented:
Most "in-place" edits ("sed -i" and "perl -i", for example) actually write to a new file, then rename that new file to the same name as the old one when the program has done its work.  If the process fails for some reason (out of disk space, for example), the program will usually leave the original file as it was.

You can get almost the same effect with "tr" using:
    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file
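
The write-then-rename behaviour can be observed through inode numbers. A small sketch, assuming `mktemp`, `ls -i` and `awk` are available; the directory and file names are illustrative.

```shell
workdir=$(mktemp -d)
printf 'line1\r\nline2\r\n' > "$workdir/input_file"

# Record the inode of the original file.
inode_before=$(ls -i "$workdir/input_file" | awk '{print $1}')

# Write the cleaned data to a new file, then rename it over the original.
tr -d '\r\n' < "$workdir/input_file" > "$workdir/output_file" &&
    mv "$workdir/output_file" "$workdir/input_file"

# The name is the same, but it now refers to a different file (new inode).
inode_after=$(ls -i "$workdir/input_file" | awk '{print $1}')
echo "inode before: $inode_before, after: $inode_after"

rm -rf "$workdir"
```

Because the name ends up pointing at a new file, hard links to the old file keep the old contents, and the new file does not inherit the old one's permissions.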

If you really do just want to remove certain characters from a file, tr is your best bet - other solutions (perl and sed, for example) may well try to read line by line, and if the line endings aren't what the program expects, they may not produce the expected result.  That, I think, is what's happened in @jesper's solution - there is a carriage return (^M, or \r) left in the output, so the output is line1\rline2, and when displayed, the \r really does move the cursor back to the left edge after printing "line1", so the "line2" text overwrites "line1".
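
To check this without relying on how the terminal renders \r, the output can be dumped byte by byte, and the carriage returns removed before the lines are joined. A sketch that reuses the sed command from earlier in the thread; it assumes `mktemp` is available and the sample data is made up.

```shell
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# Joining lines with sed alone leaves the CRs in place; od -c makes them visible.
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$sample" | od -c

# Dropping the CRs first gives a result that displays as expected.
joined=$(tr -d '\r' < "$sample" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
echo "$joined"

rm -f "$sample"
```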
0
 
tel2Commented:
Hi Simon.

Point taken about in-place edits.

> You can get almost the same effect with "tr" using:
>    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file
Any reason why you need to 'rm' the 'input_file'?  mv replaces it anyway, so this should do the same thing, right?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file

Good point about _jesper_'s solution.  I should have spotted that.  Apart from the \r chars, it also seems to be inserting a space and not removing the last \n.  Check this out:
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile | od -bc
0000000 154 151 156 145 061 015 040 154 151 156 145 062 015 012
          l   i   n   e   1  \r       l   i   n   e   2  \r  \n


0
 
simon3270Commented:
I suppose the "rm" is necessary in case the "input_file" is read-only, but in a user-writable directory - in that case it should really be "rm -f input_file".  An alternative would be to use "mv -f", but I think that the "rm" makes it more obvious that we're tidying up.
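
The read-only case can be tried out in a scratch directory. A sketch of the "mv -f" variant, assuming `mktemp` is available; the names are illustrative.

```shell
workdir=$(mktemp -d)
printf 'line1\r\nline2\r\n' > "$workdir/input_file"
chmod 444 "$workdir/input_file"   # read-only file in a writable directory

# tr only needs to read input_file; mv -f then replaces it without prompting,
# because a rename is governed by the directory's permissions, not the file's.
tr -d '\r\n' < "$workdir/input_file" > "$workdir/output_file" &&
    mv -f "$workdir/output_file" "$workdir/input_file"

content=$(cat "$workdir/input_file")
echo "$content"
rm -rf "$workdir"
```

Note that the replacement file is no longer read-only, since it is a new file created by the redirection.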
0
 
skullnobrainsCommented:
a regular sed would work doing basically s/[\n\r]//
or rather using \012 and \015 : s/[\012\015]//

if you are using gnu sed, you're hitting a bug : gnu sed inserts line endings after each printed line even when you removed them. the gnu sed developers refuse to acknowledge this as a bug. the consequence is that you have to hack it so that all the lines get regrouped together in sed's memory and there is no way you can remove the last one. basically either use a proper sed or use another tool. i'd probably go the tr way for portability and simplicity.
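
the difference between the two tools can be measured by counting bytes. a rough sketch, assuming `mktemp` is available and a sed that writes a newline after each line it prints; a literal CR is passed in a shell variable so that no sed-specific escape syntax is needed, and the sample data is made up.

```shell
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# Put a literal CR in a variable (command substitution only strips newlines).
cr=$(printf '\r')

# sed can drop the CRs, but it still writes a newline after each line it prints.
after_sed=$(( $(sed "s/$cr//g" "$sample" | wc -c) ))

# tr works on raw bytes, so it removes both characters in one pass.
after_tr=$(( $(tr -d '\r\n' < "$sample" | wc -c) ))

echo "sed leaves $after_sed bytes, tr leaves $after_tr bytes"
rm -f "$sample"
```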
0
 
tel2Commented:
> i'd probably go the tr way for portability and simplicity.
Q1. Wouldn't Perl be more portable and simpler, skullnobrains?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file
vs.
    perl -i -pe 's/[\n\r]//g' input_file
More portable because Perl runs on Windows, etc., too.

Q2. I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced.  Here's GNU sed (your "favourite" type of sed):
# od -bc input_file
0000000 154 151 156 145 061 015 012 154 151 156 145 062 015 012
          l   i   n   e   1  \r  \n   l   i   n   e   2  \r  \n
0000016
# sed 's/[\012\015]//g' input_file | od -bc
0000000 154 151 156 145 015 012 154 151 156 145 015 012
          l   i   n   e  \r  \n   l   i   n   e  \r  \n


(I've added a '/g' switch to your code.)
Note how it removed the '1' and '2'.  That's because the '\012' matched '0', '1' & '2'.

Or is this another GNU sed issue of not recognising things like '\012' as a single character?
Have you tried it in a non-GNU sed, skullnobrains?  I'm not sure I have access to one.
0
 
skullnobrainsCommented:
@tel2
> More portable because Perl runs on Windows, etc., too.

arguably yes, but installing tr or bash for windows is not much more difficult than installing perl. (http://sourceforge.net/projects/win-bash/files/shell-complete/latest/)

perl is also just not installed on many systems by default and tr is part of the base system of roughly everything else.

the question is classified as "shell-scripting" so i tried to favor something that could be used in as many shells as possible.

> I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced

my bad, gnu sed understands hex codes (\xHH) but not the bare \012 octal form (gnu sed spells octal as \oNNN). see below

$ echo -e "x0x1\tx2x"
x0x1	x2x
$ echo -e "x0x1\tx2x" | sed 's/\x09//'
x0x1x2x
$ echo -e "x0x1\tx2x" | sed 's/[\x09]//'
x0x1x2x
$ sed --version | head -n 1
GNU sed version 4.2.1



the octal version would work on BSDs and AIX for example, but i do not have time to reboot into freebsd or launch a vm to demonstrate that right now.

anyway, i'm not sure we are being helpful to anyone there
0
 
tel2Commented:
OK, thanks for that, snb.
0
 
Evan CutlerAuthor Commented:
Guys, thank you so much for this.
This was one heck of a lesson.

I was able to do several things with the solutions you provided.
Thank you very much, again...
0
