Solved

remove all cr and lf from xml file in bsh/ksh

Posted on 2013-06-20
Last Modified: 2013-07-07
Greetings, I need to remove all CRs and LFs from an incoming text file and save it back in place.
Is this possible in shell?
Thank you.
Question by:Evan Cutler
14 Comments
 
LVL 23

Expert Comment

by:nemws1
ID: 39264332
I'm sure there are many ways.  Here's mine:

perl -pe 's/[\n\r]//g' < input.txt > output.txt

 
LVL 29

Expert Comment

by:Jan Springer
ID: 39264333
It is.  I usually grab a sample file and open it with vi to see what kind of characters are present (sometimes it's a <ctrl><m>).   You can feed the file through sed in a single command to remove those characters.
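For anyone who would rather inspect the file non-interactively, here is a minimal sketch using od and tr instead of vi. The sample file and its contents are made up for illustration, and mktemp is assumed to be available (it is common but not guaranteed on every system).

```shell
# Build a small CRLF sample (illustrative data, not the asker's real file).
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# od -c prints every byte; CR shows up as \r and LF as \n.
od -c "$sample"

# Count the CR bytes by deleting everything that is not a CR.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
echo "CR bytes: $cr_count"

rm -f "$sample"
```

A non-zero count means the file carries DOS-style line endings (or stray CRs) that a line-oriented tool may not strip on its own.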
 
LVL 84

Expert Comment

by:ozo
ID: 39264944
to save it back in place, I'd prefer
perl -i -pe 's/[\n\r]//g'  input.txt
 
LVL 12

Expert Comment

by:tel2
ID: 39265542
Hi _jesper_,
> You can feed the file through sed in a single command to remove those characters.
Care to share your sed solution with the class?

arcee123, here's a "tr" solution,
    tr -d '\n\r' <input.txt >input.tmp
    mv input.tmp input.txt
Not exactly in-place, eh.  Too bad "tr" takes input from STDIN only, and can't do in-place edits.
I prefer Perl for this, and ozo's solution is what I would usually use, but I'd guess that the above "tr" solution might run slightly faster, if speed is important.
 
LVL 29

Expert Comment

by:Jan Springer
ID: 39265799
This should work cross-platform (and does on linux):

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile > outputfile
 
LVL 12

Expert Comment

by:tel2
ID: 39267383
Hi _jesper_,

That's not working for me on Linux:

# cat -vet inputfile
line1^M$
line2^M$
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile
 line2

Have I done something wrong?
 
LVL 19

Expert Comment

by:simon3270
ID: 39269875
Most "in-place" edits ("sed -i" and "perl -i", for example) actually write to a new file, then rename that new file to the same name as the old one when the program has done its work.  If the process fails for some reason (out of disk space, for example), the program will usually leave the original file as it was.

You can get almost the same effect with "tr" using:
    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file

If you really do just want to remove certain characters from a file, tr is your best bet. Other solutions (perl and sed, for example) may well try to read line by line, and if the line endings aren't what the program expects, it may not produce the expected result. That, I think, is what's happened in @jesper's solution: there is a carriage return (^M, or \r) left in the output, so the output is line1\rline2. When displayed, the \r really does move the cursor back to the left edge after printing "line1", so the "line2" text overwrites "line1".
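The write-then-rename pattern described above can be wrapped in a small function. This is only a sketch: strip_crlf is a hypothetical helper name, and it assumes mktemp is available and accepts a template argument.

```shell
# strip_crlf FILE: delete all CR and LF bytes from FILE, replacing it only on success.
# (strip_crlf is a hypothetical name used for illustration.)
strip_crlf() {
    file=$1
    # Create the temporary file next to the target so the final mv stays on one file system.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if tr -d '\r\n' < "$file" > "$tmp"; then
        # tr succeeded: swap the cleaned copy into place.
        mv -f "$tmp" "$file"
    else
        # tr failed (for example, disk full): keep the original and discard the partial copy.
        rm -f "$tmp"
        return 1
    fi
}
```

Called as `strip_crlf input_file`, it leaves the original untouched if tr fails. One caveat: mktemp creates the temporary file with restrictive permissions, so the replaced file may not keep the original's mode or ownership.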
 
LVL 12

Expert Comment

by:tel2
ID: 39269969
Hi Simon.

Point taken about in-place edits.

> You can get almost the same effect with "tr" using:
>    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file
Any reason why you need to 'rm' the 'input_file'?  mv replaces it anyway, so this should do the same thing, right?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file

Good point about _jesper_'s solution.  I should have spotted that.  Apart from the \r chars, it also seems to be inserting a space and not removing the last \n.  Check this out:
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile | od -bc
0000000 154 151 156 145 061 015 040 154 151 156 145 062 015 012
          l   i   n   e   1  \r       l   i   n   e   2  \r  \n
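The difference between the line-based sed command and the byte-based tr command can be reproduced with a byte count. This is a sketch on a made-up two-line CRLF sample; it assumes mktemp is available and was reasoned through against GNU sed.

```shell
# Two-line CRLF sample (illustrative data).
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# sed joins the lines with a space but keeps each CR and the final LF.
sed_bytes=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$sample" | wc -c | tr -d ' ')

# tr works on bytes, so every CR and LF disappears.
tr_bytes=$(tr -d '\r\n' < "$sample" | wc -c | tr -d ' ')

echo "sed: $sed_bytes bytes, tr: $tr_bytes bytes"
rm -f "$sample"
```

The sed count matches the 14 bytes in the od dump above (ten letters and digits, two CRs, one inserted space, one trailing LF), while tr leaves only the ten letters and digits.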

 
LVL 19

Expert Comment

by:simon3270
ID: 39271915
I suppose the "rm" is necessary in case the "input_file" is read-only, but in a user-writable directory - in that case it should really be "rm -f input_file".  An alternative would be to use "mv -f", but I think that the "rm" makes it more obvious that we're tidying up.
 
LVL 27

Expert Comment

by:skullnobrains
ID: 39289592
a regular sed would work doing basically s/[\n\r]//
or rather using \012 and \015 : s/[\012\015]//

if you are using gnu sed, you're hitting a bug : gnu sed inserts line endings after each printed line even when you removed them. the gnu sed developers refuse to acknowledge this as a bug. the consequence is that you have to hack it so that all the lines get regrouped together in sed's memory and there is no way you can remove the last one. basically either use a proper sed or use another tool. i'd probably go the tr way for portability and simplicity.
 
LVL 12

Accepted Solution

by:
tel2 earned 1000 total points
ID: 39291819
> i'd probably go the tr way for portability and simplicity.
Q1. Wouldn't Perl be more portable and simpler, skullnobrains?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file
vs.
    perl -i -pe 's/[\n\r]//g' input_file
More portable because Perl runs on Windows, etc., too.

Q2. I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced.  Here's GNU sed (your "favourite" type of sed):
# od -bc input_file
0000000 154 151 156 145 061 015 012 154 151 156 145 062 015 012
          l   i   n   e   1  \r  \n   l   i   n   e   2  \r  \n
0000016
# sed 's/[\012\015]//g' input_file | od -bc
0000000 154 151 156 145 015 012 154 151 156 145 015 012
          l   i   n   e  \r  \n   l   i   n   e  \r  \n


(I've added a '/g' switch to your code.)
Note how it removed the '1' and '2'.  That's because the '\012' matched '0', '1' & '2'.

Or is this another GNU sed issue of not recognising things like '\012' as a single character?
Have you tried it in a non-GNU sed, skullnobrains?  I'm not sure I have access to one.
 
LVL 27

Assisted Solution

by:skullnobrains
skullnobrains earned 1000 total points
ID: 39292154
@tel2
> More portable because Perl runs on Windows, etc., too.

arguably yes, but installing tr or bash for windows is not much more difficult than installing perl. (http://sourceforge.net/projects/win-bash/files/shell-complete/latest/)

perl is also just not installed on many systems by default and tr is part of the base system of roughly everything else.

the question is classified as "shell-scripting" so i tried to favor something that could be used in as many shells as possible.

> I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced

my bad, gnu sed understands hex codes like \x09 but not octal written as \012. see below

$ echo -e "x0x1\tx2x"
x0x1	x2x
$ echo -e "x0x1\tx2x" | sed 's/\x09//'
x0x1x2x
$ echo -e "x0x1\tx2x" | sed 's/[\x09]//'
x0x1x2x
$ sed --version | head -n 1
GNU sed version 4.2.1



the octal version would work on BSDs and AIX for example, but i do not have time to reboot into freebsd or launch a vm to demonstrate that right now.
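Applied to the original question, the hex escape gets GNU sed as far as removing the CRs, but the LF that sed prints after each line still has to be removed by something else. The sketch below assumes GNU sed (the \x0D escape is not portable to other seds), mktemp, and a made-up sample file.

```shell
# Two-line CRLF sample (illustrative data).
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# GNU sed only: \x0D is the hex escape for CR. sed still prints an LF after each line.
sed_out_bytes=$(sed 's/\x0D//g' "$sample" | wc -c | tr -d ' ')

# Piping through tr removes the remaining LFs.
joined=$(sed 's/\x0D//g' "$sample" | tr -d '\n')

echo "after sed: $sed_out_bytes bytes; after sed | tr: $joined"
rm -f "$sample"
```

Since tr is needed for the LFs anyway, `tr -d '\r\n'` on its own does the whole job in one step.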

anyway, i'm not sure we are being helpful to anyone there
 
LVL 12

Expert Comment

by:tel2
ID: 39292192
OK, thanks for that, snb.
 
LVL 9

Author Closing Comment

by:Evan Cutler
ID: 39305235
Guys, thank you so much for this.
This was one heck of a lesson.

I was able to do several things with the solutions you provided.
Thank you very much, again...
Question has a verified solution.
