Solved

Remove all CR and LF from XML file in bsh/ksh

Posted on 2013-06-20
Last Modified: 2013-07-07
Greetings, I need to remove all CRs and LFs from an incoming text file and save it back in place.
Is this possible in shell?
Thank you.
Question by:Evan Cutler
14 Comments
 
LVL 23

Expert Comment

by:nemws1
ID: 39264332
I'm sure there are many ways.  Here's mine:

perl -pe 's/[\n\r]//g' < input.txt > output.txt

 
LVL 28

Expert Comment

by:Jan Springer
ID: 39264333
It is.  I usually grab a sample file and open it with vi to see what kind of characters are present (sometimes it's a <ctrl><m>).   You can feed the file through sed in a single command to remove those characters.
 
LVL 84

Expert Comment

by:ozo
ID: 39264944
to save it back in place, I'd prefer
perl -i -pe 's/[\n\r]//g'  input.txt
 
LVL 11

Expert Comment

by:tel2
ID: 39265542
Hi _jesper_,
> You can feed the file through sed in a single command to remove those characters.
Care to share your sed solution with the class?

arcee123, here's a "tr" solution,
    tr -d '\n\r' <input.txt >input.tmp
    mv input.tmp input.txt
Not exactly in-place, eh.  Too bad "tr" takes input from STDIN only, and can't do in-place edits.
I prefer Perl for this, and ozo's solution is what I would usually use, but I'd guess that the above "tr" solution might run slightly faster, if speed is important.
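
The two-step "tr" approach above can be wrapped in a small function, sketched here under assumptions: strip_crlf is a hypothetical helper name, and mktemp is assumed to be available (it is on most Linux and BSD systems, though it is not part of POSIX).

```shell
# strip_crlf FILE: delete every CR and LF from FILE.
# (strip_crlf is a hypothetical helper name, not a standard command.)
strip_crlf() {
    file=$1
    # Create the temp file next to the target so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if tr -d '\r\n' < "$file" > "$tmp"; then
        # Replace the original only after tr has succeeded.
        mv -f "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be "strip_crlf input.txt". One caveat of this sketch: the replaced file takes on the temporary file's permissions, which mktemp restricts to the owner.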
 
LVL 28

Expert Comment

by:Jan Springer
ID: 39265799
This should work cross-platform (and does on linux):

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile > outputfile
 
LVL 11

Expert Comment

by:tel2
ID: 39267383
Hi _jesper_,

That's not working for me on Linux:

# cat -vet inputfile
line1^M$
line2^M$
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile
 line2

Have I done something wrong?
 
LVL 19

Expert Comment

by:simon3270
ID: 39269875
Most "in-place" edits ("sed -i" and "perl -i", for example) actually write to a new file, then rename that new file to the same name as the old one when the program has done its work.  If the process fails for some reason (out of disk space, for example), the program will usually leave the original file as it was.

You can get almost the same effect with "tr" using:
    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file

If you really do just want to remove certain characters from a file, tr is your best bet - other solutions (perl and sed, for example) may well try to read line by line - if the line endings aren't what the program expects, it may not produce the expected result.  That, I think, is what's happened in @jesper's solution - there is a carriage return (^M, or \r) left in the output, so the output is line1\rline2, and when displayed, the \r really does move the cursor back to the left edge after printing "line1", so the "line2" text overwrites "line1".
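
A small illustration of that overwriting effect, as a sketch: the byte string below is an assumption modelled on the sed output reported earlier in this thread (a CR, then a space, then "line2"). Piping through "cat -v" makes the hidden carriage returns visible.

```shell
# printf format string standing in for the sed output (assumed sample data).
sed_output='line1\r line2\r\n'

# On a terminal this appears as " line2", because \r moves the cursor back.
printf "$sed_output"

# cat -v renders each carriage return as ^M, so nothing is hidden.
visible=$(printf "$sed_output" | cat -v)
printf '%s\n' "$visible"    # prints: line1^M line2^M
```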

 
LVL 11

Expert Comment

by:tel2
ID: 39269969
Hi Simon.

Point taken about in-place edits.

> You can get almost the same effect with "tr" using:
>    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file
Any reason why you need to 'rm' the 'input_file'?  mv replaces it anyway, so this should do the same thing, right?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file

Good point about _jesper_'s solution.  I should have spotted that.  Apart from the \r chars, it also seems to be inserting a space and not removing the last \n.  Check this out:
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile | od -bc
0000000 154 151 156 145 061 015 040 154 151 156 145 062 015 012
          l   i   n   e   1  \r       l   i   n   e   2  \r  \n

 
LVL 19

Expert Comment

by:simon3270
ID: 39271915
I suppose the "rm" is necessary in case the "input_file" is read-only, but in a user-writable directory - in that case it should really be "rm -f input_file".  An alternative would be to use "mv -f", but I think that the "rm" makes it more obvious that we're tidying up.
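
A sketch of the read-only case, assuming a scratch directory created with mktemp and sample data made up for illustration. Replacing a file needs write permission on the directory, not on the file itself, and "mv -f" skips the confirmation prompt that a plain "mv" may show for a read-only target.

```shell
workdir=$(mktemp -d)
printf 'a\r\nb\r\n' > "$workdir/input_file"
chmod a-w "$workdir/input_file"      # read-only file in a writable directory

# Only replace the original if tr succeeded; -f avoids the overwrite prompt.
tr -d '\r\n' < "$workdir/input_file" > "$workdir/output_file" &&
    mv -f "$workdir/output_file" "$workdir/input_file"

od -c "$workdir/input_file"
```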
 
LVL 26

Expert Comment

by:skullnobrains
ID: 39289592
a regular sed would work doing basically s/[\n\r]//
or rather using \012 and \015 : s/[\012\015]//

if you are using gnu sed, you're hitting a bug : gnu sed inserts line endings after each printed line even when you removed them. the gnu sed developers refuse to acknowledge this as a bug. the consequence is that you have to hack it so that all the lines get regrouped together in sed's memory and there is no way you can remove the last one. basically either use a proper sed or use another tool. i'd probably go the tr way for portability and simplicity.
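
A quick check of the trailing-newline behaviour described above, as a sketch on made-up two-line input. It assumes a sed that writes a newline after the pattern space on output; the variable name joined is only for illustration.

```shell
# Join all lines in sed's pattern space, delete the embedded newlines,
# then dump the bytes that sed actually wrote.
joined=$(printf 'a\nb\n' |
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' |
    od -An -c | tr -d ' \n')

# od spells the surviving newline as the two characters \n.
printf '%s\n' "$joined"
```

The embedded newline is gone but one newline is still written at the end, which is why "tr -d" is the simpler tool when no newline at all may remain.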
 
LVL 11

Accepted Solution

by:
tel2 earned 250 total points
ID: 39291819
> i'd probably go the tr way for portability and simplicity.
Q1. Wouldn't Perl be more portable and simpler, skullnobrains?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file
vs.
    perl -i -pe 's/[\n\r]//g' input_file
More portable because Perl runs on Windows, etc., too.

Q2. I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced.  Here's GNU sed (your "favourite" type of sed):
# od -bc input_file
0000000 154 151 156 145 061 015 012 154 151 156 145 062 015 012
          l   i   n   e   1  \r  \n   l   i   n   e   2  \r  \n
0000016
# sed 's/[\012\015]//g' input_file | od -bc
0000000 154 151 156 145 015 012 154 151 156 145 015 012
          l   i   n   e  \r  \n   l   i   n   e  \r  \n

(I've added a '/g' switch to your code.)
Note how it removed the '1' and '2'.  That's because the bracket expression took '\012' as the literal characters '\', '0', '1' & '2' rather than as a single octal character.

Or is this another GNU sed issue of not recognising things like '\012' as a single character?
Have you tried it in a non-GNU sed, skullnobrains?  I'm not sure I have access to one.
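
One way to sidestep the escape question entirely, sketched under assumptions: let the shell produce a literal carriage return and splice it into the sed script, so sed never has to interpret '\015' or '\r'. The variable names cr and cleaned and the sample input are made up for illustration. Since sed works line by line, this removes only the CRs and the LFs stay.

```shell
# A literal carriage return; command substitution strips only trailing newlines.
cr=$(printf '\r')

# Delete a CR at the end of each line, then dump the remaining bytes.
cleaned=$(printf 'line1\r\nline2\r\n' | sed "s/$cr\$//" | od -An -c | tr -d ' \n')

# od spells each newline as the two characters \n; no \r should be listed.
printf '%s\n' "$cleaned"
```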
 
LVL 26

Assisted Solution

by:skullnobrains
skullnobrains earned 250 total points
ID: 39292154
@tel2
> More portable because Perl runs on Windows, etc., too.

arguably yes, but installing tr or bash for windows is not much more difficult than installing perl. (http://sourceforge.net/projects/win-bash/files/shell-complete/latest/)

perl is also just not installed on many systems by default and tr is part of the base system of roughly everything else.

the question is classified as "shell-scripting" so i tried to favour something that could be used in as many shells as possible.

> I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced

my bad, gnu sed understands hex escapes such as \x09 but not bare octal ones such as \012 (its octal form is spelled \o012). see below

$ echo -e "x0x1\tx2x"
x0x1	x2x
$ echo -e "x0x1\tx2x" | sed 's/\x09//'
x0x1x2x
$ echo -e "x0x1\tx2x" | sed 's/[\x09]//'
x0x1x2x
$ sed --version | head -n 1
GNU sed version 4.2.1

the octal version would work on BSDs and AIX for example, but i do not have time to reboot into freebsd or launch a vm to demonstrate that right now.

anyway, i'm not sure we are being helpful to anyone there
 
LVL 11

Expert Comment

by:tel2
ID: 39292192
OK, thanks for that, snb.
 
LVL 9

Author Closing Comment

by:Evan Cutler
ID: 39305235
Guys, thank you so much for this.
This was one heck of a lesson.

I was able to do several things with the solutions you provided.
Thank you very much, again...