Solved

remove all cr and lf from xml file in bsh/ksh

Posted on 2013-06-20
Last Modified: 2013-07-07
Greetings, I need to remove all CRs and LFs from an incoming text file and save it back in place.
Is this possible in shell?
Thank you.
Question by:Evan Cutler
14 Comments
 
LVL 23

Expert Comment

by:nemws1
ID: 39264332
I'm sure there are many ways.  Here's mine:

perl -pe 's/[\n\r]//g' < input.txt > output.txt

 
LVL 28

Expert Comment

by:Jan Springer
ID: 39264333
It is.  I usually grab a sample file and open it with vi to see what kind of characters are present (sometimes it's a <ctrl><m>).   You can feed the file through sed in a single command to remove those characters.
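
A non-interactive way to make the same check, as a rough sketch: count the CR and LF bytes with tr and wc instead of opening the file in vi. The sample content and the variable names (sample, cr_count, lf_count) are made up for illustration, and calling mktemp with no arguments is an assumption that holds on common Linux and BSD systems but is not guaranteed by POSIX.

```shell
# Build a small sample with Windows-style line endings (the content is made up).
sample=$(mktemp)
printf 'line1\r\nline2\r\n' > "$sample"

# tr -cd deletes everything EXCEPT the listed character; wc -c counts what is left.
# The trailing tr -d ' ' strips the padding some wc implementations add.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < "$sample" | wc -c | tr -d ' ')
echo "CR: $cr_count  LF: $lf_count"
rm -f "$sample"
```

A non-zero CR count means the file has carriage returns that a line-oriented tool may leave behind.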
 
LVL 84

Expert Comment

by:ozo
ID: 39264944
to save it back in place, I'd prefer
perl -i -pe 's/[\n\r]//g'  input.txt
 
LVL 12

Expert Comment

by:tel2
ID: 39265542
Hi _jesper_,
> You can feed the file through sed in a single command to remove those characters.
Care to share your sed solution with the class?

arcee123, here's a "tr" solution,
    tr -d '\n\r' <input.txt >input.tmp
    mv input.tmp input.txt
Not exactly in-place, eh.  Too bad "tr" takes input from STDIN only, and can't do in-place edits.
I prefer Perl for this, and ozo's solution is what I would usually use, but I'd guess that the above "tr" solution might run slightly faster, if speed is important.
 
LVL 28

Expert Comment

by:Jan Springer
ID: 39265799
This should work cross-platform (and does on linux):

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile > outputfile
 
LVL 12

Expert Comment

by:tel2
ID: 39267383
Hi _jesper_,

That's not working for me on Linux:

# cat -vet inputfile
line1^M$
line2^M$
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile
 line2

Have I done something wrong?
 
LVL 19

Expert Comment

by:simon3270
ID: 39269875
Most "in-place" edits ("sed -i" and "perl -i", for example) actually write to a new file, then rename that new file to the same name as the old one when the program has done its work.  If the process fails for some reason (out of disk space, for example), the program will usually leave the original file as it was.

You can get almost the same effect with "tr" using:
    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file

If you really do just want to remove certain characters from a file, tr is your best bet - other solutions (perl and sed, for example) may well try to read line by line - if the line endings aren't what the program expects, it may not produce the expected result.  That, I think, is what's happened in @jesper's solution - there is a carriage return (^M, or \r) left in the output, so the output is line1\rline2, and when displayed, the \r really does move the cursor back to the left edge after printing "line1", so the "line2" text overwrites "line1".
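
The write-then-rename approach described above could be wrapped up roughly like this. It is only a sketch: the function name strip_crlf is hypothetical, and it assumes a mktemp that accepts a template path (common on Linux and BSD, but not part of POSIX).

```shell
# strip_crlf FILE: delete every CR and LF from FILE, replacing it only on success.
# (strip_crlf is a made-up helper name, not something from this thread.)
strip_crlf() {
    file=$1
    # Create the temporary file next to the target so mv is a rename
    # within the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if tr -d '\r\n' < "$file" > "$tmp"; then
        # -f avoids a prompt if the original is read-only.
        mv -f "$tmp" "$file"
    else
        # Leave the original untouched if tr failed (disk full, for example).
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be "strip_crlf input_file".  Note that the replacement file gets the permissions mktemp gives it, not those of the original, so a chmod afterwards may be needed.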
 
LVL 12

Expert Comment

by:tel2
ID: 39269969
Hi Simon.

Point taken about in-place edits.

> You can get almost the same effect with "tr" using:
>    tr -d '\r\n' < input_file > output_file && rm input_file && mv output_file input_file
Any reason why you need to 'rm' the 'input_file'?  mv replaces it anyway, so this should do the same thing, right?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file

Good point about _jesper_'s solution.  I should have spotted that.  Apart from the \r chars, it also seems to be inserting a space and not removing the last \n.  Check this out:
# sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' inputfile | od -bc
0000000 154 151 156 145 061 015 040 154 151 156 145 062 015 012
          l   i   n   e   1  \r       l   i   n   e   2  \r  \n

 
LVL 19

Expert Comment

by:simon3270
ID: 39271915
I suppose the "rm" is necessary in case the "input_file" is read-only, but in a user-writable directory - in that case it should really be "rm -f input_file".  An alternative would be to use "mv -f", but I think that the "rm" makes it more obvious that we're tidying up.
 
LVL 27

Expert Comment

by:skullnobrains
ID: 39289592
a regular sed would work doing basically s/[\n\r]//
or rather using \012 and \015 : s/[\012\015]//

if you are using gnu sed, you're hitting a bug : gnu sed inserts line endings after each printed line even when you removed them. the gnu sed developers refuse to acknowledge this as a bug. the consequence is that you have to hack it so that all the lines get regrouped together in sed's memory and there is no way you can remove the last one. basically either use a proper sed or use another tool. i'd probably go the tr way for portability and simplicity.
 
LVL 12

Accepted Solution

by:
tel2 earned 250 total points
ID: 39291819
> i'd probably go the tr way for portability and simplicity.
Q1. Wouldn't Perl be more portable and simpler, skullnobrains?:
    tr -d '\r\n' <input_file >output_file && mv output_file input_file
vs.
    perl -i -pe 's/[\n\r]//g' input_file
More portable because Perl runs on Windows, etc., too.

Q2. I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced.  Here's GNU sed (your "favourite" type of sed):
# od -bc input_file
0000000 154 151 156 145 061 015 012 154 151 156 145 062 015 012
          l   i   n   e   1  \r  \n   l   i   n   e   2  \r  \n
0000016
# sed 's/[\012\015]//g' input_file | od -bc
0000000 154 151 156 145 015 012 154 151 156 145 015 012
          l   i   n   e  \r  \n   l   i   n   e  \r  \n


(I've added a '/g' switch to your code.)
Note how it removed the '1' and '2'.  That's because the '\012' matched '0', '1' & '2'.

Or is this another GNU sed issue of not recognising things like '\012' as a single character?
Have you tried it in a non-GNU sed, skullnobrains?  I'm not sure I have access to one.
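
For comparison, tr does treat a backslash followed by octal digits as a single character, so the octal spelling of the same deletion works there. A small sketch, assuming a POSIX tr and printf; the sample text and the variable name result are made up for illustration:

```shell
# \012 is LF and \015 is CR in octal; tr reads each escape as one character,
# so the digits '1' and '2' in the data are left alone.
result=$(printf 'line1\r\nline2\r\n' | tr -d '\012\015')
printf '%s\n' "$result"
```

This prints line1line2, with the '1' and '2' intact.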
 
LVL 27

Assisted Solution

by:skullnobrains
skullnobrains earned 250 total points
ID: 39292154
@tel2
> More portable because Perl runs on Windows, etc., too.

arguably yes, but installing tr or bash for windows is not much more difficult than installing perl. (http://sourceforge.net/projects/win-bash/files/shell-complete/latest/)

perl is also just not installed on many systems by default and tr is part of the base system of roughly everything else.

the question is classified as "shell-scripting" so i tried to favor something that could be used in as many shells as possible.

> I'm no sed expert but I didn't know that sed recognised things like '\012'.  I'm still not convinced

my bad, gnu sed only understands hex codes here, not the plain \012 octal form. see below

$ echo -e "x0x1\tx2x"
x0x1	x2x
$ echo -e "x0x1\tx2x" | sed 's/\x09//'
x0x1x2x
$ echo -e "x0x1\tx2x" | sed 's/[\x09]//'
x0x1x2x
$ sed --version | head -n 1
GNU sed version 4.2.1


the octal version would work on BSDs and AIX for example, but i do not have time to reboot into freebsd or launch a vm to demonstrate that right now.

anyway, i'm not sure we are being helpful to anyone there
 
LVL 12

Expert Comment

by:tel2
ID: 39292192
OK, thanks for that, snb.
 
LVL 9

Author Closing Comment

by:Evan Cutler
ID: 39305235
Guys, thank you so much for this.
This was one heck of a lesson.

I was able to do several things with the solutions you provided.
Thank you very much, again...

Question has a verified solution.
