Use SED to make the xml file a single string with no white spaces.

Employee123
I have an XML file (sitemap.xml) and I want to remove all the white space from it using sed in a bash script, so that the new file is one continuous string with no white space.

I tried using this
sed -e 's/[\t ]//g;/^$/d' $inputFile>$outputFile

The problem is that it just removes the trailing and leading white spaces and still leaves the carriage returns in place. I want a single continuous line as the end result.

Your help is greatly appreciated.
Top Expert 2006

Commented:
sed -e 's/[\t ]//g;/^$/d' $inputFile | tr -d '\n' >$outputFile
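
For context on why the extra tr is needed: sed applies its script to one line at a time, with the trailing newline already stripped, so a substitution like the one above never sees the line breaks. tr works on the raw character stream and can delete them. A minimal sketch, assuming a hypothetical three-line sample file (not the asker's sitemap) and a sed that reads `\t` inside brackets as a tab, which GNU sed does:

```shell
# Hypothetical sample input, created only for this demonstration.
sample=$(mktemp)
printf '<a>\n  <b>x</b>\n</a>\n' > "$sample"

# sed alone: blanks are removed, but every line keeps its newline.
sed_only=$(sed -e 's/[\t ]//g;/^$/d' "$sample" | wc -l | tr -d ' ')

# sed piped through tr: the newlines are deleted too, leaving one line.
joined=$(sed -e 's/[\t ]//g;/^$/d' "$sample" | tr -d '\n')

echo "lines after sed only: $sed_only"
echo "joined: $joined"
rm -f "$sample"
```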

Author

Commented:
Hi Amit,

Thanks for the quick reply. The sed command you posted runs, but it produces an invalid XML file because it also removes the white space inside the tags, which I don't want. I only want to remove the white space around the tags. Right now your script causes an invalid XML error.

Example:

Excerpt from the original XML:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9   http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

After applying the remove white space command:

<?xmlversion="1.0"encoding="UTF-8"?><urlsetxmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

You can see how the words were joined, e.g. xmlversion, where they should be separate.
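
The difference between the two behaviours can be sketched on a single line. This is only an illustration: the sample string is made up, and the second pattern (remove spaces only when they sit between a `>` and the next `<`) is one possible way to express "only around the tags".

```shell
# Hypothetical one-line sample with spaces inside and between tags.
line='<?xml version="1.0" encoding="UTF-8"?>   <urlset>'

# Deleting every space also glues the attributes together.
all_removed=$(printf '%s\n' "$line" | sed -e 's/ //g')

# Deleting only the spaces between ">" and "<" leaves the tag contents alone.
between_only=$(printf '%s\n' "$line" | sed -e 's/> *</></g')

echo "$all_removed"
echo "$between_only"
```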

Top Expert 2006

Commented:
I was wondering about that too when I saw the sed (I did not update the sed, I just added the tr command). Try this

tr -d '\n\t' < $inputFile >$outputFile

Author

Commented:
Thanks dude! That took care of the spacing inside the tags, and it almost worked. The only problem now is that it's leaving some spaces intermittently all over the new XML, like in this example:
</lastmod> </url>     <url> <loc>

Do you know why?
Top Expert 2006

Commented:
Try


tr -d '\n\t' | sed -e 's/  //g' < $inputFile >$outputFile

or

tr -d '\n\t' | sed -e 's/> *</></g' < $inputFile >$outputFile

Author

Commented:
Sorry to say but these two are not working!
The script runs and stops executing at this command. I tried both.
The output file is the same as the input file.
Top Expert 2006

Commented:
Ah, sorry, I added the input redirection in the wrong place...

tr -d '\n\t' < $inputFile | sed -e 's/  //g' >$outputFile

or

tr -d '\n\t' < $inputFile | sed -e 's/> *</></g' >$outputFile
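
Why the placement matters: a redirection applies to the command it is attached to. With `< $inputFile` written after sed, sed reads the file directly and ignores the pipe, while tr is left waiting on the script's own standard input. A small sketch of both forms, assuming a hypothetical sample file; in the misplaced form tr is given /dev/null here purely so the demonstration cannot hang:

```shell
# Hypothetical sample input and output files for the demonstration.
src=$(mktemp)
dst=$(mktemp)
printf '<url>\n\t<loc>x</loc>\n</url>\n' > "$src"

# Misplaced: "< $src" belongs to sed, so tr's output is never used
# and the newlines survive.
tr -d '\n\t' < /dev/null | sed -e 's/> *</></g' < "$src" > "$dst"
wrong_lines=$(wc -l < "$dst" | tr -d ' ')

# Corrected: tr reads the file, sed reads tr's output through the pipe.
tr -d '\n\t' < "$src" | sed -e 's/> *</></g' > "$dst"
right=$(cat "$dst")

echo "misplaced redirection leaves $wrong_lines lines"
echo "corrected: $right"
rm -f "$src" "$dst"
```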

Author

Commented:
Hey Amit, it doesn't look like that's working either. Could you please try running it on my test XML to see what I mean? It just takes out the leading and trailing spaces but not the carriage returns.
abx.xml
Top Expert 2006
Commented:
The file seems to be a DOS file (lines end in carriage return plus line feed), so the carriage returns have to be deleted as well. Update the command to

tr -d '\n\r\f' < abx.xml | sed -e 's/[\t ][\t ]*/ /g' -e 's/> </></g' -e 's/^ //' -e 's/ $//' > abx_out.xml
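
The same pipeline can be wrapped in a function and tried on a small CRLF sample. This is a sketch under assumptions: the function name `flatten_xml` and the sample content are hypothetical, and a literal tab is substituted for `\t` in the bracket expression because `\t` there is a GNU sed convenience that other seds may not honour.

```shell
# Hypothetical helper; the steps mirror the command from the thread.
flatten_xml() {
    tab=$(printf '\t')
    tr -d '\n\r\f' < "$1" |
        sed -e "s/[$tab ][$tab ]*/ /g" \
            -e 's/> </></g' \
            -e 's/^ //' \
            -e 's/ $//'
    # 1st -e: collapse each run of tabs/spaces to one space
    # 2nd -e: drop the single space left between adjacent tags
    # 3rd/4th -e: trim a leading or trailing space
}

# Hypothetical DOS-style sample (CRLF line endings, mixed indentation).
sample=$(mktemp)
printf '<urlset xmlns="x">\r\n  <url>\r\n\t<loc>http://example.com/</loc>\r\n  </url>\r\n</urlset>\r\n' > "$sample"

flat=$(flatten_xml "$sample")
echo "$flat"
rm -f "$sample"
```

Note that the space inside the opening tag is kept, since only runs of white space between a closing `>` and the next `<` are removed entirely.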

Author

Commented:
You nailed it. You are truly a genius!! :) thanks!!

Author

Commented:
The problem is solved with Amit's help.
