Employee123

Use SED to make the xml file a single string with no white spaces.

I have an XML file (sitemap.xml) and I want to remove all the white space from it using sed in a bash script, so that the new file is one continuous string with no white space.

I tried using this
sed -e 's/[\t ]//g;/^$/d' $inputFile>$outputFile

The problem is that it only removes the leading and trailing white space and still leaves the carriage returns in place. I want a single continuous line as the end result.

Your help is greatly appreciated.
amit_g

sed -e 's/[\t ]//g;/^$/d' $inputFile | tr -d '\n' >$outputFile
Employee123

ASKER

Hi Amit,

Thanks for the quick reply. The sed command you posted works, but it creates an invalid XML file because it also removes the white space inside the tags, which I don't want. I only want to remove the white space around the tags. Right now your script causes an invalid XML error.

Example:

Excerpt from the original XML:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9   http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

After applying the remove white space command:

<?xmlversion="1.0"encoding="UTF-8"?><urlsetxmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

You can see how the words were joined, as in xmlversion, where they should be separate.

I was wondering about that too when I saw the sed (I did not update the sed, I just added the tr command). Try this

tr -d '\n\t' < $inputFile >$outputFile
Thanks! That took care of the spacing inside the tags and almost worked. The only problem now is that it leaves some spaces intermittently all over the new XML, as in this example:
</lastmod> </url>     <url> <loc>

Do you know why?
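A likely reason, assuming the file is indented with spaces rather than tabs (the thread does not show the raw file): tr -d '\n\t' deletes only newlines and tabs, so once the lines are joined, each line's leading spaces end up sitting between the tags. A small sketch with made-up sample data:

```shell
# Hypothetical sample: the middle element is indented with four spaces.
sample='<url>
    <loc>http://www.example.com/</loc>
</url>'

# tr deletes newlines and tabs only; the four indentation spaces survive
# and now appear between <url> and <loc> on the single output line.
joined=$(printf '%s\n' "$sample" | tr -d '\n\t')
printf '%s\n' "$joined"
```

The printed line still contains the four spaces between <url> and <loc>, which matches the kind of gaps shown in the example above.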
Try


tr -d '\n\t' | sed -e 's/  //g' < $inputFile >$outputFile

or

tr -d '\n\t' | sed -e 's/> *</></g' < $inputFile >$outputFile
Sorry to say, but neither of these works.
The script runs and then stops at this command. I tried both.
The output file is the same as the input file.
Ah, sorry, I added the input redirection in the wrong place...

tr -d '\n\t' < $inputFile | sed -e 's/  //g' >$outputFile

or

tr -d '\n\t' < $inputFile | sed -e 's/> *</></g' >$outputFile
Hey Amit, it doesn't look like that is working either. Could you please try running it on my test XML to see what I mean? It just takes out the leading and trailing spaces but not the carriage returns.
abx.xml
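One thing that may be worth checking, since the attached file is not shown here: if abx.xml has Windows (CRLF) or old Mac (CR-only) line endings, tr -d '\n\t' leaves the \r characters in place. The sketch below also deletes \r and then removes white space that sits between tags only. The sample data and temporary files are invented for illustration, and this is not necessarily the accepted solution:

```shell
# Hypothetical input with CRLF line endings and space indentation.
inputFile=$(mktemp)
outputFile=$(mktemp)
printf '<?xml version="1.0"?>\r\n<urlset>\r\n  <url>\r\n    <loc>http://www.example.com/</loc>\r\n  </url>\r\n</urlset>\r\n' > "$inputFile"

# Delete CR, LF and tab characters, then drop any white space found
# between a closing '>' and the next '<'. Spaces inside a tag, such as
# the one in <?xml version=...?>, are left alone.
tr -d '\r\n\t' < "$inputFile" | sed -e 's/>[[:space:]]*</></g' > "$outputFile"

cat "$outputFile"
```

Running od -c on the input file (or cat -A with GNU cat) shows whether \r characters are actually present. Note that this treats all white space between tags as insignificant, which is fine for a sitemap but may not be for XML with mixed text content.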
ASKER CERTIFIED SOLUTION
amit_g
This solution is only available to members.
You nailed it. You are truly a genius!! :) thanks!!
The problem is solved with Amit's help.