Employee123
asked on
Use SED to make the xml file a single string with no white spaces.
I have an XML file (sitemap.xml) and I want to remove all the whitespace from it using sed in a bash script, so that the new file is one continuous string with no whitespace.
I tried this:
sed -e 's/[\t ]//g;/^$/d' $inputFile>$outputFile
The problem is that it only removes the leading and trailing whitespace; the line breaks (carriage returns) are still present. I want a single continuous line as the end result.
Your help is greatly appreciated.
sed -e 's/[\t ]//g;/^$/d' $inputFile | tr -d '\n' >$outputFile
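Some background on why the extra tr is needed: sed works one line at a time and the newline is not part of the text it matches, so a substitution cannot join lines, and /^$/d only drops lines that are completely empty. The sketch below shows the difference on a small made-up sample (the file contents and the variable names sample, sed_only and joined are assumptions for illustration, not from the thread). It uses [[:space:]] instead of [\t ] because, as far as I know, only GNU sed treats \t inside brackets as a tab.

```shell
#!/bin/sh
# Hypothetical three-line sample standing in for sitemap.xml.
sample=$(mktemp)
printf '<url>\n  <loc>http://example.com/</loc>\n</url>\n' > "$sample"

# sed alone: whitespace inside each line is removed, but every input
# line is still printed on its own line, so the line count is unchanged.
sed_only=$(sed -e 's/[[:space:]]//g' -e '/^$/d' "$sample" | wc -l | tr -d ' ')

# sed piped through tr: tr sees the whole stream and deletes the newlines.
joined=$(sed -e 's/[[:space:]]//g' -e '/^$/d' "$sample" | tr -d '\n')

echo "lines after sed only: $sed_only"
echo "joined: $joined"
rm -f "$sample"
```

Note that this still deletes every space, including the ones inside tags, so it is only safe for XML that has no attributes.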
ASKER
Hi Amit,
Thanks for the quick reply. The sed command you posted runs, but it creates an invalid XML file because it also removes the whitespace inside the tags, which I don't want. I only want to remove the whitespace around the tags. Right now your script causes an invalid XML error.
Example:
Excerpt from the original XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
After applying the remove white space command:
<?xmlversion="1.0"encoding="UTF-8"?><urlsetxmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
You can see how the words were joined (xmlversion, for example) where they should be separate.
I was wondering about that too when I saw the sed command (I did not change it, I just added the tr command). Try this:
tr -d '\n\t' < $inputFile >$outputFile
ASKER
Thanks dude! That took care of the spacing inside the tags, and it almost worked. The only problem now is that it leaves some spaces between tags all over the new XML, like this:
</lastmod> </url> <url> <loc>
Do you know why?
Try
tr -d '\n\t' | sed -e 's/ //g' < $inputFile >$outputFile
or
tr -d '\n\t' | sed -e 's/> *</></g' < $inputFile >$outputFile
ASKER
Sorry to say but these two are not working!
The script runs and stops executing at this command. I tried both.
The output file is the same as the input file.
Ah, sorry, I added the sed command in the wrong place (ahead of the input redirection, so tr never got the file)...
tr -d '\n\t' < $inputFile | sed -e 's/ //g' >$outputFile
or
tr -d '\n\t' < $inputFile | sed -e 's/> *</></g' >$outputFile
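To see what the second variant does, here is a minimal sketch run against a made-up sample. The file contents and the result variable are assumptions for illustration, since the real sitemap.xml is not shown in full. The input redirection has to belong to tr: in the earlier attempt it was attached to sed, so sed read the file directly while tr waited on the terminal, which would explain the hang the asker saw.

```shell
#!/bin/sh
# Hypothetical sample; tabs, newlines and one stray space between tags.
inputFile=$(mktemp)
outputFile=$(mktemp)
printf '<?xml version="1.0" encoding="UTF-8"?>\n<urlset>\n\t<url>\n\t\t<loc>http://example.com/</loc> </url>\n</urlset>\n' > "$inputFile"

# tr reads the file and deletes newlines and tabs; sed then removes any
# run of spaces sitting between a closing '>' and the next '<'.
# Spaces inside a tag (between attributes) are left alone.
tr -d '\n\t' < "$inputFile" | sed -e 's/> *</></g' > "$outputFile"

result=$(cat "$outputFile")
echo "$result"
rm -f "$inputFile" "$outputFile"
```

The first variant (s/ //g) would also delete the space in `<?xml version=...`, so the `> *<` pattern is the safer of the two when tags carry attributes.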
ASKER
Hey Amit, it doesn't look like that works either. Could you try running it on my test XML to see what I mean? It just takes out the leading and trailing spaces, but not the carriage returns.
abx.xml
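The attached file and the accepted answer are not visible here, so the following is only a guess and not the certified solution. One possible cause, assuming the file was saved with Windows (CRLF) line endings, is that tr -d '\n\t' leaves the \r characters behind, so the output still looks like it has line breaks. A sketch under that assumption, using a made-up sample:

```shell
#!/bin/sh
# Hypothetical sample with CRLF line endings and space indentation.
inputFile=$(mktemp)
printf '<url>\r\n  <loc>http://example.com/</loc>\r\n</url>\r\n' > "$inputFile"

# Delete carriage returns as well as newlines and tabs, then drop any
# whitespace left between one tag's '>' and the next tag's '<'.
result=$(tr -d '\r\n\t' < "$inputFile" | sed -e 's/>[[:space:]]*</></g')

echo "$result"
rm -f "$inputFile"
```

Running `od -c` on the first few lines of the real file would show whether \r is actually present before relying on this.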
ASKER CERTIFIED SOLUTION
ASKER
You nailed it. You are truly a genius!! :) thanks!!
ASKER
The problem is solved with Amit's help.