Need to remove white space from text contained in values of XML document
Posted on 2009-04-24
Dear fellow XML/XSLT developers:
I have an XML document whose elements, in some cases, hold quite a bit of text. Unfortunately, much of this text contains extraneous white space and blank lines throughout the document. I would like a small program, in Java or XSLT, that goes through the entire XML document and normalizes the white space so that there are exactly two spaces after each period, double quote, question mark, exclamation mark, and colon, and a single space after each comma, semicolon, and bracket (round, curly, or square); i.e. standard spacing for punctuation in English. My XML document looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<collection name="Collection Title">
  <book number="1" title="Book Title">
    <quote number="1" reference="Book 1, Number 1">
      <quotation>blah blah blah </quotation>
    </quote>
    <quote number="2" reference="Book 1, Number 2">
      <quotation>blah blah blah</quotation>
    </quote>
  </book>
</collection>
The XML document is structured such that <collection> is the root element, which contains several <book> elements. Each <book> element contains several <quote> elements. The spacing problem exists ONLY within the <quotation> element (which sits inside the <quote> element).
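To make the requirement concrete, here is a rough Java sketch of the behavior I have in mind. It is only an illustration under several assumptions, not a finished solution: the class and method names (QuotationSpacing, normalize, normalizeQuotations, parse) are made up for this post, each <quotation> is assumed to contain plain text with no child elements, and spacing is only adjusted where white space already exists (no space is inserted after punctuation that has none, and a space before punctuation is not removed).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class QuotationSpacing {

    // Collapse every run of white space (spaces, tabs, line breaks) into a
    // single space, then widen the gap to two spaces after . ? ! : and ".
    // Commas, semicolons, and brackets keep the single space left by the
    // collapse step.
    static String normalize(String text) {
        String collapsed = text.trim().replaceAll("\\s+", " ");
        return collapsed.replaceAll("([.?!:\"]) ", "$1  ");
    }

    // Apply normalize() to the text of every <quotation> element only.
    // Assumption: <quotation> holds plain text; setTextContent would
    // flatten any child elements.
    static void normalizeQuotations(Document doc) {
        NodeList quotations = doc.getElementsByTagName("quotation");
        for (int i = 0; i < quotations.getLength(); i++) {
            Node quotation = quotations.item(i);
            quotation.setTextContent(normalize(quotation.getTextContent()));
        }
    }

    // Parse an XML string into a DOM document using the standard JAXP API.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With this sketch, attribute values such as reference="Book 1, Number 1" and all other elements would be left alone; only the text inside <quotation> is rewritten.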
I hope this is clear. Please let me know if anything is confusing.
Thanks in advance to all who reply.