How do I handle &nbsp; in a stylesheet?

I have a stylesheet that accepts XML (which happens to be HTML) from another process. The problem is that the process generates &nbsp;, which causes the XSL parser to crash.

From looking on the web, I see that &nbsp; is not allowed in XML. One suggestion was to put the following ENTITY declaration in the header:

<!ENTITY nbsp CDATA "&#160;" >

<?xml version="1.0" encoding="Windows-1252" ?>
xsl code...

...but I now get an XSLT compile error. As I understand it, the above declaration should replace all occurrences of &nbsp; with &#160;, which XSL DOES handle. I suspect that the post I found on the web missed something from the above declaration. Can anyone correct the above, or suggest another way for my XSLT to handle &nbsp;?
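A minimal reproduction of the failure, sketched with Python's standard-library ElementTree parser (an assumption: it stands in for whatever parser the XSLT processor uses, so the exact error messages will differ). It shows that an undeclared &nbsp; is rejected, that the declaration as posted is rejected (a bare ENTITY before the XML declaration, using the SGML-only CDATA keyword), that the CDATA keyword fails even inside a DOCTYPE, and that &#160; parses fine. The helper name parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text):
    """Return the parser's error message, or None if xml_text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

cases = {
    # &nbsp; used without any declaration.
    "undeclared": "<p>Hello&nbsp;world</p>",
    # The suggestion as posted: bare ENTITY before the XML declaration.
    "as posted": '<!ENTITY nbsp CDATA "&#160;" >\n'
                 '<?xml version="1.0"?>\n<p>Hello&nbsp;world</p>',
    # Inside a DOCTYPE, but still using the SGML-only CDATA keyword.
    "CDATA keyword": '<?xml version="1.0"?>\n<!DOCTYPE p [\n'
                     '<!ENTITY nbsp CDATA "&#160;">\n]>\n<p>Hello&nbsp;world</p>',
    # A numeric character reference needs no declaration at all.
    "char ref": "<p>Hello&#160;world</p>",
}

for label, doc in cases.items():
    print(f"{label}: {parse_error(doc) or 'OK'}")
```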

Thanks in advance :)
Asked by paddycobbett

R7AF commented:
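&nbsp; is not valid XML by default: XML predefines only &lt;, &gt;, &amp;, &apos; and &quot;, so any other named entity must be declared in the document's DTD. That seems not to be the case here, so the XML is not well-formed. You could read the XML into a string and replace &nbsp; with &#160; before parsing it.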
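A minimal sketch of that string-replacement idea, assuming Python and its standard-library ElementTree parser stand in for the original process (the thread does not name it); the function name parse_replacing_nbsp is hypothetical:

```python
import xml.etree.ElementTree as ET

def parse_replacing_nbsp(xml_text: str) -> ET.Element:
    """Parse XML that uses &nbsp; without declaring it.

    &#160; is a numeric character reference, which any XML parser
    understands without a DTD, so a plain text substitution is enough.
    Caveat: this scans and copies the whole document, and it would also
    rewrite a literal "&nbsp;" inside a CDATA section or comment.
    """
    return ET.fromstring(xml_text.replace("&nbsp;", "&#160;"))

root = parse_replacing_nbsp("<p>Hello&nbsp;world</p>")
print(repr(root.text))  # 'Hello\xa0world'
```

The trade-off is that the whole document passes through the replacement before it is parsed.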

See http://www.experts-exchange.com/Q_22526834.html

Geert Bormans (Information Architect) commented:
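Yes, you are not doing it 100% right. The entity declaration has to go inside a DOCTYPE declaration, and the XML declaration must be the very first thing in the file. Also, XML (unlike SGML) does not allow the CDATA keyword in an entity declaration. It should look like this:

<?xml version="1.0" encoding="Windows-1252" ?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nbsp "&#160;">
]>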
<xsl:stylesheet ...
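To check a corrected header like this, and to see why the incoming document needs a declaration of its own, here is a Python sketch using the standard-library ElementTree parser as a stand-in (an assumption; it is not the poster's XSLT processor). The encoding attribute is left out because the sketch parses a Python string, and input_doc is a hypothetical sample of the generated XML.

```python
import xml.etree.ElementTree as ET

XSL_TEXT = "{http://www.w3.org/1999/XSL/Transform}text"

# XML declaration first, then a DOCTYPE whose internal subset declares nbsp.
stylesheet = """<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nbsp "&#160;">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:text>Hello&nbsp;world</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical input from the generating process, with no declaration.
input_doc = "<p>Hello&nbsp;world</p>"

xslt_tree = ET.fromstring(stylesheet)
print(repr(xslt_tree.find(f".//{XSL_TEXT}").text))  # 'Hello\xa0world'

try:
    ET.fromstring(input_doc)
except ET.ParseError as err:
    # Each document is parsed on its own: the stylesheet's DTD does not
    # apply to the input, so the input still fails.
    print("input still fails:", err)
```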

Geert Bormans (Information Architect) commented:
> As I understand it, the above declaration should replace all occurrences of &nbsp; with &#160;, which XSL DOES handle

Also make sure that the original XML has this declaration, because the stylesheet only needs this DOCTYPE declaration if you use &nbsp; in the stylesheet itself.

The XML parser replaces &nbsp; with the character it stands for (&#160;, a no-break space) internally, before the XSLT processor ever gets it.
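A small Python sketch of that point, using the standard-library ElementTree parser as a stand-in for whatever parser sits in front of the XSLT processor (an assumption): once parsed, a declared &nbsp; and a literal &#160; give identical trees.

```python
import xml.etree.ElementTree as ET

# Same content, once via a declared entity and once via a character reference.
with_entity = """<!DOCTYPE p [
<!ENTITY nbsp "&#160;">
]>
<p>Hello&nbsp;world</p>"""
with_char_ref = "<p>Hello&#160;world</p>"

a = ET.fromstring(with_entity)
b = ET.fromstring(with_char_ref)

# The parser has already expanded the entity: both trees hold the character
# U+00A0, and the name "nbsp" no longer appears anywhere.
print(a.text == b.text == "Hello\u00a0world")  # True

# Serialising to ASCII (ElementTree's default) writes it back as &#160;.
print(ET.tostring(a))  # b'<p>Hello&#160;world</p>'
```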

paddycobbett (author) commented:
The suggestion by R7AF was a last resort; I had considered filtering that value out in the process. If I can handle it in the stylesheet, that would be ideal. Gertone, you gave me a corrected version, but it still results in the same error :S

paddycobbett (author) commented:
So the XML coming in should also have:

<?xml version="1.0" encoding="Windows-1252" ?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nbsp "&#160;">
]>

?

Geert Bormans (Information Architect) commented:
Yes, correct, except that the DOCTYPE name should be different: it should be the name of the root element (root below is a placeholder for it):
<?xml version="1.0" encoding="Windows-1252" ?>
<!DOCTYPE root [
<!ENTITY nbsp "&#160;">
]>

If it does not have that, it is not well-formed XML, which would mean that it is most likely still HTML.

The best way to turn HTML into well-formed XML, so that you can handle it with XSLT, is to preprocess it using TagSoup (google for "download tagsoup" to get a copy). It also handles encodings etc. correctly. After TagSoup you have well-formed XML, and you can get the job done with XSLT.

I hesitate to recommend R7AF's approach, since that would mean processing the full XML. I would recommend stripping off the XML declaration, finding the root element, and adding the DOCTYPE declaration (the XML declaration, if you keep one, has to go back in front of it). That way you don't have to process the full file, only the first line (or two).
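This header-only approach could look roughly like the following Python sketch. The function name add_nbsp_doctype and both regular expressions are illustrative assumptions, not a known tool. It assumes the input has no DOCTYPE yet and that nothing before the root element (for example a comment) contains a "<name" sequence; only the XML declaration and the first start tag are inspected, and the rest of the document is passed through unchanged.

```python
import re
import xml.etree.ElementTree as ET

# An XML declaration (if any) at the very start, plus whitespace after it.
XML_DECL = re.compile(r"\s*(<\?xml\s[^>]*\?>)\s*")
# The first start tag: '<' followed by a name start character,
# so comments (<!--) and processing instructions (<?) are skipped.
FIRST_TAG = re.compile(r"<([A-Za-z_][\w.:-]*)")

def add_nbsp_doctype(xml_text: str) -> str:
    """Insert a DOCTYPE that declares nbsp and is named after the root element."""
    decl = XML_DECL.match(xml_text)
    head = decl.group(1) + "\n" if decl else ""
    body = xml_text[decl.end():] if decl else xml_text
    root = FIRST_TAG.search(body)
    if root is None:
        raise ValueError("no root element found")
    doctype = '<!DOCTYPE %s [\n<!ENTITY nbsp "&#160;">\n]>\n' % root.group(1)
    return head + doctype + body

html_ish = '<?xml version="1.0"?>\n<html><body><p>Hello&nbsp;world</p></body></html>'
fixed = add_nbsp_doctype(html_ish)
print(fixed)
print(repr(ET.fromstring(fixed).find("body/p").text))  # 'Hello\xa0world'
```

Only nbsp is declared here; other HTML entities such as &copy; would need their own ENTITY lines, and HTML that is not well-formed for other reasons still needs a tool like TagSoup.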

You don't necessarily have to automate that; you can also change the process that generates your pseudo-XML.

cheers

Geert

paddycobbett (author) commented:
Thanks. Having investigated the code base I'm working on, I found it more straightforward than I anticipated to insert code that strips off the &nbsp;.

Thanks for both suggestions, which I'm sure are valid. I've allocated more points to R7AF since that is the suggestion that suited me best in this case.