zc2
transform to document, single byte
I use MSXML from C++ code. It transforms some XML with an XSLT stylesheet into XHTML. To do that I create an IXSLProcessor from an IXSLTemplate and then call its transform() method, passing an empty IXMLDOMDocument object as the output (I want an IXMLDOMDocument to be populated with the output because I want to do some additional manipulation of its nodes).
That works fine if the XSLT has an <xsl:output method="xml" encoding="UTF-8"/> instruction.
The C++ code can be compiled in two versions: single-byte encoding (windows-1252) and Unicode. For the single-byte case the XSLT has <xsl:output method="xml" encoding="windows-1252"/>
If the XML or XSLT contains a non-ASCII character (code greater than 127), the transform() method fails: it either returns E_FAIL or leaves the output document empty.
Is it possible to set up the output document to accept a specific single-byte encoding (windows-1252)?
ASKER
All input files are single-byte: no UTF-8, no BOMs, etc.
XML:
<inset include="8y46bc"/>
XSL:
<?xml version="1.0" encoding="windows-1252"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" encoding="windows-1252" omit-xml-declaration="yes"/>
<xsl:template match="/inset">
<!-- template body -->
</xsl:template>
</xsl:stylesheet>
Hi,
<inset include="8y46bc"/>
Is this all the content of the XML file?
"If I had to guess, it has to do with the encoding detection from the input."
<inset include="8y46bc"/>
When the file does not specify an encoding via the encoding attribute of its XML declaration, a single-byte file is indistinguishable from UTF-8, and the parser assumes UTF-8. You need to test this, as I already wrote:
Change your input file to:
<?xml version="1.0" encoding="windows-1252"?>
<inset include="8y46bc"/>
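To illustrate why the missing declaration only starts to matter once a non-ASCII character appears: a lone windows-1252 byte above 127 is generally not a well-formed UTF-8 sequence, so a parser that assumes UTF-8 rejects it. Below is a minimal sketch in portable C++ (no MSXML involved). The helper name `looksLikeUtf8` is hypothetical, and the check is deliberately simplified: it validates lead and continuation bytes only and does not reject every overlong or surrogate form.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` has the byte structure of UTF-8.
// Simplified: checks lead bytes and continuation bytes only.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;                  // continuation bytes expected
        if (lead < 0x80) {
            extra = 0;                          // plain ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;                          // two-byte sequence
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;                          // three-byte sequence
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;                          // four-byte sequence
        } else {
            return false;                       // stray continuation or invalid lead
        }
        if (extra > bytes.size() - i - 1) {
            return false;                       // sequence cut off at end of input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;                   // not a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

With this sketch, pure-ASCII input such as the sample XML passes, "caf\xE9" (é as the single windows-1252 byte 0xE9) is rejected, and the UTF-8 form "caf\xC3\xA9" is accepted.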
ASKER
"This is all the content of the xml file?"
In the test sample I am working with, yes. In production there would be data.
"You need to test this, as I already wrote:"
I will try that.
I think that the problem is not with the input but with the output, since the XSL has omit-xml-declaration="yes".
Well, what about posting a concise and complete example? If it's VS, then attaching such a project would help. Include your test harness.
ASKER
"Please, check this"
I don't see how that is relevant.
My question is simple: is it possible to tell the MSXML document object to expect single-byte input when loading, not UTF-8 (other than by adding the XML declaration)?
ASKER
ste5an,
As I expected, adding the XML declaration to the input XML file does not change a thing.
Only removing the omit-xml-declaration="yes" attribute from XSLT makes the output DOMdocument load it correctly.
But the problem is that I don't want the XML declaration in the output.
Hi,
The load method will do just what its name says. What you can do is modify the XML document properties to fit your scenario.
ASKER
"modify the XML document properties"
That's exactly my question. How do I do that?
ASKER CERTIFIED SOLUTION
If I had to guess, it has to do with the encoding detection from the input. So check the input file's encoding: whether it has a BOM, whether the XML itself declares an encoding, and whether that declaration is the correct one.
Thus take your existing input file, convert it to UTF-8, add a BOM and the encoding attribute. Test it.
Do the same for your code page: convert the file, add the appropriate encoding attribute and save it using that encoding. Test it.
E.g. use Notepad++ for manipulating the encoding of the file and storing it accordingly.
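The same checks can also be done programmatically on the raw bytes of the file. The following is a rough sketch, not part of MSXML: `EncodingHints` and `inspectXmlBytes` are hypothetical names, and the function only handles ASCII-compatible encodings (it will not recognize UTF-16 input). It reports whether a UTF-8 BOM is present and which encoding, if any, the XML declaration names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for the sketch below.
struct EncodingHints {
    bool hasUtf8Bom;                 // file starts with EF BB BF
    std::string declaredEncoding;    // empty if no encoding is declared
};

// Inspects the leading bytes of an XML file (ASCII-compatible encodings only).
EncodingHints inspectXmlBytes(const std::string& bytes) {
    EncodingHints hints{false, ""};
    std::size_t pos = 0;

    // A UTF-8 byte order mark is the three bytes EF BB BF.
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        hints.hasUtf8Bom = true;
        pos = 3;
    }

    // The XML declaration, if present, must come first.
    if (bytes.compare(pos, 5, "<?xml") != 0) {
        return hints;
    }
    const std::size_t end = bytes.find("?>", pos);
    if (end == std::string::npos) {
        return hints;
    }
    const std::string decl = bytes.substr(pos, end - pos);

    // Look for encoding="..." or encoding='...' inside the declaration.
    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) {
        return hints;
    }
    const std::size_t open = decl.find_first_of("\"'", key + 8);
    if (open == std::string::npos) {
        return hints;
    }
    const std::size_t close = decl.find(decl[open], open + 1);
    if (close == std::string::npos) {
        return hints;
    }
    hints.declaredEncoding = decl.substr(open + 1, close - open - 1);
    return hints;
}
```

For the bare sample `<inset include="8y46bc"/>` this reports no BOM and an empty declared encoding, which is the case where a parser falls back to its default.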