
transform to document, single byte

I use MSXML from C++ code. It transforms some XML with an XSLT into XHTML. To do that, I create an IXSLProcessor from an IXSLTemplate and then call its transform() method, passing an empty IXMLDOMDocument object as the output. I want the output to populate an IXMLDOMDocument because I need to do some additional manipulation of its nodes.
That works fine if the XSLT has an <xsl:output method="xml" encoding="UTF-8"/> instruction.
The C++ code can be compiled in two versions: single-byte (windows-1252) and Unicode. For the single-byte build, the XSLT has <xsl:output method="xml" encoding="windows-1252"/>.
If the XML or the XSLT contains a non-ASCII character (code above 127), the transform() method fails: it either returns E_FAIL or the output document remains empty.
Is it possible to set up the output document to accept a specific single-byte encoding (windows-1252)?
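As a portable illustration (plain C++, not MSXML code), the check below shows why a windows-1252 byte above 127 breaks a parser that assumes UTF-8: "é" is the single byte 0xE9 in windows-1252, but in UTF-8 that byte can only start a three-byte sequence, so a document containing it on its own is malformed. The function name `is_valid_utf8` is hypothetical; MSXML does its own decoding internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. Overlong encodings,
// UTF-16 surrogates and code points above U+10FFFF are rejected.
bool is_valid_utf8(const std::string& bytes) {
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }                 // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // stray continuation or invalid lead byte
        if (i + len > n) return false;                   // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        static const unsigned long kMinCp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < kMinCp[len]) return false;              // overlong form
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

For example, `"caf\xC3\xA9"` (UTF-8 "café") passes, while `"caf\xE9"` (windows-1252 "café") does not, because 0xE9 announces a three-byte sequence that never arrives.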

ste5an

It's hard to tell without code, and it's not clear what your use case is.

If I had to guess, it has to do with the encoding detection on the input. So check the input file for its encoding: whether it has a BOM or not, whether the XML itself declares an encoding, and whether that is the correct one.

So take your existing input file, convert it to UTF-8, and add a BOM and the encoding attribute. Test it.
Do the same for your code page: convert the file, add the appropriate encoding attribute, and save it in that encoding. Test it.

For example, use Notepad++ to change the encoding of the file and save it accordingly.
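The conversion can also be done in code instead of an editor. A minimal sketch, assuming the whole input is already in memory as windows-1252 bytes; the function name `cp1252_to_utf8` is hypothetical, and passing the five undefined windows-1252 bytes through as C1 control characters is an assumption of this sketch. On Windows, `MultiByteToWideChar(1252, ...)` followed by `WideCharToMultiByte(CP_UTF8, ...)` does the same job with the system tables.

```cpp
#include <cassert>
#include <string>

// Unicode code points for windows-1252 bytes 0x80..0x9F. The undefined
// bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the C1 control with the
// same value (assumption).
static const char32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Appends the UTF-8 form of a code point below U+10000; every
// windows-1252 character is in that range, so three bytes suffice.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Re-encodes a windows-1252 byte string as UTF-8.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b >= 0x80 && b <= 0x9F)
            append_utf8(out, kCp1252High[b - 0x80]);   // the windows-1252 specific range
        else
            append_utf8(out, b);                       // identical to Latin-1 / Unicode
    }
    return out;
}
```

ASCII-only input such as `<inset include="8y46bc"/>` comes back unchanged, which is why the problem only shows up once a character above 127 appears.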

zc2

ASKER

All input files are single-byte: no UTF-8, no BOMs, etc.
XML:

<inset include="8y46bc"/>


XSL:

<?xml version="1.0" encoding="windows-1252"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" encoding="windows-1252" omit-xml-declaration="yes"/>

<xsl:template match="/inset">
    <!-- template body -->
</xsl:template>
</xsl:stylesheet>


Eduard Ghergu

Hi,

<inset include="8y46bc"/>

This is all the content of the xml file?

ste5an

If I had to guess, it has to do with the encoding detection on the input

<inset include="8y46bc"/>


When you don't specify the encoding in the file's XML declaration, a single-byte file is indistinguishable from a UTF-8 one. You need to test this, as I already wrote:

Change your input file to:

<?xml version="1.0" encoding="windows-1252"?>
<inset include="8y46bc"/>

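If there are many input files, the suggested declaration can be added programmatically before the data is handed to the parser. A minimal sketch; the function name `with_xml_declaration` is hypothetical, and it only labels the bytes, so the caller must make sure they really are in the named encoding.

```cpp
#include <cassert>
#include <string>

// True if `doc` starts with an XML declaration ("<?xml" followed by
// whitespace), as opposed to a PI such as <?xml-stylesheet ...?>.
static bool starts_with_declaration(const std::string& doc) {
    return doc.size() > 5 && doc.compare(0, 5, "<?xml") == 0 &&
           (doc[5] == ' ' || doc[5] == '\t' || doc[5] == '\r' || doc[5] == '\n');
}

// Prepends a declaration naming `encoding` unless one is already present.
// The content of `doc` is not re-encoded.
std::string with_xml_declaration(const std::string& doc, const std::string& encoding) {
    if (starts_with_declaration(doc)) return doc;
    return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n" + doc;
}
```

Calling `with_xml_declaration(xml, "windows-1252")` on the sample input yields exactly the two-line file suggested above.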

zc2

ASKER

This is all the content of the xml file?

In the test sample I am working with, yes. In production there will be real data.

You need to test this, as I already wrote:

I will try that.
I think the problem is not with the input but with the output. Since the XSL has the omit-xml-declaration="yes" attribute, the output has no declaration, so UTF-8 is assumed.
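The rule this reasoning relies on comes from the XML 1.0 specification: without external encoding information, a byte order mark, or an `encoding` declaration, a parser must treat the document as UTF-8. A simplified sketch of that decision (it ignores UTF-32, EBCDIC and transport headers; the function name `detect_xml_encoding` is hypothetical and this is not how MSXML is implemented):

```cpp
#include <cassert>
#include <string>

// Simplified XML 1.0 encoding detection: a BOM wins, then the encoding
// pseudo-attribute of the XML declaration, and otherwise UTF-8.
std::string detect_xml_encoding(const std::string& bytes) {
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return "UTF-8";
    if (bytes.compare(0, 2, "\xFF\xFE") == 0 || bytes.compare(0, 2, "\xFE\xFF") == 0)
        return "UTF-16";
    if (bytes.compare(0, 5, "<?xml") == 0) {
        const std::size_t end = bytes.find("?>");          // end of the declaration
        const std::size_t attr = bytes.find("encoding");
        if (end != std::string::npos && attr != std::string::npos && attr < end) {
            const std::size_t open = bytes.find_first_of("\"'", attr);
            if (open != std::string::npos && open < end) {
                const std::size_t close = bytes.find(bytes[open], open + 1);
                if (close != std::string::npos && close < end)
                    return bytes.substr(open + 1, close - open - 1);
            }
        }
    }
    return "UTF-8";  // no BOM and no encoding declaration
}
```

With omit-xml-declaration="yes" and no BOM, the transform output falls into the last branch, so windows-1252 bytes above 127 would be read as (invalid) UTF-8.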

ste5an

Well, what about posting a concise and complete example? If it's a Visual Studio project, attaching it would help. Include your test harness.

Eduard Ghergu

Hi,

Please, check this: https://code-examples.net/en/q/282df

zc2

ASKER

Please, check this

I don't see how that is relevant.
My question is simple: is it possible to tell the MSXML DOMDocument object to expect single-byte input when loading, rather than UTF-8 (other than by adding the XML declaration)?

zc2

ASKER

ste5an,
As I expected, adding the XML declaration to the input XML file does not change a thing.
Only removing the omit-xml-declaration="yes" attribute from the XSLT makes the output DOMDocument load it correctly.
But the problem is that I don't want the XML declaration in the output.
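Since keeping the declaration is what lets the DOMDocument load the result, one possible workaround is to leave it in during the transform and drop it only when the final markup is written out. This is a sketch, not the asker's certified solution, which is not available here; it assumes the final output is available as a serialized narrow string, and the function name `strip_xml_declaration` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Removes a leading XML declaration and the whitespace after it from a
// serialized document. Anything else, including <?xml-stylesheet ...?>,
// is returned unchanged.
std::string strip_xml_declaration(const std::string& doc) {
    const bool has_decl = doc.size() > 5 && doc.compare(0, 5, "<?xml") == 0 &&
                          (doc[5] == ' ' || doc[5] == '\t' || doc[5] == '\r' || doc[5] == '\n');
    if (!has_decl) return doc;
    const std::size_t end = doc.find("?>");
    if (end == std::string::npos) return doc;                 // malformed: leave as-is
    const std::size_t body = doc.find_first_not_of(" \t\r\n", end + 2);
    return body == std::string::npos ? std::string() : doc.substr(body);
}
```

Because the declaration only matters while the document is being parsed, removing it at the serialization step keeps the DOM load working and still produces declaration-free output.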

Eduard Ghergu

Hi,

The load method will do just what its name says. What you can do is modify the XML document properties to fit your scenario.

zc2

ASKER

modify the XML document properties

That's exactly my question. How do I do that?

ASKER CERTIFIED SOLUTION

zc2
