xerces transform throws "Character conversion error"

hello i run a query on SQLServer where the output gets returned as XML (with the 'AS XML RAW' directive )
one of the characters in the output is a ¥ (japanese jen sign) and it throws a

"Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low)."

exception when i try to transfor the XML with a stylesheet (using xerces)  .
the stylesheet specifies:
<xsl:output method="xml" version="1.0" encoding="UTF-16" indent="yes"/>
i set the
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

so i don't know why it the exception even mentions UTF-8.

any help appreciated.

-------------- code snippet ----------------

            ResultSet rs = cs.getResultSet();
            // Check if no results returned:
            if (rs == null)

            FileOutputStream fileOutput = new FileOutputStream("c:\\tomOut.xml");

            StringBuffer aSB = new StringBuffer("<root>");  // add the root element
            while( rs.next())
                String line = rs.getString(1); // write the xml document
            aSB.append("</root>"); // close the root element
             // transform
            TransformerFactory factory = TransformerFactory.newInstance();
            StreamSource xslSource = new StreamSource("c:\\test.xsl");
            javax.xml.transform.Templates stylesheet = factory.newTemplates(xslSource);
            Transformer transformer = stylesheet.newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

            // The source is a file.
            StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
            // The result is a file.
            StreamResult result = new StreamResult("c:\\converted.xml");

            transformer.transform(source, result);  // it throws the exception here

-------------- code snippet ----------------
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Tommy BraasCommented:
Hi tomschuring,

Why are you setting the decoding to UTF-16? What is the format of the field in the database from which the XML is retrieved? Is the database set up to run in UTF-16 or ISO?

use UNICODE big endian instead of UTF while writting to output.

Naeem Shehzad Ghuman
tomschuringAuthor Commented:
i was setting the decoding to UTF-16 to make sure it outputs UTF-16 so it wouldn't get confused with the yen character. i tried without first and then added the
 transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16")

the the result of the query is retrieved with the sql server directive "FOR XML RAW" so (i assume sqlserver) must create an internal temporary table to store the XML document in. good question though. i see if i can find what datatype that comes back as.

when i do the query manually with query analyzer i just see multiple records like

<row UNIQUEID="53380601" a="53"  b="$2543" /> (.... heaps more of these tags ...)<row UNIQUEID="5311111"

a="125" b="$2344" /> <row UNIQUEID="5660101" a="165" b="¥748876" />

so it breaks off the xml after the size of the field in the temporary table
Microsoft Azure 2017

Azure has a changed a lot since it was originally introduce by adding new services and features. Do you know everything you need to about Azure? This course will teach you about the Azure App Service, monitoring and application insights, DevOps, and Team Services.

tomschuringAuthor Commented:
"use UNICODE big endian while writing the output"

do you mean when i write to the StringBuffer ?
tomschuringAuthor Commented:
i'm not sure how to find out  if the database is setup to run in UTF-16 or ISO
is there a way to check ?

>> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

transformer.setOutputProperty(OutputKeys.ENCODING, "UNICODE");
tomschuringAuthor Commented:
mmmm that gave me the same exception:
Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low).
tomschuringAuthor Commented:
when i write the StringBuffer ( aSB ) to file, open it up with XMLSpy and apply the c:\test.xsl stylesheet it converts it.... very frustrating.
tomschuringAuthor Commented:
when i write the StringBuffer to file ( c:\XMLTempOut.xml ) and read it in like :

transformer.transform(new StreamSource("c:\\XMLTempOut.xml"), result);

it works without exception......

is there a way to force
  StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
to be UTF-16 ?

Tommy BraasCommented:

I don't think that Naeemg understands the difference between Unicode and UTF-X.

UTF-16 is a two byte per glyph encoding of Unicode glyphs (one character == one glyph), UTF-8 is a multibyte encoding of Unicode glyphs. UTF-8 represents each Unicode glyph as 1, 2, 3 or 4 bytes, and is more compact than the other encodings.

Furthermore, when encoding to more than one byte per glyph byte order comes into play as well. There are two byte orderings, big endian and little endian.

Copied from the SQL Server Books Online:
"Unicode data is stored using the nchar, nvarchar, and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set. Use nvarchar when a column's entries vary in the number of Unicode characters (up to 4,000) they contain. Use nchar when every entry for a column has the same fixed length (up to 4,000 Unicode characters). Use ntext when any entry for a column is longer than 4,000 Unicode characters."

Check what the datatype of the field storing the XML is. If it is one of the above your fine, if not, that is the problem.

Are you accessing the data using the SQL Server JDBC type 4 driver, or the JDBC-ODBC bridge? You can tell from which driver is loaded in the class which accesses the XML.

tomschuringAuthor Commented:
hello orangehead911,

i'm using the jtds JDBC driver which is a type 4 driver. ( http://sourceforge.net/projects/jtds/ )

i don't think i can find out what type the xml is returned in. it is obviously multi-byte in some form because it contains the yen character.

when i write the output to file (in UTF-16) format and use that file to do the xslt-transformation no exception is thrown (and no data is lost)
but it looks that the
       StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
does not convert it to UTF-16

Tommy BraasCommented:

Why aren't you using M$ SQL Server JDBC driver? It's free to download from their site.

Anyway, the mere presence of a yen sign does _not_ indicate that the encoding used is a UTF-X encoding, rather it indicates to me that the encoding is actually Latin 1, which has the yen sign. Try to specify a different encoding such as ISO-8559-1 instead.

Try using a Reader in the StreamSource:

StreamSource source = new StreamSource(new StringReader(aSB.toString()));

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
tomschuringAuthor Commented:
hello orangehead911,

i'm not using the M$ driver because the jtds is heaps and heaps faster.
thank you for thinking along though.

tomschuringAuthor Commented:

thanks, that did the trick.. it doesn't throw an exception anymore and it performs the transformation..
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Fonts Typography

From novice to tech pro — start learning today.