  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 627

xerces transform throws "Character conversion error"

hello, i run a query on SQL Server where the output gets returned as XML (with the 'FOR XML RAW' directive).
one of the characters in the output is a ¥ (japanese yen sign) and it throws a

"Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low)."

exception when i try to transform the XML with a stylesheet (using xerces).
the stylesheet specifies:
<xsl:output method="xml" version="1.0" encoding="UTF-16" indent="yes"/>
and i set
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

so i don't know why the exception even mentions UTF-8.

any help appreciated.
thanks,
tom


-------------- code snippet ----------------

        .....
            cs.execute();
            ResultSet rs = cs.getResultSet();
           
            // Check if no results returned:
            if (rs == null)
                System.out.println("Error");

            FileOutputStream fileOutput = new FileOutputStream("c:\\tomOut.xml");

            StringBuffer aSB = new StringBuffer("<root>");  // add the root element
            while( rs.next())
            {
                String line = rs.getString(1); // write the xml document
                aSB.append(line);
            }
            aSB.append("</root>"); // close the root element
           
             // transform
            TransformerFactory factory = TransformerFactory.newInstance();
            StreamSource xslSource = new StreamSource("c:\\test.xsl");
            javax.xml.transform.Templates stylesheet = factory.newTemplates(xslSource);
           
            Transformer transformer = stylesheet.newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

            // The source is the XML string built above, wrapped in an InputStream.
            StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
           
            // The result is a file.
            StreamResult result = new StreamResult("c:\\converted.xml");

            transformer.transform(source, result);  // it throws the exception here

-------------- code snippet ----------------
Asked by: tomschuring

Tommy Braas Commented:
Hi tomschuring,

Why are you setting the encoding to UTF-16? What is the format of the field in the database from which the XML is retrieved? Is the database set up to run in UTF-16 or ISO?

Naeemg Commented:
Use UNICODE big endian instead of UTF while writing to output.

Naeem Shehzad Ghuman

tomschuring (Author) Commented:
i was setting the encoding to UTF-16 to make sure it outputs UTF-16 so it wouldn't get confused by the yen character. i tried without it first and then added the
 transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16")

the result of the query is retrieved with the sql server directive "FOR XML RAW", so (i assume) sqlserver must create an internal temporary table to store the XML document in. good question though. i'll see if i can find what datatype that comes back as.

when i do the query manually with query analyzer i just see multiple records like

record1:
<row UNIQUEID="53380601" a="53"  b="$2543" /> (.... heaps more of these tags ...)<row UNIQUEID="5311111"

record2:
a="125" b="$2344" /> <row UNIQUEID="5660101" a="165" b="¥748876" />

so it breaks off the xml after the size of the field in the temporary table
 
tomschuring (Author) Commented:
"use UNICODE big endian while writing the output"

do you mean when i write to the StringBuffer?

tomschuring (Author) Commented:
i'm not sure how to find out if the database is set up to run in UTF-16 or ISO.
is there a way to check?

Naeemg Commented:

>> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

transformer.setOutputProperty(OutputKeys.ENCODING, "UNICODE");

tomschuring (Author) Commented:
mmmm that gave me the same exception:
Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low).

tomschuring (Author) Commented:
when i write the StringBuffer (aSB) to a file, open it in XMLSpy and apply the c:\test.xsl stylesheet, it converts it fine... very frustrating.

tomschuring (Author) Commented:
when i write the StringBuffer to file ( c:\XMLTempOut.xml ) and read it in like :

transformer.transform(new StreamSource("c:\\XMLTempOut.xml"), result);

it works without exception......

is there a way to force
  StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
to be UTF-16 ?
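
One possible answer to that question, as a sketch that nobody in this thread proposed: keep an InputStream, but encode the string explicitly. Java's "UTF-16" charset prepends a byte order mark, which XML parsers can use to detect the encoding. The sketch also shows what StringBufferInputStream does to the yen sign: it keeps only the low eight bits of each char, so U+00A5 becomes the lone byte 0xA5, and a parser that assumes UTF-8 (the XML default when there is no byte order mark or declaration) cannot decode it. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringBufferInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf16StreamDemo {

    /** Encodes the XML text as UTF-16; Java's encoder prepends a byte order mark. */
    static InputStream asUtf16Stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_16));
    }

    /** Parses the XML from a UTF-16 byte stream and returns attribute "b" of the first row. */
    static String parseFirstRowB(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(asUtf16Stream(xml));
        Element row = (Element) doc.getElementsByTagName("row").item(0);
        return row.getAttribute("b");
    }

    /** Returns the first byte the deprecated StringBufferInputStream yields for the text. */
    @SuppressWarnings("deprecation")
    static int firstByteFromStringBufferInputStream(String text) {
        return new StringBufferInputStream(text).read();
    }

    public static void main(String[] args) throws Exception {
        // Sample row taken from the query output quoted earlier in the thread.
        String xml = "<root><row UNIQUEID=\"5660101\" a=\"165\" b=\"\u00A5748876\" /></root>";

        // The yen sign survives the round trip through UTF-16 bytes.
        System.out.println(parseFirstRowB(xml).equals("\u00A5748876")); // prints "true"

        // StringBufferInputStream truncates U+00A5 to the single byte 0xA5.
        System.out.println(Integer.toHexString(
                firstByteFromStringBufferInputStream("\u00A5"))); // prints "a5"
    }
}
```

To use this with the transformer, the stream returned by asUtf16Stream would be passed to new StreamSource(...) in place of the StringBufferInputStream.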

Tommy Braas Commented:
tomschuring,

I don't think that Naeemg understands the difference between Unicode and UTF-X.

UTF-16 encodes each Unicode character as one or two 16-bit units (two bytes for most characters, four for those outside the Basic Multilingual Plane). UTF-8 is a variable-length encoding that represents each Unicode character as 1, 2, 3 or 4 bytes, and is more compact for text that is mostly ASCII.

Furthermore, when a character is encoded in units of more than one byte, byte order comes into play as well. There are two byte orderings, big endian and little endian.
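
To make the difference concrete, here is a small sketch that prints the bytes of the yen sign (U+00A5) in several encodings. The class and helper names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class YenEncodings {

    /** Formats bytes as space-separated upper-case hex, e.g. "C2 A5". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes the text in the given charset and returns the bytes as hex. */
    static String encode(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String yen = "\u00A5"; // the yen sign

        System.out.println(encode(yen, StandardCharsets.ISO_8859_1)); // prints "A5"
        System.out.println(encode(yen, StandardCharsets.UTF_8));      // prints "C2 A5"
        System.out.println(encode(yen, StandardCharsets.UTF_16BE));   // prints "00 A5"
        System.out.println(encode(yen, StandardCharsets.UTF_16LE));   // prints "A5 00"
        // Java's plain "UTF-16" encoder adds a big-endian byte order mark first.
        System.out.println(encode(yen, StandardCharsets.UTF_16));     // prints "FE FF 00 A5"
    }
}
```

The two UTF-16 lines differ only in byte order, which is why a byte order mark (or an explicit UTF-16BE / UTF-16LE label) is needed to read such data back.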

Copied from the SQL Server Books Online:
"Unicode data is stored using the nchar, nvarchar, and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set. Use nvarchar when a column's entries vary in the number of Unicode characters (up to 4,000) they contain. Use nchar when every entry for a column has the same fixed length (up to 4,000 Unicode characters). Use ntext when any entry for a column is longer than 4,000 Unicode characters."

Check what the datatype of the field storing the XML is. If it is one of the above you're fine; if not, that is the problem.

Are you accessing the data using the SQL Server JDBC type 4 driver, or the JDBC-ODBC bridge? You can tell from which driver is loaded in the class which accesses the XML.

tomschuring (Author) Commented:
hello orangehead911,

i'm using the jtds JDBC driver which is a type 4 driver. ( http://sourceforge.net/projects/jtds/ )

i don't think i can find out what type the xml is returned in. it is obviously multi-byte in some form because it contains the yen character.

when i write the output to file (in UTF-16 format) and use that file to do the xslt transformation, no exception is thrown (and no data is lost)
but it looks like
       StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
does not convert it to UTF-16

Tommy Braas Commented:
tomschuring,

Why aren't you using the M$ SQL Server JDBC driver? It's free to download from their site.

Anyway, the mere presence of a yen sign does _not_ indicate that the encoding used is a UTF-X encoding; rather, it indicates to me that the encoding is actually Latin 1, which has the yen sign. Try to specify a different encoding such as ISO-8859-1 instead.
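
The byte in the error message, 0xa5, fits this reading: it is the Latin 1 code for the yen sign, and on its own it is not well-formed UTF-8. A minimal sketch to check both claims (class and method names are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SingleByteYen {

    /** Decodes the bytes as ISO-8859-1 (Latin 1), where every byte maps to one character. */
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Returns true if the bytes are well-formed UTF-8, using a strict decoder. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] loneA5 = {(byte) 0xA5};
        byte[] utf8Yen = {(byte) 0xC2, (byte) 0xA5};

        System.out.println(decodeLatin1(loneA5).equals("\u00A5")); // prints "true"
        System.out.println(isValidUtf8(loneA5));                   // prints "false"
        System.out.println(isValidUtf8(utf8Yen));                  // prints "true"
    }
}
```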

CEHJ Commented:
Try using a Reader in the StreamSource:

StreamSource source = new StreamSource(new StringReader(aSB.toString()));
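
A Reader hands the parser characters that are already decoded, so the parser never has to guess a byte encoding. Below is a minimal, self-contained sketch of the suggestion. It uses an identity transform instead of the c:\test.xsl stylesheet, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReaderSourceDemo {

    /** Runs an identity transform over the XML text, reading it as characters. */
    static String identityTransform(String xml) throws TransformerException {
        // newTransformer() without a stylesheet copies the input to the output.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // StringReader supplies chars, not bytes, so no charset decoding is involved.
        StreamSource source = new StreamSource(new StringReader(xml));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Sample row taken from the query output quoted earlier in the thread.
        String xml = "<root><row UNIQUEID=\"5660101\" a=\"165\" b=\"\u00A5748876\" /></root>";

        // Completes without a character conversion error.
        System.out.println(identityTransform(xml));
    }
}
```

Whether the serializer writes the yen sign literally or as a character reference may depend on the transformer implementation, so the sketch does not assume an exact output string.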
 
tomschuring (Author) Commented:
hello orangehead911,

i'm not using the M$ driver because the jtds is heaps and heaps faster.
thank you for thinking along though.

tomschuring (Author) Commented:
CEHJ,

thanks, that did the trick.. it doesn't throw an exception anymore and it performs the transformation..
 
CEHJ Commented:
:-)