Solved

xerces transform throws "Character conversion error"

Posted on 2004-10-19
Last Modified: 2013-12-03
hello, i run a query on SQL Server where the output gets returned as XML (with the 'FOR XML RAW' directive).
one of the characters in the output is a ¥ (Japanese yen sign) and it throws a

"Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low)."

exception when i try to transform the XML with a stylesheet (using xerces).
the stylesheet specifies:
<xsl:output method="xml" version="1.0" encoding="UTF-16" indent="yes"/>
i set the
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

so i don't know why the exception even mentions UTF-8.

any help appreciated.
thanks,
tom


-------------- code snippet ----------------

        .....
            cs.execute();
            ResultSet rs = cs.getResultSet();
           
            // Check if no results returned:
            if (rs == null)
                System.out.println("Error");

            FileOutputStream fileOutput = new FileOutputStream("c:\\tomOut.xml");

            StringBuffer aSB = new StringBuffer("<root>");  // add the root element
            while( rs.next())
            {
                String line = rs.getString(1); // write the xml document
                aSB.append(line);
            }
            aSB.append("</root>"); // close the root element
           
             // transform
            TransformerFactory factory = TransformerFactory.newInstance();
            StreamSource xslSource = new StreamSource("c:\\test.xsl");
            javax.xml.transform.Templates stylesheet = factory.newTemplates(xslSource);
           
            Transformer transformer = stylesheet.newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

            // The source is a file.
            StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
           
            // The result is a file.
            StreamResult result = new StreamResult("c:\\converted.xml");

            transformer.transform(source, result);  // it throws the exception here

-------------- code snippet ----------------
Question by:tomschuring
16 Comments
 

Expert Comment

by:Tommy Braas
Hi tomschuring,

Why are you setting the encoding to UTF-16? What is the format of the field in the database from which the XML is retrieved? Is the database set up to run in UTF-16 or ISO?


Expert Comment

by:Naeemg
use UNICODE big endian instead of UTF while writing to output.


Naeem Shehzad Ghuman

Author Comment

by:tomschuring
i was setting the encoding to UTF-16 to make sure it outputs UTF-16 so it wouldn't get confused with the yen character. i tried without it first and then added the
 transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16")

the result of the query is retrieved with the sql server directive "FOR XML RAW" so (i assume) sqlserver must create an internal temporary table to store the XML document in. good question though. i'll see if i can find what datatype that comes back as.

when i do the query manually with query analyzer i just see multiple records like

record1:
<row UNIQUEID="53380601" a="53"  b="$2543" /> (.... heaps more of these tags ...)<row UNIQUEID="5311111"

record2:
a="125" b="$2344" /> <row UNIQUEID="5660101" a="165" b="¥748876" />

so it breaks off the xml after the size of the field in the temporary table

Author Comment

by:tomschuring
"use UNICODE big endian while writing the output"

do you mean when i write to the StringBuffer ?

Author Comment

by:tomschuring
i'm not sure how to find out  if the database is setup to run in UTF-16 or ISO
is there a way to check ?

Expert Comment

by:Naeemg

>> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

transformer.setOutputProperty(OutputKeys.ENCODING, "UNICODE");

Author Comment

by:tomschuring
mmmm that gave me the same exception:
Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low).

Author Comment

by:tomschuring
when i write the StringBuffer ( aSB ) to file, open it up with XMLSpy and apply the c:\test.xsl stylesheet it converts it.... very frustrating.

Author Comment

by:tomschuring
when i write the StringBuffer to file ( c:\XMLTempOut.xml ) and read it in like :

transformer.transform(new StreamSource("c:\\XMLTempOut.xml"), result);

it works without exception......

is there a way to force
  StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
to be UTF-16 ?


Expert Comment

by:Tommy Braas
tomschuring,

I don't think that Naeemg understands the difference between Unicode and UTF-X.

UTF-16 and UTF-8 are both encodings of Unicode characters (code points). UTF-16 uses two bytes for most characters and four bytes (a surrogate pair) for characters outside the Basic Multilingual Plane. UTF-8 is a variable-length encoding that represents each character as 1, 2, 3 or 4 bytes, and is more compact than UTF-16 for text that is mostly ASCII.

Furthermore, when an encoding uses more than one byte per code unit, byte order comes into play as well. There are two byte orderings, big endian and little endian.
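
To make the difference between these encodings concrete, here is a small sketch that prints the bytes of the yen sign (U+00A5) under a few charsets. The class name YenBytes and the hex helper are only for illustration and are not part of anything posted in this thread.

```java
import java.nio.charset.StandardCharsets;

public class YenBytes {
    // Formats a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String yen = "\u00A5"; // the yen sign, written as an escape to avoid source-encoding issues

        System.out.println("ISO-8859-1: " + hex(yen.getBytes(StandardCharsets.ISO_8859_1))); // A5
        System.out.println("UTF-8:      " + hex(yen.getBytes(StandardCharsets.UTF_8)));      // C2 A5
        System.out.println("UTF-16BE:   " + hex(yen.getBytes(StandardCharsets.UTF_16BE)));   // 00 A5
        System.out.println("UTF-16LE:   " + hex(yen.getBytes(StandardCharsets.UTF_16LE)));   // A5 00
    }
}
```

Note that the lone byte 0xA5 is a valid yen sign in ISO-8859-1 but is not a valid start of a character in UTF-8, which matches the wording of the exception.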

Copied from the SQL Server Books Online:
"Unicode data is stored using the nchar, nvarchar, and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set. Use nvarchar when a column's entries vary in the number of Unicode characters (up to 4,000) they contain. Use nchar when every entry for a column has the same fixed length (up to 4,000 Unicode characters). Use ntext when any entry for a column is longer than 4,000 Unicode characters."

Check what the datatype of the field storing the XML is. If it is one of the above you're fine; if not, that is the problem.

Are you accessing the data using the SQL Server JDBC type 4 driver, or the JDBC-ODBC bridge? You can tell from which driver is loaded in the class which accesses the XML.


Author Comment

by:tomschuring
hello orangehead911,

i'm using the jtds JDBC driver which is a type 4 driver. ( http://sourceforge.net/projects/jtds/ )

i don't think i can find out what type the xml is returned in. it is obviously multi-byte in some form because it contains the yen character.

when i write the output to file (in UTF-16 format) and use that file to do the xslt-transformation no exception is thrown (and no data is lost)
but it looks that the
       StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
does not convert it to UTF-16
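
A likely explanation, not stated anywhere in this thread: java.io.StringBufferInputStream is documented to use only the low eight bits of each character, so the yen sign (U+00A5) reaches the parser as the single byte 0xA5. With no byte order mark and no encoding declaration, an XML parser assumes UTF-8, and 0xA5 cannot begin a UTF-8 character. The sketch below reproduces just that byte-level effect; the class name LowByteDemo and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringBufferInputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

@SuppressWarnings("deprecation") // StringBufferInputStream is deprecated for exactly this reason
public class LowByteDemo {
    // Reads the stream to the end and returns the bytes it produced.
    static byte[] bytesOf(String s) {
        StringBufferInputStream in = new StringBufferInputStream(s);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    // Strict UTF-8 check: a new decoder reports malformed input by throwing.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = bytesOf("\u00A5");
        System.out.printf("byte: 0x%02X%n", bytes[0] & 0xFF);      // the low byte of U+00A5
        System.out.println("valid UTF-8? " + isValidUtf8(bytes));   // false
    }
}
```

The setOutputProperty call only controls how the result is written, so it has no effect on how the input bytes are decoded.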


Expert Comment

by:Tommy Braas
tomschuring,

Why aren't you using M$ SQL Server JDBC driver? It's free to download from their site.

Anyway, the mere presence of a yen sign does _not_ indicate that the encoding used is a UTF-X encoding, rather it indicates to me that the encoding is actually Latin 1, which has the yen sign. Try to specify a different encoding such as ISO-8859-1 instead.


Accepted Solution

by:CEHJ
Try using a Reader in the StreamSource:

StreamSource source = new StreamSource(new StringReader(aSB.toString()));
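
A Reader hands the parser characters rather than bytes, so there is no input encoding left to guess. Here is a self-contained sketch of the same idea. It assumes an identity transform (a Transformer created without a stylesheet) and an in-memory StringWriter result so that it runs without c:\test.xsl; the class name ReaderSourceDemo is only illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReaderSourceDemo {
    // Runs an identity transform over the given XML string and returns the result.
    static String transform(String xml) {
        try {
            // No stylesheet here: newTransformer() with no arguments copies input to output.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();

            // The source is built from characters, not bytes.
            StreamSource source = new StreamSource(new StringReader(xml));

            StringWriter out = new StringWriter();
            transformer.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // \u00A5 is the yen sign from the original query output.
        System.out.println(transform("<root><row b=\"\u00A5748876\"/></root>"));
    }
}
```

In the original snippet the same change would apply to the line that builds the StreamSource from aSB.toString(), with the stylesheet-based Transformer left as it is.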

Author Comment

by:tomschuring
hello orangehead911,

i'm not using the M$ driver because the jtds is heaps and heaps faster.
thank you for thinking along though.


Author Comment

by:tomschuring
CEHJ,

thanks, that did the trick.. it doesn't throw an exception anymore and it performs the transformation..

Expert Comment

by:CEHJ
:-)