xerces transform throws "Character conversion error"

Posted on 2004-10-19
Last Modified: 2013-12-03
Hello, I run a query on SQL Server where the output is returned as XML (with the 'FOR XML RAW' directive).
One of the characters in the output is a ¥ (Japanese yen sign), and it throws a

"Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low)."

exception when I try to transform the XML with a stylesheet (using Xerces).
The stylesheet specifies:
<xsl:output method="xml" version="1.0" encoding="UTF-16" indent="yes"/>
and I also set
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

so I don't know why the exception even mentions UTF-8.

Any help appreciated.
Thanks,
Tom


-------------- code snippet ----------------

        .....
            cs.execute();
            ResultSet rs = cs.getResultSet();
           
            // Check if no results returned:
            if (rs == null)
                System.out.println("Error");

            FileOutputStream fileOutput = new FileOutputStream("c:\\tomOut.xml");

            StringBuffer aSB = new StringBuffer("<root>");  // add the root element
            while( rs.next())
            {
                String line = rs.getString(1); // append the next chunk of the XML document
                aSB.append(line);
            }
            aSB.append("</root>"); // close the root element
           
             // transform
            TransformerFactory factory = TransformerFactory.newInstance();
            StreamSource xslSource = new StreamSource("c:\\test.xsl");
            javax.xml.transform.Templates stylesheet = factory.newTemplates(xslSource);
           
            Transformer transformer = stylesheet.newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

            // The source is the XML string built in memory above.
            StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
           
            // The result is a file.
            StreamResult result = new StreamResult("c:\\converted.xml");

            transformer.transform(source, result);  // it throws the exception here

-------------- code snippet ----------------
Question by:tomschuring
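On why the message mentions UTF-8 at all: OutputKeys.ENCODING (like the encoding attribute of xsl:output) only controls how the transformer writes its result. Reading the source is a separate step, and the XML 1.0 specification says a byte stream with no byte-order mark and no encoding declaration must be UTF-8. The string built above starts with a bare <root>, so the parser decodes whatever bytes it receives as UTF-8, and a lone 0xA5 byte (the yen sign as encoded in ISO-8859-1) is not legal UTF-8. The sketch below, with an illustrative class name Utf8Check, only demonstrates that last point with a strict java.nio decoder; it does not reproduce the Xerces code path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Strict UTF-8 check: with REPORT, the decoder throws on malformed input
    // instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Yen = {(byte) 0xA5};                           // yen sign as one ISO-8859-1 byte
        byte[] utf8Yen = "\u00A5".getBytes(StandardCharsets.UTF_8); // yen sign as UTF-8 (C2 A5)
        System.out.println("0xA5 valid UTF-8?  " + isValidUtf8(latin1Yen));
        System.out.println("C2 A5 valid UTF-8? " + isValidUtf8(utf8Yen));
    }
}
```

Running it should report the single 0xA5 byte as invalid and the two-byte UTF-8 form as valid, which is the same distinction the parser is tripping over.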
16 Comments
 

Expert Comment

by:Tommy Braas
ID: 12355222
Hi tomschuring,

Why are you setting the encoding to UTF-16? What is the format of the field in the database from which the XML is retrieved? Is the database set up to run in UTF-16 or ISO?
 

Expert Comment

by:Naeemg
ID: 12355232
Use UNICODE big endian instead of UTF when writing the output.


Naeem Shehzad Ghuman
 

Author Comment

by:tomschuring
ID: 12355254
I was setting the encoding to UTF-16 to make sure it outputs UTF-16, so it wouldn't get confused by the yen character. I tried without it first and then added
 transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16")

The result of the query is retrieved with the SQL Server directive "FOR XML RAW", so (I assume) SQL Server must create an internal temporary table to store the XML document in. Good question, though. I'll see if I can find out what data type that comes back as.

When I run the query manually in Query Analyzer, I just see multiple records like:

record1:
<row UNIQUEID="53380601" a="53"  b="$2543" /> (.... heaps more of these tags ...)<row UNIQUEID="5311111"

record2:
a="125" b="$2344" /> <row UNIQUEID="5660101" a="165" b="¥748876" />

So SQL Server breaks the XML off at the size of the field in the temporary table and continues it in the next record.

 

Author Comment

by:tomschuring
ID: 12355260
"use UNICODE big endian while writing the output"

Do you mean when I write to the StringBuffer?
 

Author Comment

by:tomschuring
ID: 12355274
I'm not sure how to find out if the database is set up to run in UTF-16 or ISO.
Is there a way to check?
 

Expert Comment

by:Naeemg
ID: 12355295

>> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

transformer.setOutputProperty(OutputKeys.ENCODING, "UNICODE");
 

Author Comment

by:tomschuring
ID: 12355314
Hmm, that gave me the same exception:
Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low).
 

Author Comment

by:tomschuring
ID: 12355326
When I write the StringBuffer (aSB) to a file, open it in XMLSpy and apply the c:\test.xsl stylesheet, it converts fine... very frustrating.
 

Author Comment

by:tomschuring
ID: 12355369
When I write the StringBuffer to a file (c:\XMLTempOut.xml) and read it back in like this:

transformer.transform(new StreamSource("c:\\XMLTempOut.xml"), result);

it works without an exception...

Is there a way to force
  StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
to be UTF-16?

 

Expert Comment

by:Tommy Braas
ID: 12355433
tomschuring,

I don't think that Naeemg understands the difference between Unicode and UTF-X.

UTF-16 encodes each Unicode character as two bytes (or four, as a surrogate pair, for characters outside the Basic Multilingual Plane). UTF-8 is a variable-length encoding that represents each Unicode character as 1, 2, 3 or 4 bytes, and is the more compact of the two for mostly-ASCII text.

Furthermore, for encodings built on multi-byte code units, such as UTF-16, byte order comes into play as well. There are two byte orderings, big endian and little endian.
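To make the difference concrete, here is a small sketch (class name YenBytes is illustrative) that prints the bytes the JDK produces for the yen sign, U+00A5, in a few encodings. It shows the two-byte UTF-8 form, the two UTF-16 byte orders, and the single Latin-1 byte 0xA5 that appears in the error message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class YenBytes {
    // Formats a byte array as space-separated upper-case hex, e.g. "C2 A5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the bytes as hex.
    static String encode(String text, Charset cs) {
        return hex(text.getBytes(cs));
    }

    public static void main(String[] args) {
        String yen = "\u00A5";
        System.out.println("UTF-8      " + encode(yen, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE   " + encode(yen, StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE   " + encode(yen, StandardCharsets.UTF_16LE));
        System.out.println("ISO-8859-1 " + encode(yen, StandardCharsets.ISO_8859_1));
    }
}
```

The expected lines are C2 A5, 00 A5, A5 00 and A5 respectively; only the UTF-16 variants differ by byte order.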

Copied from the SQL Server Books Online:
"Unicode data is stored using the nchar, nvarchar, and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set. Use nvarchar when a column's entries vary in the number of Unicode characters (up to 4,000) they contain. Use nchar when every entry for a column has the same fixed length (up to 4,000 Unicode characters). Use ntext when any entry for a column is longer than 4,000 Unicode characters."

Check the data type of the field storing the XML. If it is one of the above, you're fine; if not, that is the problem.

Are you accessing the data using the SQL Server JDBC type 4 driver, or the JDBC-ODBC bridge? You can tell from which driver class is loaded in the code that accesses the XML.
 

Author Comment

by:tomschuring
ID: 12355490
hello orangehead911,

I'm using the jTDS JDBC driver, which is a type 4 driver (http://sourceforge.net/projects/jtds/).

I don't think I can find out what type the XML is returned as. It is obviously multi-byte in some form, because it contains the yen character.

When I write the output to a file (in UTF-16 format) and use that file for the XSLT transformation, no exception is thrown (and no data is lost),
but it looks like
       StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
does not convert it to UTF-16.
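That observation matches the documented behaviour of java.io.StringBufferInputStream: its Javadoc says it "does not properly convert characters into bytes" and recommends StringReader instead. Each read() returns only the low eight bits of the next char, so there is no UTF-16 (or any) encoding step at all. The sketch below (class name SbisDemo is illustrative) shows the yen sign turning into the lone byte 0xA5, and a kanji character losing its high byte entirely.

```java
import java.io.StringBufferInputStream;

// StringBufferInputStream is deprecated precisely because of the behaviour shown here.
@SuppressWarnings("deprecation")
public class SbisDemo {
    // Returns the byte values StringBufferInputStream produces for each char of text.
    static int[] bytesOf(String text) {
        StringBufferInputStream in = new StringBufferInputStream(text);
        int[] out = new int[text.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = in.read(); // only the low 8 bits of each char survive
        }
        return out;
    }

    public static void main(String[] args) {
        // U+00A5 (yen sign) becomes the lone byte 0xA5, which is not valid UTF-8;
        // U+5186 (the kanji for yen) becomes 0x86, silently losing its high byte.
        for (int b : bytesOf("\u00A5\u5186")) {
            System.out.printf("0x%02X%n", b);
        }
    }
}
```

Because the stream yields exactly one byte per char, a parser reading it as UTF-8 sees 0xA5 where it expects a lead byte, which is the "Unconvertible UTF-8 character beginning with 0xa5" in the question.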

 

Expert Comment

by:Tommy Braas
ID: 12355526
tomschuring,

Why aren't you using the M$ SQL Server JDBC driver? It's free to download from their site.

Anyway, the mere presence of a yen sign does _not_ indicate that the encoding used is a UTF-X encoding; rather, it suggests to me that the encoding is actually Latin 1 (ISO-8859-1), which includes the yen sign. Try specifying a different encoding such as ISO-8859-1 instead.
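One way to act on the ISO-8859-1 suggestion while keeping a byte stream is to encode the string as Latin-1 and declare that encoding in the XML declaration, so the parser no longer falls back to UTF-8. This is a minimal sketch under stated assumptions: the class name Latin1Source is hypothetical, it uses an identity transform from the JDK's built-in TransformerFactory rather than a stylesheet, and it assumes every character in the document exists in Latin-1 (String.getBytes silently replaces any that don't with '?').

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Latin1Source {
    // Parses an in-memory document supplied as ISO-8859-1 bytes and returns it re-serialized.
    static String roundTrip(String body) throws TransformerException {
        // The declaration tells the parser how the bytes are encoded.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body;
        // Assumption: all characters are representable in Latin-1.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new ByteArrayInputStream(bytes)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(roundTrip("<root><row UNIQUEID=\"5660101\" b=\"\u00A5748876\"/></root>"));
    }
}
```

For arbitrary Japanese text this route breaks down, since most of it is not in Latin-1; passing characters rather than bytes avoids the question of byte encoding altogether.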
 

Accepted Solution

by:CEHJ
ID: 12355852
Try using a Reader in the StreamSource:

StreamSource source = new StreamSource(new StringReader(aSB.toString()));
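A self-contained version of this fix, as a sketch: the class name ReaderTransform is illustrative, and the inline stylesheet is a hypothetical stand-in for c:\test.xsl that just prints each row's b attribute. Because the StringReader hands characters straight to the parser, there is no byte-decoding step that could misread the yen sign, and the output encoding setting is no longer involved in reading the source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReaderTransform {
    // Hypothetical stand-in for c:\test.xsl: writes each row's b attribute on its own line.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/root\">"
      + "<xsl:for-each select=\"row\"><xsl:value-of select=\"@b\"/><xsl:text>&#10;</xsl:text></xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates templates = factory.newTemplates(new StreamSource(new StringReader(XSL)));
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        // Characters in, characters out: no charset guessing on the input side.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        StringBuilder sb = new StringBuilder("<root>"); // mirrors aSB in the question
        sb.append("<row UNIQUEID=\"53380601\" a=\"53\" b=\"$2543\" />");
        sb.append("<row UNIQUEID=\"5660101\" a=\"165\" b=\"\u00A5748876\" />");
        sb.append("</root>");
        System.out.print(transform(sb.toString())); // prints each b value on its own line
    }
}
```

When writing to a file instead of a StringWriter, StreamResult with a file name lets the transformer encode the output according to OutputKeys.ENCODING, which is where the UTF-16 setting actually applies.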
 

Author Comment

by:tomschuring
ID: 12364151
hello orangehead911,

I'm not using the M$ driver because jTDS is heaps and heaps faster.
Thank you for thinking along, though.

 

Author Comment

by:tomschuring
ID: 12364178
CEHJ,

Thanks, that did the trick. It doesn't throw an exception anymore, and it performs the transformation.
 

Expert Comment

by:CEHJ
ID: 12379952
:-)