Solved

xerces transform throws "Character conversion error"

Posted on 2004-10-19
Last Modified: 2013-12-03
hello, i run a query on SQL Server where the output gets returned as XML (with the 'FOR XML RAW' directive).
one of the characters in the output is a ¥ (Japanese yen sign) and it throws a

"Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low)."

exception when i try to transform the XML with a stylesheet (using xerces).
the stylesheet specifies:
<xsl:output method="xml" version="1.0" encoding="UTF-16" indent="yes"/>
i set the
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

so i don't know why the exception even mentions UTF-8.

any help appreciated.
thanks,
tom


-------------- code snippet ----------------

        .....
            cs.execute();
            ResultSet rs = cs.getResultSet();
           
            // Check if no results returned:
            if (rs == null)
                System.out.println("Error");

            FileOutputStream fileOutput = new FileOutputStream("c:\\tomOut.xml");

            StringBuffer aSB = new StringBuffer("<root>");  // add the root element
            while( rs.next())
            {
                String line = rs.getString(1); // write the xml document
                aSB.append(line);
            }
            aSB.append("</root>"); // close the root element
           
             // transform
            TransformerFactory factory = TransformerFactory.newInstance();
            StreamSource xslSource = new StreamSource("c:\\test.xsl");
            javax.xml.transform.Templates stylesheet = factory.newTemplates(xslSource);
           
            Transformer transformer = stylesheet.newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

            // The source is a file.
            StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
           
            // The result is a file.
            StreamResult result = new StreamResult("c:\\converted.xml");

            transformer.transform(source, result);  // it throws the exception here

-------------- code snippet ----------------
Question by:tomschuring
16 Comments
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 12355222
Hi tomschuring,

Why are you setting the encoding to UTF-16? What is the format of the field in the database from which the XML is retrieved? Is the database set up to run in UTF-16 or ISO?
 
LVL 5

Expert Comment

by:Naeemg
ID: 12355232
use UNICODE big endian instead of UTF while writing to output.


Naeem Shehzad Ghuman
 

Author Comment

by:tomschuring
ID: 12355254
i was setting the encoding to UTF-16 to make sure it outputs UTF-16 so it wouldn't get confused by the yen character. i tried without it first and then added
 transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16")

the result of the query is retrieved with the SQL Server directive "FOR XML RAW", so (i assume) SQL Server must create an internal temporary table to store the XML document in. good question though. i'll see if i can find what datatype that comes back as.

when i do the query manually with query analyzer i just see multiple records like

record1:
<row UNIQUEID="53380601" a="53"  b="$2543" /> (.... heaps more of these tags ...)<row UNIQUEID="5311111"

record2:
a="125" b="$2344" /> <row UNIQUEID="5660101" a="165" b="¥748876" />

so it breaks off the xml after the size of the field in the temporary table
 

Author Comment

by:tomschuring
ID: 12355260
"use UNICODE big endian while writing the output"

do you mean when i write to the StringBuffer ?
 

Author Comment

by:tomschuring
ID: 12355274
i'm not sure how to find out if the database is set up to run in UTF-16 or ISO.
is there a way to check?
 
LVL 5

Expert Comment

by:Naeemg
ID: 12355295

>> transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

transformer.setOutputProperty(OutputKeys.ENCODING, "UNICODE");
 

Author Comment

by:tomschuring
ID: 12355314
mmmm that gave me the same exception:
Character conversion error: "Unconvertible UTF-8 character beginning with 0xa5" (line number may be too low).
 

Author Comment

by:tomschuring
ID: 12355326
when i write the StringBuffer ( aSB ) to file, open it up with XMLSpy and apply the c:\test.xsl stylesheet it converts it.... very frustrating.
 

Author Comment

by:tomschuring
ID: 12355369
when i write the StringBuffer to file ( c:\XMLTempOut.xml ) and read it in like :

transformer.transform(new StreamSource("c:\\XMLTempOut.xml"), result);

it works without exception......

is there a way to force
  StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
to be UTF-16 ?

 
LVL 14

Expert Comment

by:Tommy Braas
ID: 12355433
tomschuring,

I don't think that Naeemg understands the difference between Unicode and UTF-X.

Unicode assigns each character a number (a code point); the UTF-X encodings define how those code points are stored as bytes. UTF-16 uses 2 bytes for most characters and 4 bytes (a surrogate pair) for the rest. UTF-8 uses 1, 2, 3 or 4 bytes per character and is more compact for text that is mostly ASCII.

Furthermore, when an encoding works in units of more than one byte, as UTF-16 does, byte order comes into play as well. There are two byte orderings, big endian and little endian.
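
To make the byte counts concrete, here is a small Java sketch that prints the bytes of the yen sign (U+00A5) under a few encodings. The class and helper names are made up for illustration and are not from this thread.

```java
import java.nio.charset.StandardCharsets;

public class YenEncodings {
    // Render a byte array as space-separated lowercase hex, e.g. "c2 a5".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String yen = "\u00a5"; // the yen sign, code point U+00A5
        System.out.println("ISO-8859-1: " + hex(yen.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(yen.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE:   " + hex(yen.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE:   " + hex(yen.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

The single byte a5 only appears in the Latin 1 form. In UTF-8 the same character needs the two bytes c2 a5, and the two UTF-16 lines differ only in byte order.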

Copied from the SQL Server Books Online:
"Unicode data is stored using the nchar, nvarchar, and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set. Use nvarchar when a column's entries vary in the number of Unicode characters (up to 4,000) they contain. Use nchar when every entry for a column has the same fixed length (up to 4,000 Unicode characters). Use ntext when any entry for a column is longer than 4,000 Unicode characters."

Check what the datatype of the field storing the XML is. If it is one of the above you're fine; if not, that is the problem.

Are you accessing the data using the SQL Server JDBC type 4 driver, or the JDBC-ODBC bridge? You can tell from which driver is loaded in the class which accesses the XML.
 

Author Comment

by:tomschuring
ID: 12355490
hello orangehead911,

i'm using the jtds JDBC driver which is a type 4 driver. ( http://sourceforge.net/projects/jtds/ )

i don't think i can find out what type the xml is returned in. it is obviously multi-byte in some form because it contains the yen character.

when i write the output to file (in UTF-16 format) and use that file to do the xslt-transformation no exception is thrown (and no data is lost)
but it looks like the
       StreamSource source = new StreamSource(new StringBufferInputStream(aSB.toString()));
does not convert it to UTF-16
 
LVL 14

Expert Comment

by:Tommy Braas
ID: 12355526
tomschuring,

Why aren't you using M$ SQL Server JDBC driver? It's free to download from their site.

Anyway, the mere presence of a yen sign does _not_ indicate that the encoding used is a UTF-X encoding; rather, it indicates to me that the encoding is actually Latin 1, which has the yen sign. Try to specify a different encoding such as ISO-8859-1 instead.
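
One way to check this reading, sketched in Java with hypothetical names (`LoneByteDecoding`, `decodes`): a lone byte 0xa5 decodes fine as ISO-8859-1 but is rejected by a strict UTF-8 decoder, which would be consistent with the wording of the exception. This only illustrates the byte-level rule; it does not reproduce the parser's own error message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LoneByteDecoding {
    // Returns true if the bytes are valid in the given charset.
    // A fresh CharsetDecoder reports malformed input by throwing.
    public static boolean decodes(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] lone = { (byte) 0xa5 };                  // the byte named in the exception
        byte[] utf8Yen = { (byte) 0xc2, (byte) 0xa5 };  // the yen sign encoded as UTF-8
        System.out.println("0xa5 as ISO-8859-1: " + decodes(lone, StandardCharsets.ISO_8859_1));
        System.out.println("0xa5 as UTF-8:      " + decodes(lone, StandardCharsets.UTF_8));
        System.out.println("0xc2 0xa5 as UTF-8: " + decodes(utf8Yen, StandardCharsets.UTF_8));
    }
}
```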

 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 12355852
Try using a Reader in the StreamSource:

StreamSource source = new StreamSource(new StringReader(aSB.toString()));
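
For anyone wondering why this helps, here is a hedged sketch. It assumes the JDK's built-in XSLT implementation and uses an identity transform in place of c:\test.xsl; the class and method names are invented for the example. The deprecated StringBufferInputStream keeps only the low eight bits of each char, so the yen sign reaches the parser as the single byte 0xa5, and with no XML declaration the parser assumes UTF-8. A StringReader hands over characters, so the parser never has to guess a byte encoding.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReaderSourceDemo {
    // First byte that the deprecated StringBufferInputStream produces for a string.
    @SuppressWarnings("deprecation")
    public static int firstByte(String s) {
        return new java.io.StringBufferInputStream(s).read();
    }

    // Identity transform (a stand-in for the real stylesheet), fed from a Reader.
    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The yen sign comes out as the lone byte 0xa5, which is not valid UTF-8.
        System.out.println(Integer.toHexString(firstByte("\u00a5")));

        // The same document passed as characters transforms without an exception.
        String xml = "<root><row UNIQUEID=\"5660101\" a=\"165\" b=\"\u00a5748876\"/></root>";
        System.out.println(transform(xml));
    }
}
```

If a byte stream is really needed, the usual alternative is to encode the string explicitly (for example with getBytes and a named charset) and make sure the XML declaration, if any, agrees with it.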
 

Author Comment

by:tomschuring
ID: 12364151
hello orangehead911,

i'm not using the M$ driver because the jtds is heaps and heaps faster.
thank you for thinking along though.

 

Author Comment

by:tomschuring
ID: 12364178
CEHJ,

thanks, that did the trick.. it doesn't throw an exception anymore and it performs the transformation..
 
LVL 86

Expert Comment

by:CEHJ
ID: 12379952
:-)