I'm making the same request to two different web servers: one goes directly to the JRun web server (HTTP/1.0) and the other goes through Apache (HTTP/1.1).
The XML returned is the same and contains some extended-ASCII characters such as à, ò, ù.
The first response is parsed by SAX without problems; the second one throws this exception:
"Unconvertible UTF-8 character beginning with 0xf9"
which is clearly related to the extended-ASCII character "ù" (0xF9 in ISO-8859-1). Since the returned XML is identical, the problem should be in the HTTP headers. Note that the working XML is sent over HTTP/1.0 and the failing one over HTTP/1.1. Can this make a difference?
Why does SAX assume that the XML is UTF-8 encoded when it is clearly declared as ISO-8859-1?
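For what it's worth, the byte from the error message can be reproduced outside the parser. The sketch below (class and method names are made up for illustration) decodes the single byte 0xF9 both ways: read as ISO-8859-1 it is "ù", while a strict UTF-8 decoder rejects it. That is at least consistent with the parser treating the stream as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decodes strictly: malformed input raises an exception instead of
    // being silently replaced with U+FFFD.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // ISO-8859-1 maps every byte value directly to the same code point.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] body = { (byte) 0xF9 }; // "ù" as encoded in ISO-8859-1

        System.out.println("ISO-8859-1 gives U+00F9: "
                + decodeLatin1(body).equals("\u00f9"));
        try {
            decodeStrictUtf8(body);
            System.out.println("UTF-8 accepted the byte");
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 rejected the byte: " + e);
        }
    }
}
```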
Here are the headers (and the first line of the body) of both responses:
HTTP/1.1 200 OK
Date: Thu, 08 Jun 2006 15:01:09 GMT
Server: Apache/2.0.54 (Unix) JRun/4.0
Connection: close
Content-Type: text/html; charset=ISO-8859-1
<?xml version="1.0" encoding="iso-8859-1"?>
HTTP/1.0 200 OK
Date: Thu, 08 Jun 2006 15:00:45 GMT
Content-Type: text/html; charset=ISO-8859-1
Server: JRun Web Server
<?xml version="1.0" encoding="iso-8859-1"?>
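Both responses declare the same charset in Content-Type, so a client could read it from conn.getContentType() instead of relying on detection. A minimal sketch of pulling out the charset parameter follows; the class and method names are hypothetical, and the parsing is deliberately simplistic (it does not try to cover every legal form of the header).

```java
import java.util.Locale;

class ContentTypeCharset {

    // Returns the value of the charset parameter, or null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Header value taken from the responses shown above.
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
    }
}
```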
Any ideas?
Thank you
Andrea
This question has been solved and verified by the asker.
Here is an extract of the code used to parse the XML:
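// getSAXParser(), u and handler are defined elsewhere in the application.
SAXParser parser = getSAXParser();
URL url = new URL(u);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
// The raw response stream is handed to the parser, which has to work out the encoding itself.
InputStream is = conn.getInputStream();
parser.parse(is, handler);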
I noticed that if I wrap the stream in an InputStreamReader, getEncoding() returns "ASCII", but I don't know whether that is related to my problem.
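A side note on getEncoding(): an InputStreamReader created without an explicit charset uses the JVM's default charset, so the reported "ASCII" most likely reflects the platform default on the client machine rather than anything in the HTTP response. The sketch below (hypothetical class and method names) shows the difference; note that getEncoding() reports historical names, so ISO-8859-1 comes back as "ISO8859_1".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderEncodingDemo {

    // No charset given: the reader falls back to the JVM default.
    static String encodingWithDefault() {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0])).getEncoding();
    }

    // Charset given explicitly: the reader uses exactly that charset.
    static String encodingWithExplicit(Charset charset) {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("reader without charset: " + encodingWithDefault());
        System.out.println("reader with ISO-8859-1: "
                + encodingWithExplicit(StandardCharsets.ISO_8859_1));
    }
}
```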
OK, thank you all for your effort, but I think I found the solution.
The method getInputStream() seems to ignore the encoding when using HTTP/1.1.
I'm using this code as a workaround (it preserves the encoding):
// Read the whole response body into memory before parsing.
InputStream in = conn.getInputStream();
byte[] tmp = new byte[512];
ByteArrayOutputStream out = new ByteArrayOutputStream();
int bytesRead = in.read(tmp);
while (bytesRead != -1) {
    out.write(tmp, 0, bytesRead);
    bytesRead = in.read(tmp);
}
in.close();
byte[] data = out.toByteArray();
// Hand the buffered bytes to the parser.
ByteArrayInputStream bais = new ByteArrayInputStream(data);
parser.parse(bais, handler);
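Another option, not tried in this thread, would be to tell the parser the encoding explicitly through an org.xml.sax.InputSource instead of buffering the body. The sketch below is self-contained so it can be run without a server: the class, the handler and the sample document are made up for illustration, and the encoding is hard-coded where a real client would take it from the Content-Type header.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ExplicitEncodingParse {

    // Collects all character data reported by the parser.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the stream, passing the encoding to the parser when one is known.
    static String parseText(InputStream in, String encoding) throws Exception {
        InputSource source = new InputSource(in);
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample body: the "ù" is the single byte 0xF9 in ISO-8859-1.
        byte[] body = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><r>\u00f9</r>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = parseText(new ByteArrayInputStream(body), "ISO-8859-1");
        System.out.println("parsed U+00F9: " + text.equals("\u00f9"));
    }
}
```

In the original code this would amount to wrapping conn.getInputStream() in an InputSource and calling setEncoding() before parser.parse(source, handler). Whether it fixes the HTTP/1.1 case here is an assumption that would need checking against the actual server.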
Comments and suggestions are welcome.
Andrea
by: bazarny, posted on 2006-06-09 at 02:42:52, ID: 16868668
Hmm, could it be that you parse the results in different ways? Which client parses the response?