Solved

SaxParser exception and Apache

Posted on 2006-06-09
Last Modified: 2012-06-27
I'm making the same request to two different web servers: one goes directly to the JRun web server (HTTP/1.0) and the other goes through Apache (HTTP/1.1).
The XML returned is the same and contains some extended-ASCII characters such as à, ò, ù.
The first response is parsed by SAX without problems; the second one throws this exception:

"Unconvertible UTF-8 character beginning with 0xf9"

which is clearly related to the extended-ASCII character "ù". Since the returned XML is identical, the problem should be in the HTTP headers. Note that the working XML is sent over HTTP/1.0 and the failing one over HTTP/1.1. Can this make a difference?
Why does SAX assume that the XML is UTF-8 encoded when it is clearly declared as ISO-8859-1?

Here are the headers of both responses:

HTTP/1.1 200 OK
Date: Thu, 08 Jun 2006 15:01:09 GMT
Server: Apache/2.0.54 (Unix) JRun/4.0
Connection: close
Content-Type: text/html; charset=ISO-8859-1

<?xml version="1.0" encoding="iso-8859-1"?>


HTTP/1.0 200 OK
Date: Thu, 08 Jun 2006 15:00:45 GMT
Content-Type: text/html; charset=ISO-8859-1
Server: JRun Web Server

<?xml version="1.0" encoding="iso-8859-1"?>

Any ideas?
Thank you
Andrea

Question by:bugada
18 Comments
 
LVL 7

Expert Comment

by:Igor Bazarny
ID: 16868668
Hmm, could it be that you parse the results in different ways? Which client parses the response?
0
 
LVL 14

Expert Comment

by:hoomanv
ID: 16868756
0
 
LVL 10

Author Comment

by:bugada
ID: 16868798
The client is the same. The only difference is the HTTP protocol version, caused by Apache sitting between the JRun server and the client. If I request the resource directly from the JRun server, the XML is parsed correctly.

Can I force SAX to parse XML with a specific encoding (in this case ISO-8859-1)?
0
 
LVL 10

Author Comment

by:bugada
ID: 16868863
@hoomanv: that's not my case, because my XML is NOT UTF-8 encoded.
As you can see from charset=ISO-8859-1 and encoding="iso-8859-1", I want SAX to parse it using the Western Latin charset.
But it seems to use UTF-8 instead. Why?
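
The error message itself can be reproduced in isolation: in ISO-8859-1 the character "ù" is the single byte 0xF9, and that byte cannot start a UTF-8 sequence. The sketch below only illustrates that mismatch; it does not explain why the parser chose UTF-8. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class Utf8MismatchSketch {

    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input instead of replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes the bytes as ISO-8859-1, where every byte maps to one character. */
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] uGrave = { (byte) 0xF9 };  // "ù" as sent in an ISO-8859-1 document
        System.out.println("valid UTF-8: " + isValidUtf8(uGrave));
        System.out.println("as ISO-8859-1 is u-grave: "
                + decodeLatin1(uGrave).equals("\u00f9"));
    }
}
```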
0
 
LVL 7

Expert Comment

by:Igor Bazarny
ID: 16868950
How do you create the InputSource for your parser? Could the encoding be getting messed up at this point?
0
 
LVL 7

Expert Comment

by:Igor Bazarny
ID: 16868994
Try using the InputSource(Reader) constructor and pass the desired encoding to the reader's constructor.
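
A minimal sketch of that suggestion, assuming the response really is ISO-8859-1 as the headers say. When the parser is handed a character stream, the bytes are decoded by the Reader rather than by the parser. The class name, the helper method, and the in-memory sample document are illustrative only; in the real client the stream would come from conn.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReaderInputSourceSketch {

    /** Parses the stream as ISO-8859-1 and returns all character data found. */
    static String parseLatin1(InputStream in) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // The Reader does the byte-to-char decoding with the charset we choose.
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        parser.parse(new InputSource(reader), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the HTTP response body (hypothetical sample data).
        byte[] xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><r>\u00f9</r>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = parseLatin1(new ByteArrayInputStream(xml));
        System.out.println("parsed u-grave: " + text.equals("\u00f9"));
    }
}
```

The trade-off is that the charset is now hard-coded in the client, so it has to be kept in step with whatever the server actually sends.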
0
 
LVL 4

Expert Comment

by:Yagantappa
ID: 16869135
Check your Apache configuration file (httpd.conf). Apache might be converting your XML file.
0
 
LVL 14

Expert Comment

by:hoomanv
ID: 16869156
Should the Content-Type be text/xml or application/xml?
0
 
LVL 4

Expert Comment

by:Yagantappa
ID: 16869169
It should be application/xml for xml files
0
 
LVL 30

Expert Comment

by:Mayank S
ID: 16869507
>> But it seems to use UTF-8 instead. Why?

Can we see some of the parsing code?
0
 
LVL 10

Author Comment

by:bugada
ID: 16869853
Here is an extract of the code used to parse the XML:

SAXParser parser = getSAXParser();
URL url = new URL(u);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
InputStream is = conn.getInputStream();
parser.parse(is, handler);

I noticed that if I use an InputStreamReader, its getEncoding() method returns "ASCII", but I don't know whether that is related to my problem.
0
 
LVL 30

Expert Comment

by:Mayank S
ID: 16869900
I think that's because it's the default used by the platform. Perhaps that's the problem with JRun too...

Can you try using the setProperty() method of SAXParser to set the encoding explicitly to what you want, before you call parse()? I'm not sure if it is possible, but it might be worth exploring.
0
 
LVL 10

Author Comment

by:bugada
ID: 16885038
OK, thank you all for your effort, but I think I have found the solution.

The method getInputStream() seems to ignore the encoding when using HTTP/1.1.

I'm using this code as a workaround (it preserves the encoding):

// Read the whole response body into memory first.
InputStream in = conn.getInputStream();
byte[] tmp = new byte[512];

ByteArrayOutputStream out = new ByteArrayOutputStream();
int bytesRead = in.read(tmp);

// Copy until end of stream (read() returns -1).
while (bytesRead != -1) {
    out.write(tmp, 0, bytesRead);
    bytesRead = in.read(tmp);
}
in.close();

// Parse from the buffered bytes instead of the live connection stream.
byte[] data = out.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(data);
parser.parse(bais, handler);

Comments and suggestions are welcome.
Andrea

0
 
LVL 30

Expert Comment

by:Mayank S
ID: 16886034
You can convert a byte-array output stream to a String using:

String contents = out.toString ( encoding ) ; // here, you can specify the encoding

Then you can use that as your initial contents for parsing.
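
Put together, that could look roughly like the sketch below: buffer the bytes as in the workaround above, decode them with an explicit charset name, and hand the parser a StringReader. The class name, method names, and sample document are hypothetical, and "ISO-8859-1" is assumed from the response headers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DecodedStringSketch {

    /** Buffers the whole stream, then decodes it with the given charset name. */
    static String readAll(InputStream in, String encoding) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[512];
        int bytesRead;
        while ((bytesRead = in.read(tmp)) != -1) {
            out.write(tmp, 0, bytesRead);
        }
        return out.toString(encoding);  // explicit encoding, not the platform default
    }

    /** Parses already-decoded XML text and returns its character data. */
    static String parseText(String contents) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(contents)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the bytes read from the connection (hypothetical sample data).
        byte[] raw = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><r>\u00f9</r>"
                .getBytes("ISO-8859-1");
        String contents = readAll(new ByteArrayInputStream(raw), "ISO-8859-1");
        System.out.println("parsed u-grave: " + parseText(contents).equals("\u00f9"));
    }
}
```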
0
 
LVL 30

Expert Comment

by:Mayank S
ID: 17079256
It's OK, but make sure you specify the encoding as I suggested.
0
 
LVL 5

Accepted Solution

by:
Netminder earned 0 total points
ID: 17105381
Closed, 250 points refunded.
Netminder
Site Admin
0
