SaxParser exception and Apache

I'm making the same request to two different web servers: one goes directly to the JRun web server (HTTP/1.0) and the other goes through Apache (HTTP/1.1).
The XML returned is the same and contains some extended-ASCII characters such as à, ò, ù.
The first response is parsed by SAX without problems; the second one generates this exception:

"Unconvertible UTF-8 character beginning with 0xf9"

and it is clearly related to the extended-ASCII character "ù". Since the returned XML is identical, the problem should be in the HTTP headers. Note that the working XML is sent over HTTP/1.0 and the failing one over HTTP/1.1. Can this make a difference?
Why does SAX assume that the XML is UTF-8 encoded when it is clearly declared as ISO-8859-1?

Here are the headers of both responses:

HTTP/1.1 200 OK
Date: Thu, 08 Jun 2006 15:01:09 GMT
Server: Apache/2.0.54 (Unix) JRun/4.0
Connection: close
Content-Type: text/html; charset=ISO-8859-1

<?xml version="1.0" encoding="iso-8859-1"?>


HTTP/1.0 200 OK
Date: Thu, 08 Jun 2006 15:00:45 GMT
Content-Type: text/html; charset=ISO-8859-1
Server: JRun Web Server

<?xml version="1.0" encoding="iso-8859-1"?>

Any ideas?
Thank you
Andrea

Asked by: bugada
 
Igor Bazarny commented:
Hmm, could it be that you parse the results in different ways? Which client parses the response?
 
bugada (author) commented:
The client is the same. The only difference is the HTTP protocol version, because Apache sits between the JRun server and the client. If I request the resource directly from the JRun server, the XML is parsed correctly.

Can I force SAX to parse an XML document with a specific encoding (in this case ISO-8859-1)?
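
One possible answer to the question above, as a minimal sketch: org.xml.sax.InputSource has a setEncoding method for a byte stream whose encoding the application already knows. The class name ForcedEncodingParse, the helper parseText, and the TextCollector handler are illustrative assumptions, not code from the thread, and nobody in the thread confirmed that this fixes the Apache case.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class ForcedEncodingParse {

    // Collects all character data; stands in for the thread's "handler".
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses a byte stream after telling the parser which encoding to use.
    static String parseText(InputStream in, String encoding) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(in);
        source.setEncoding(encoding); // externally supplied encoding
        TextCollector handler = new TextCollector();
        parser.parse(source, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document encoded as ISO-8859-1, similar to the one described.
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
                + "<msg>\u00e0\u00f2\u00f9</msg>")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(new ByteArrayInputStream(xml), "ISO-8859-1"));
    }
}
```

In the thread's code this would mean wrapping conn.getInputStream() in an InputSource instead of passing the stream straight to parse().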
 
bugada (author) commented:
@hoomanv: that's not my case, because my XML is NOT UTF-8 encoded.
As you can see from charset=ISO-8859-1 and encoding="iso-8859-1", I want SAX to parse it using the Western Latin charset.
But it seems to use UTF-8 instead. Why?
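
To make the error message concrete, here is a small sketch of why the byte 0xF9 is fine as ISO-8859-1 but cannot start a UTF-8 character. It only illustrates the symptom, not its cause in the Apache setup. The class name Byte0xF9Demo and the helper isValidUtf8 are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the thread.
class Byte0xF9Demo {

    // Returns true if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input by throwing.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xF9 }; // "ù" encoded as ISO-8859-1

        // The single byte 0xF9 is not a legal UTF-8 sequence.
        System.out.println("valid UTF-8: " + isValidUtf8(latin1));

        // Decoded as ISO-8859-1, the same byte is the character U+00F9.
        String decoded = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println("is U+00F9: " + decoded.equals("\u00f9"));

        // In UTF-8 the same character takes two bytes.
        System.out.println("UTF-8 length: "
                + "\u00f9".getBytes(StandardCharsets.UTF_8).length);
    }
}
```

So a parser that reads the ISO-8859-1 bytes as UTF-8 fails on the first accented character, which matches the exception quoted in the question.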
 
Igor Bazarny commented:
How do you create the InputSource for your parser? Could the encoding get mixed up at that point?
 
Igor Bazarny commented:
Try using the InputSource(Reader) constructor and pass the desired encoding to the reader's constructor.
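
A minimal sketch of this suggestion, assuming the encoding is known to be ISO-8859-1: the bytes are decoded by an InputStreamReader, so the parser receives characters and does not have to guess the encoding. The class name ReaderBasedParse, the helper parseText, and the TextCollector handler are illustrative and do not appear in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class ReaderBasedParse {

    // Collects all character data; stands in for the thread's "handler".
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Decodes the stream with the given charset, then parses the characters.
    static String parseText(InputStream in, Charset charset) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Reader reader = new InputStreamReader(in, charset);
        TextCollector handler = new TextCollector();
        parser.parse(new InputSource(reader), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // No encoding declaration here: the reader alone decides the decoding.
        byte[] xml = "<msg>\u00e0\u00f2\u00f9</msg>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(
                parseText(new ByteArrayInputStream(xml), StandardCharsets.ISO_8859_1));
    }
}
```

With a character stream the parser reads the characters directly, so a wrong charset passed to the reader would silently corrupt the text rather than raise an error.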
 
Yagantappa commented:
Check your Apache configuration file (httpd.conf). Apache might be converting your XML file.
 
hoomanv commented:
Should the Content-Type be text/xml or application/xml?
 
Yagantappa commented:
It should be application/xml for XML files.
 
Mayank S (Associate Director - Product Engineering) commented:
>> But it seems to use UTF-8 instead. Why?

Can we see some of the parsing code?
 
bugada (author) commented:
Here is an extract of the code used to parse the XML:

SAXParser parser = getSAXParser();
URL url = new URL(u);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
// The raw response stream goes straight to the parser
InputStream is = conn.getInputStream();
parser.parse(is, handler);

I noticed that if I use an InputStreamReader, its getEncoding() method returns "ASCII", but I don't know whether that is related to my problem.
 
Mayank S (Associate Director - Product Engineering) commented:
I think that's because it's the default used by the platform. Perhaps that's the problem with JRun too.

Can you try using the setProperty() method of SAXParser to set the encoding explicitly to what you want, before you call parse()? I'm not sure if it is possible, but it might be worth exploring.
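
To illustrate the point about the platform default, here is a small sketch: an InputStreamReader built without a charset uses the JVM's default charset, so its getEncoding() says nothing about the HTTP response. The class name ReaderEncodingDemo and both helper methods are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are not from the thread.
class ReaderEncodingDemo {

    // Encoding name reported by a reader created without a charset.
    static String defaultReaderEncoding() throws IOException {
        try (InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            // Must be read before close(); a closed reader returns null.
            return reader.getEncoding();
        }
    }

    // Encoding name reported by a reader created with an explicit charset.
    static String explicitReaderEncoding(Charset charset) throws IOException {
        try (InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset)) {
            return reader.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        // Depends on the JVM and operating system settings.
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("reader without charset: " + defaultReaderEncoding());
        // May be printed under a historical name rather than "ISO-8859-1".
        System.out.println("reader with ISO-8859-1: "
                + explicitReaderEncoding(StandardCharsets.ISO_8859_1));
    }
}
```

On a machine whose default charset is US-ASCII, the second line would be consistent with the "ASCII" value reported above.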
 
bugada (author) commented:
OK, thank you all for your effort, but I think I found the solution.

The getInputStream() method seems to ignore the encoding when using HTTP/1.1.

I'm using this code as a workaround (it preserves the encoding):

// Read the whole response into memory first
InputStream in = conn.getInputStream();
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] tmp = new byte[512];
int bytesRead;

// Copy until end of stream (read() returns -1)
while ((bytesRead = in.read(tmp)) != -1) {
      out.write(tmp, 0, bytesRead);
}
in.close();

// Parse from the in-memory copy instead of the live connection stream
byte[] data = out.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(data);
parser.parse(bais, handler);

Comments and suggestions are welcome.
Andrea

 
Mayank S (Associate Director - Product Engineering) commented:
You can convert a byte-array output stream to a String using:

String contents = out.toString(encoding); // here you can specify the encoding

Then you can use that string as the input for parsing.
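
A rough sketch of how this suggestion could fit together, assuming the encoding is taken from the Content-Type response header shown in the question: the buffered bytes are decoded with out.toString(encoding) and the resulting string is parsed through a StringReader. The class name DecodeThenParse and the helpers charsetOf and parseText are hypothetical, and the header parsing is deliberately simple.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class DecodeThenParse {

    // Collects all character data; stands in for the thread's "handler".
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Extracts the charset parameter from a Content-Type header value,
    // for example "text/html; charset=ISO-8859-1"; returns fallback if absent.
    static String charsetOf(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String value = p.substring(8).trim();
                    // Drop optional surrounding double quotes.
                    if (value.length() >= 2 && value.startsWith("\"")
                            && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        return fallback;
    }

    // Decodes the buffered bytes explicitly, then parses the characters.
    static String parseText(ByteArrayOutputStream out, String encoding)
            throws Exception {
        String contents = out.toString(encoding);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(new InputSource(new StringReader(contents)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String encoding = charsetOf("text/html; charset=ISO-8859-1", "UTF-8");
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
                + "<msg>\u00e0\u00f2\u00f9</msg>")
                .getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(xml, 0, xml.length);
        System.out.println(encoding + ": " + parseText(out, encoding));
    }
}
```

In the thread's code the header value would come from conn.getContentType(), and out would be the buffer filled by the workaround loop.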
 
Mayank S (Associate Director - Product Engineering) commented:
It's OK, but make sure you specify the encoding as I suggested.
 
Netminder commented:
Closed, 250 points refunded.
Netminder
Site Admin