Solved

Java xml parse error: Special characters

Posted on 2008-11-02
Last Modified: 2015-01-05
I have had this issue lingering for a while. I have an XML document that contains special characters, and I am having serious problems when I try to parse it. Experts, please advise.

Here is the xml

 <?xml version="1.0" encoding="UTF-8"?>
<user_data>
    <time_taken>ÀÀÀÀ</time_taken>  <!-- special characters -->
</user_data>

Here is my servlet which parses:

protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception
      {
            request.setCharacterEncoding("UTF-8");

            int contentLength = request.getContentLength();
            if ( contentLength == -1 ) {
                  // Content length must be known.
                  throw new ServletException( "Content-Length must be specified" );
            }
            String contentType = request.getContentType();
            System.out.println("request.getContentType(): " +request.getContentType() );
            System.out.println("request.getContentLength(): " +request.getContentLength() );

            boolean contentTypeIsOkay = false;
            // Content-Type must be specified.
            if ( contentType != null ) {
                  // The type must be text/xml.
                  if ( contentType.startsWith( "text/xml" ) ) {
                        // And it must be UTF-8 encoded (or unspecified, in which case
                        // we assume
                        // that it's either UTF-8 or ASCII).
                        if ( contentType.indexOf( "charset=" ) == -1 ) {
                              contentTypeIsOkay = true;
                        } else if ( contentType.indexOf( "charset=utf-8" ) != -1 ) {
                              contentTypeIsOkay = true;
                        }
                  }
            }
            if ( !contentTypeIsOkay ) {
                  throw new ServletException(
                  "Content-Type must be 'text/xml' with 'charset=utf-8' (or unspecified charset)" );
            }
            InputStream in = request.getInputStream();
            //      InputStreamReader in = new InputStreamReader(request.getInputStream(), "UTF-8");
            String decoded = null;
            String pay = null;
            try {
                  byte[] payload = new byte[contentLength];
                  int offset = 0;
                  int len = contentLength;
                  int byteCount;
                  while ( offset < contentLength ) {
                        byteCount = in.read( payload, offset, len );
                        if ( byteCount == -1 ) {
                              throw new ServletException( "Client did not send " + contentLength + " bytes as expected" );
                        }
                        offset += byteCount;
                        len -= byteCount;
                  }
                  pay = new String( payload, "UTF-8" );
                  System.out.println("xml is : " +pay );

                  decoded = URLDecoder.decode(pay, "utf-8");
                  System.out.println("decoded : " +decoded );

            } finally {
                  if ( in != null ) {
                        in.close();
                  }
            }

            sun.io.ByteToCharConverter fromUnicode;
            String convertedStr = decoded;
            try {
                  fromUnicode = sun.io.ByteToCharConverter.getConverter("UTF-8");
                  fromUnicode.setSubstitutionMode(true);

                  char[] convertedChars;

                  convertedChars = fromUnicode.convertAll(convertedStr.getBytes());

                  convertedStr = new String(convertedChars);
                  System.out.println("convertedStr : " +convertedStr );
                  
            } catch (UnsupportedEncodingException e) {
                  e.printStackTrace();
            }
            
            
            InputStream inputStream = request.getInputStream();

            System.out.println("request.getCharacterEncoding()  : " + request.getCharacterEncoding() );
            SAXBuilder builder = null;
        // Create an instance of the tester and test
        builder = new SAXBuilder();
       
     
        Document doc= builder.build(new java.io.ByteArrayInputStream(convertedStr.getBytes()));

        // ERROR thrown by the line above: Illegal XML character:  &#x4;.
       
        Element user_data =doc.getRootElement();
   
Question by:istiaquem

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 22866278
See if these help:

Deals with character encoding to handle accented characters like you are using:
http://www.javazoom.net/services/newsletter/xmlgeneration.html

Dealing with unicode characters:
http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=34&t=003647

Escaping other characters reference:
http://www.javapractices.com/topic/TopicAction.do?Id=96

Hopefully that helps. I think the first link is what you are looking for; the others are for light reading.
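Since the reported error is "Illegal XML character: &#x4;", here is a minimal sketch of removing characters that XML 1.0 does not allow before the text is handed to a parser. The class name XmlCharFilter and both method names are hypothetical, not taken from the links above or from JDOM. Stripping only hides the symptom: if the control character comes from decoding the bytes with the wrong charset, the decoding step is what needs fixing.

```java
class XmlCharFilter {

    // True if the code point is permitted by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: drops every code point that XML 1.0 forbids,
    // such as U+0004, and keeps everything else unchanged.
    static String stripIllegalXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints()
          .filter(XmlCharFilter::isLegalXmlChar)
          .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // The U+0004 control character is what the parser reports as &#x4;.
        String dirty = "<time_taken>12\u0004 34</time_taken>";
        System.out.println(stripIllegalXmlChars(dirty));
    }
}
```

Accented letters such as À (U+00C0) pass through this filter untouched, so it does not interfere with legitimate non-ASCII content.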
Expert Comment

by:Kevin Cross
ID: 22866292
In case links aren't working -- the suggestion is to try character encoding ISO-8859-1.
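One way to read that suggestion, as a rough sketch only: if the client actually sends ISO-8859-1 bytes, the document should declare that encoding, and the raw bytes can be handed straight to the parser so it decodes them itself. The class EncodingSketch and the helper readTimeTaken are made-up names, and the JDK's built-in DOM parser stands in for JDOM's SAXBuilder so the example runs without extra libraries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingSketch {

    // Hypothetical helper: parses the raw bytes and returns the text of
    // <time_taken>. The parser picks the charset from the XML declaration,
    // so no intermediate String conversion is involved.
    static String readTimeTaken(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getElementsByTagName("time_taken").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the payload is ISO-8859-1 and says so in its declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<user_data><time_taken>\u00C0\u00C0\u00C0\u00C0</time_taken></user_data>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        // The parser decodes each 0xC0 byte as U+00C0.
        System.out.println(readTimeTaken(latin1Bytes).equals("\u00C0\u00C0\u00C0\u00C0"));

        // Decoding the same bytes as UTF-8 by hand is lossy: 0xC0 is not a
        // valid UTF-8 sequence, so Java substitutes the replacement character.
        String wrong = new String(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println(wrong.contains("\uFFFD"));
    }
}
```

The same idea applies to the servlet in the question: passing the payload bytes to the builder directly avoids the URLDecoder step and the getBytes() calls that fall back to the platform default charset.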