Solved

Java xml parse error: Special characters

Posted on 2008-11-02
Last Modified: 2015-01-05
This issue has been lingering for a while. I have an XML document that contains special characters (accented letters), and I have a serious problem parsing it. Experts, please advise.

Here is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<user_data>
    <time_taken>ÀÀÀÀ</time_taken>  <!-- special characters -->
</user_data>
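For context: À (U+00C0) is an ordinary, legal XML character and needs no escaping. It only causes trouble when the bytes on the wire disagree with the `encoding="UTF-8"` declaration. In UTF-8 each À takes two bytes (C3 80), so `ÀÀÀÀ` is 4 characters but 8 bytes, and Content-Length counts bytes. The sketch below (class and method names are hypothetical, for illustration only) shows the byte-level difference between UTF-8 and ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class: shows the bytes behind the
// "special" characters in the sample XML.
final class SpecialCharBytes {

    // Space-separated hex dump of the string encoded with the given charset.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Number of bytes the string occupies in UTF-8, i.e. what Content-Length counts.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

With this, `hex("\u00C0", StandardCharsets.UTF_8)` gives `C3 80`, `hex("\u00C0", StandardCharsets.ISO_8859_1)` gives `C0`, and `utf8Length("\u00C0\u00C0\u00C0\u00C0")` is 8.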

Here is the servlet method that parses it:

protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception
      {
            request.setCharacterEncoding("UTF-8");

            int contentLength = request.getContentLength();
            if ( contentLength == -1 ) {
                  // Content length must be known.
                  throw new ServletException( "Content-Length must be specified" );
            }
            String contentType = request.getContentType();
            System.out.println("request.getContentType(): " +request.getContentType() );
            System.out.println("request.getContentLength(): " +request.getContentLength() );

            boolean contentTypeIsOkay = false;
            // Content-Type must be specified.
            if ( contentType != null ) {
                  // The type must be plain text.
                  if ( contentType.startsWith( "text/xml" ) ) {
                        // And it must be UTF-8 encoded (or unspecified, in which case
                        // we assume
                        // that it's either UTF-8 or ASCII).
                        if ( contentType.indexOf( "charset=" ) == -1 ) {
                              contentTypeIsOkay = true;
                        } else if ( contentType.indexOf( "charset=utf-8" ) != -1 ) {
                              contentTypeIsOkay = true;
                        }
                  }
            }
            if ( !contentTypeIsOkay ) {
                  throw new ServletException(
                  "Content-Type must be 'text/xml' with 'charset=utf-8' (or unspecified charset)" );
            }
            InputStream in = request.getInputStream();
            //      InputStreamReader in = new InputStreamReader(request.getInputStream(), "UTF-8");
            String decoded = null;
            String pay = null;
            try {
                  byte[] payload = new byte[contentLength];
                  int offset = 0;
                  int len = contentLength;
                  int byteCount;
                  while ( offset < contentLength ) {
                        byteCount = in.read( payload, offset, len );
                        if ( byteCount == -1 ) {
                              throw new ServletException( "Client did not send " + contentLength + " bytes as expected" );
                        }
                        offset += byteCount;
                        len -= byteCount;
                  }
                  pay = new String( payload, "UTF-8" );
                  System.out.println("xml is : " +pay );

                  decoded = URLDecoder.decode(pay, "utf-8");
                  System.out.println("decoded : " +decoded );

            } finally {
                  if ( in != null ) {
                        in.close();
                  }
            }

            sun.io.ByteToCharConverter fromUnicode;
            String convertedStr = decoded;
            try {
                  fromUnicode = sun.io.ByteToCharConverter.getConverter("UTF-8");
                  fromUnicode.setSubstitutionMode(true);

                  char[] convertedChars;

                  convertedChars = fromUnicode.convertAll(convertedStr.getBytes());

                  convertedStr = new String(convertedChars);
                  System.out.println("convertedStr : " +convertedStr );
                  
            } catch (UnsupportedEncodingException e) {
                  e.printStackTrace();
            }
            
            
            InputStream inputStream = request.getInputStream();

            System.out.println("request.getCharacterEncoding()  : " + request.getCharacterEncoding() );
            SAXBuilder builder = null;
            // Build a JDOM document from the converted string.
            builder = new SAXBuilder();

            Document doc = builder.build(new java.io.ByteArrayInputStream(convertedStr.getBytes()));
            // ERROR thrown here: Illegal XML character: &#x4;.

            Element user_data = doc.getRootElement();
            // ... rest of the method not shown
      }
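A side issue in the Content-Type check: `contentType.indexOf("charset=utf-8")` is case-sensitive, so the common header `text/xml; charset=UTF-8` is rejected, although charset names are case-insensitive. Below is a minimal sketch of a case-insensitive check, assuming (as the original code does) that a missing charset means UTF-8; the class and method names are hypothetical:

```java
// Hypothetical helper: accepts text/xml with no charset, or with a UTF-8
// charset written in any letter case, optionally quoted.
final class ContentTypeCheck {

    static boolean isUtf8Xml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String[] parts = contentType.split(";");
        if (!parts[0].trim().equalsIgnoreCase("text/xml")) {
            return false;
        }
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            if (param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.equalsIgnoreCase("utf-8");
            }
        }
        return true; // no charset parameter: assume UTF-8, as the original code does
    }
}
```

Unlike the `startsWith("text/xml")` test in the servlet, this compares the whole media type, so a value such as `text/xml-foo` is not accepted by accident.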
   
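Another step worth questioning is `URLDecoder.decode(pay, "utf-8")`. URLDecoder is meant for `application/x-www-form-urlencoded` data, not raw XML: it turns every `+` into a space and throws `IllegalArgumentException` on a `%` that is not followed by two hex digits. On the sample payload it happens to change nothing, but any XML containing those characters would be altered or rejected. A small sketch (hypothetical class name) that reproduces the servlet's call:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: runs URLDecoder over text that is NOT form-encoded,
// the way the servlet does, and reports what happens.
final class UrlDecodePitfall {

    static String decodeAsServletDoes(String xml) {
        try {
            return URLDecoder.decode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        } catch (IllegalArgumentException e) {
            // Thrown for malformed % escapes such as "%</"
            return "IllegalArgumentException: " + e.getMessage();
        }
    }
}
```

For example, `decodeAsServletDoes("<a>1+1</a>")` returns `<a>1 1</a>`, and `"<a>100%</a>"` ends in the exception branch. If the client really sends plain XML, the simplest fix is to drop the URLDecoder call entirely.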
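The `sun.io.ByteToCharConverter` block is a likely source of corruption. `sun.io` is an internal, unsupported JDK package that later releases removed, and `convertedStr.getBytes()` with no argument encodes with the platform default charset, while the converter then decodes those bytes as UTF-8. Whenever the default is not UTF-8 (for example ISO-8859-1 or windows-1252), every À is mangled. The same applies to the final `builder.build(...convertedStr.getBytes())` call, whose bytes must match the `encoding="UTF-8"` declaration. A minimal sketch of that encode/decode mismatch, using explicit charsets so the result does not depend on the platform (class name hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the servlet's chain:
// text -> bytes in some charset -> text decoded as UTF-8.
final class CharsetRoundTrip {

    static String encodeThenDecodeUtf8(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        // Bytes that are not valid UTF-8 become U+FFFD (replacement character).
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Encoding with UTF-8 round-trips `"\u00C0"` unchanged; encoding with ISO-8859-1 produces the single byte C0, which is not valid UTF-8 and decodes to `"\uFFFD"`.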
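A simpler approach is to skip every String conversion and hand the raw request bytes to the parser, which reads `encoding="UTF-8"` from the XML declaration itself. Note that `request.setCharacterEncoding("UTF-8")` only affects `getReader()` and parameter parsing, not the byte stream from `getInputStream()`. The error reports U+0004, a control character that XML 1.0 forbids and that does not appear in the posted XML, which suggests (though it is not certain) that the bytes were altered somewhere in the decode/re-encode chain. The sketch below uses the JDK's built-in JAXP DOM parser instead of JDOM so it runs without extra libraries; the class and method names are hypothetical. In the original JDOM code, the equivalent change would be passing the `payload` bytes to `builder.build(new java.io.ByteArrayInputStream(payload))`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper: read the body as bytes and parse those bytes directly.
final class RawBytesXmlParser {

    // Reads exactly contentLength bytes; throws EOFException if the client sends fewer.
    static byte[] readBody(InputStream in, int contentLength) throws IOException {
        byte[] payload = new byte[contentLength];
        new DataInputStream(in).readFully(payload);
        return payload;
    }

    // Parses the raw bytes; the parser detects the encoding from the XML declaration.
    static String timeTaken(byte[] payload) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPEs, a common hardening step for XML received from clients.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(payload));
            Node node = doc.getDocumentElement().getElementsByTagName("time_taken").item(0);
            if (node == null) {
                throw new IllegalArgumentException("<time_taken> element missing");
            }
            return node.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse user_data XML", e);
        }
    }
}
```

In a servlet this would be called as `timeTaken(readBody(request.getInputStream(), contentLength))`, replacing the manual read loop, the URLDecoder call, and the converter block.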
Question by: istiaquem

Accepted Solution

by: Kevin Cross
ID: 22866278
See if these help:

Character encoding for handling accented characters like the ones you are using:
http://www.javazoom.net/services/newsletter/xmlgeneration.html

Dealing with Unicode characters:
http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=34&t=003647

Reference on escaping other characters:
http://www.javapractices.com/topic/TopicAction.do?Id=96

Hopefully that helps. I think the first link is what you are looking for; the others are background reading.
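On escaping: only five characters have special meaning in XML markup (`&`, `<`, `>`, `"`, `'`); accented letters such as À need no escaping when the declared encoding matches the bytes. Here is a minimal escaping sketch; the helper name is hypothetical and this is not the code from the linked page:

```java
// Hypothetical helper: escapes the five XML special characters and leaves
// everything else, including non-ASCII letters, untouched.
final class XmlEscaper {

    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `escapeXml("a<b & \"c\"")` returns `a&lt;b &amp; &quot;c&quot;`, while `escapeXml("\u00C0")` returns the À unchanged.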

Expert Comment

by: Kevin Cross
ID: 22866292
In case the links aren't working: the suggestion is to try the ISO-8859-1 character encoding.
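A caveat on the ISO-8859-1 suggestion: it helps only if the client actually sends ISO-8859-1 bytes and the XML declaration says so. What the parser needs is agreement between the declared encoding and the bytes; a mismatch (for example ISO-8859-1 bytes under `encoding="UTF-8"`) typically makes it fail with an invalid-byte error. The sketch below (hypothetical names) builds the sample document with a matching declaration in each charset and parses both with the JDK's JAXP parser:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo: the same document declared and encoded consistently
// in two different charsets parses to the same text.
final class DeclaredEncodingDemo {

    // Declares and encodes the sample document with the same charset.
    static byte[] sampleDocument(Charset cs) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?>"
                + "<user_data><time_taken>\u00C0\u00C0\u00C0\u00C0</time_taken></user_data>";
        return xml.getBytes(cs);
    }

    // Returns the text content of the root element.
    static String timeTaken(byte[] payload) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(payload))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }
}
```

Both `timeTaken(sampleDocument(StandardCharsets.UTF_8))` and `timeTaken(sampleDocument(StandardCharsets.ISO_8859_1))` yield `ÀÀÀÀ`, even though the À bytes differ (C3 80 versus C0).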