Solved

Java xml parse error: Special characters

Posted on 2008-11-02
3
2,178 Views
Last Modified: 2015-01-05
I have this issue lingering for a while. I have an xml  which has special characters and I am trying to parse them and have serious problem. Experts please advice.

Here is the xml

 <?xml version="1.0" encoding="UTF-8"?>
<user_data>
    <time_taken>ÀÀÀÀ</time_taken>  ///SPECIAL CHARACTERS.
</user_data>

Here is my servlet which parses:

protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception
      {
            request.setCharacterEncoding("UTF-8");

            int contentLength = request.getContentLength();
            if ( contentLength == -1 ) {
                  // Content length must be known.
                  throw new ServletException( "Content-Length must be specified" );
            }
            String contentType = request.getContentType();
            System.out.println("request.getContentType(): " +request.getContentType() );
            System.out.println("request.getContentLength(): " +request.getContentLength() );

            boolean contentTypeIsOkay = false;
            // Content-Type must be specified.
            if ( contentType != null ) {
                  // The type must be plain text.
                  if ( contentType.startsWith( "text/xml" ) ) {
                        // And it must be UTF-8 encoded (or unspecified, in which case
                        // we assume
                        // that it's either UTF-8 or ASCII).
                        if ( contentType.indexOf( "charset=" ) == -1 ) {
                              contentTypeIsOkay = true;
                        } else if ( contentType.indexOf( "charset=utf-8" ) != -1 ) {
                              contentTypeIsOkay = true;
                        }
                  }
            }
            if ( !contentTypeIsOkay ) {
                  throw new ServletException(
                  "Content-Type must be 'text/xml' with 'charset=utf-8' (or unspecified charset)" );
            }
            InputStream in = request.getInputStream();
            //      InputStreamReader in = new InputStreamReader(request.getInputStream(), "UTF-8");
            String decoded = null;
            String pay = null;
            try {
                  byte[] payload = new byte[contentLength];
                  int offset = 0;
                  int len = contentLength;
                  int byteCount;
                  while ( offset < contentLength ) {
                        byteCount = in.read( payload, offset, len );
                        if ( byteCount == -1 ) {
                              throw new ServletException( "Client did not send " + contentLength + " bytes as expected" );
                        }
                        offset += byteCount;
                        len -= byteCount;
                  }
                  pay = new String( payload, "UTF-8" );
                  System.out.println("xml is : " +pay );

                  decoded = URLDecoder.decode(pay, "utf-8");
                  System.out.println("decoded : " +decoded );

            } finally {
                  if ( in != null ) {
                        in.close();
                  }
            }

            sun.io.ByteToCharConverter fromUnicode;
            String convertedStr = decoded;
            try {
                  fromUnicode = sun.io.ByteToCharConverter.getConverter("UTF-8");
                  fromUnicode.setSubstitutionMode(true);

                  char[] convertedChars;

                  convertedChars = fromUnicode.convertAll(convertedStr.getBytes());

                  convertedStr = new String(convertedChars);
                  System.out.println("convertedStr : " +convertedStr );
                  
            } catch (UnsupportedEncodingException e) {
                  e.printStackTrace();
            }
            
            
            InputStream inputStream = request.getInputStream();

            System.out.println("request.getCharacterEncoding()  : " + request.getCharacterEncoding() );
            SAXBuilder builder = null;
        // Create an instance of the tester and test
        builder = new SAXBuilder();
       
     
        Document doc= builder.build(new java.io.ByteArrayInputStream(convertedStr.getBytes()));

//////ERROR : Illegal XML character:  &#x4;.
       
        Element user_data =doc.getRootElement();
   
0
Comment
Question by:istiaquem
  • 2
3 Comments
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 22866278
See if these help:

Deals with character encoding to handle accented characters like you are using:
http://www.javazoom.net/services/newsletter/xmlgeneration.html

Dealing with unicode characters:
http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=34&t=003647

Escaping other characters reference:
http://www.javapractices.com/topic/TopicAction.do?Id=96

Hopefully that helps.  Think the first link will be what you are looking for and the others are for light reading.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 22866292
In case links aren't working -- the suggestion is to try character encoding ISO-8859-1.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Java contains several comparison operators (e.g., <, <=, >, >=, ==, !=) that allow you to compare primitive values. However, these operators cannot be used to compare the contents of objects. Interface Comparable is used to allow objects of a cl…
Introduction This article is the second of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers the basic installation and configuration of the test automation tools used by…
Viewers learn about the scanner class in this video and are introduced to receiving user input for their programs. Additionally, objects, conditional statements, and loops are used to help reinforce the concepts. Introduce Scanner class: Importing…
This tutorial covers a practical example of lazy loading technique and early loading technique in a Singleton Design Pattern.

930 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

8 Experts available now in Live!

Get 1:1 Help Now