Java XML parse error: special characters

Posted on 2008-11-02
Last Modified: 2015-01-05
I have had this issue lingering for a while. I have an XML document that contains special (accented) characters, and I run into serious problems when I try to parse it. Experts, please advise.

Here is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<user_data>
    <time_taken>ÀÀÀÀ</time_taken>  <!-- special characters -->
</user_data>

Here is the servlet that parses it:

protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception
{
    request.setCharacterEncoding("UTF-8");

    int contentLength = request.getContentLength();
    if ( contentLength == -1 ) {
        // Content length must be known.
        throw new ServletException( "Content-Length must be specified" );
    }
    String contentType = request.getContentType();
    System.out.println("request.getContentType(): " + request.getContentType() );
    System.out.println("request.getContentLength(): " + request.getContentLength() );

    boolean contentTypeIsOkay = false;
    // Content-Type must be specified.
    if ( contentType != null ) {
        // The type must be text/xml.
        if ( contentType.startsWith( "text/xml" ) ) {
            // And it must be UTF-8 encoded (or unspecified, in which case
            // we assume that it's either UTF-8 or ASCII).
            if ( contentType.indexOf( "charset=" ) == -1 ) {
                contentTypeIsOkay = true;
            } else if ( contentType.indexOf( "charset=utf-8" ) != -1 ) {
                contentTypeIsOkay = true;
            }
        }
    }
    if ( !contentTypeIsOkay ) {
        throw new ServletException(
            "Content-Type must be 'text/xml' with 'charset=utf-8' (or unspecified charset)" );
    }
    InputStream in = request.getInputStream();
    // InputStreamReader in = new InputStreamReader(request.getInputStream(), "UTF-8");
    String decoded = null;
    String pay = null;
    try {
        byte[] payload = new byte[contentLength];
        int offset = 0;
        int len = contentLength;
        int byteCount;
        while ( offset < contentLength ) {
            byteCount = in.read( payload, offset, len );
            if ( byteCount == -1 ) {
                throw new ServletException( "Client did not send " + contentLength + " bytes as expected" );
            }
            offset += byteCount;
            len -= byteCount;
        }
        pay = new String( payload, "UTF-8" );
        System.out.println("xml is : " + pay );

        decoded = URLDecoder.decode(pay, "utf-8");
        System.out.println("decoded : " + decoded );

    } finally {
        if ( in != null ) {
            in.close();
        }
    }

    sun.io.ByteToCharConverter fromUnicode;
    String convertedStr = decoded;
    try {
        fromUnicode = sun.io.ByteToCharConverter.getConverter("UTF-8");
        fromUnicode.setSubstitutionMode(true);

        char[] convertedChars;

        convertedChars = fromUnicode.convertAll(convertedStr.getBytes());

        convertedStr = new String(convertedChars);
        System.out.println("convertedStr : " + convertedStr );

    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

    InputStream inputStream = request.getInputStream();

    System.out.println("request.getCharacterEncoding()  : " + request.getCharacterEncoding() );
    SAXBuilder builder = null;
    // Create a JDOM SAX builder
    builder = new SAXBuilder();

    Document doc = builder.build(new java.io.ByteArrayInputStream(convertedStr.getBytes()));
    // ERROR: Illegal XML character: &#x4;.

    Element user_data = doc.getRootElement();
    // ... (remainder of the method was not posted)
}
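For comparison, here is a minimal sketch of parsing the body without the intermediate String steps, assuming the client really sends UTF-8 bytes that match the XML declaration. The handler above decodes the bytes, passes them through URLDecoder (which is meant for application/x-www-form-urlencoded data: it turns '+' into a space and rejects malformed '%' sequences), re-encodes them with getBytes() in the platform default charset, and decodes again with sun.io.ByteToCharConverter, an internal, unsupported class that current JDKs no longer ship. Each of those round trips can garble non-ASCII text before the parser ever sees it. The sketch instead hands the raw bytes straight to the parser, so the encoding in the XML declaration decides how they are decoded. It swaps in the JDK's built-in DOM parser (javax.xml.parsers) for JDOM to stay self-contained; parseTimeTaken is a hypothetical helper name. In the servlet you would pass request.getInputStream() to it; with JDOM, new SAXBuilder().build(request.getInputStream()) follows the same idea.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParsePayload {

    // Hypothetical helper: parses the raw request body and returns the text
    // of <time_taken>. The parser reads the bytes itself, so the encoding
    // named in the XML declaration controls decoding; no String round trip.
    static String parseTimeTaken(InputStream body)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(body);
        return doc.getDocumentElement()
                  .getElementsByTagName("time_taken")
                  .item(0)
                  .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for request.getInputStream(): the sample document as UTF-8 bytes.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<user_data>\n"
                + "    <time_taken>\u00C0\u00C0\u00C0\u00C0</time_taken>\n"
                + "</user_data>\n";
        InputStream body = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseTimeTaken(body));
    }
}
```

Because the document never passes through a Java String before parsing, the platform default charset plays no part, and characters such as '+' and '%' in the text survive untouched. Whether the accented letters display correctly when printed still depends on the console's encoding.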
   
Question by: istiaquem
Accepted Solution

by: Kevin Cross
See if these help:

Character encoding for handling accented characters like the ones you are using:
http://www.javazoom.net/services/newsletter/xmlgeneration.html

Dealing with Unicode characters:
http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=34&t=003647

Reference on escaping other characters:
http://www.javapractices.com/topic/TopicAction.do?Id=96

Hopefully that helps. I think the first link is what you are looking for; the others are background reading.
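The escaping reference concerns producing well-formed XML in the first place. A minimal sketch of what that involves for element text, using a hypothetical escapeXml helper (Java 8 or later, for String.codePoints()): the five predefined entities are substituted, accented letters such as À pass through unchanged (no escaping is needed when the document is written in a Unicode encoding like UTF-8), and code points that XML 1.0 forbids outright are dropped. U+0004 from the error message is one of those; XML 1.0 does not allow a character reference to it either, so &#x4; is itself illegal. Dropping such characters only hides the symptom if they were produced by an earlier encoding mistake.

```java
public class XmlText {

    // True if code point cp is allowed by the XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: escapes the five XML special characters and drops
    // code points XML 1.0 forbids (e.g. U+0004), which cannot be written
    // even as a numeric character reference.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (isXml10Char(cp)) {
                        out.appendCodePoint(cp); // accented letters pass through unchanged
                    }
                    // otherwise the code point is silently dropped
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains an ampersand, angle brackets, an accented letter and U+0004.
        System.out.println(escapeXml("Tom & Jerry <\u00C0\u0004>"));
    }
}
```

Lone surrogate code units fall outside the allowed ranges, so the sketch drops those as well.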
Expert Comment

by: Kevin Cross
In case the links aren't working: the suggestion is to try the ISO-8859-1 character encoding.
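Switching to ISO-8859-1 only helps if the bytes actually sent are ISO-8859-1; the underlying rule is that the encoding named in the XML declaration must match the real bytes. A minimal sketch of that rule, again using the JDK's DOM parser rather than JDOM and a hypothetical parse helper: the sample text is encoded with one charset while the declaration names another. A plain org.xml.sax.helpers.DefaultHandler is installed as the error handler so that fatal errors are thrown without the parser also printing them to stderr.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMatch {

    // Hypothetical helper: builds the sample document with the given declared
    // encoding, encodes it with the given charset, and parses the bytes.
    static String parse(String declared, Charset actual) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?>"
                + "<user_data><time_taken>\u00C0\u00C0\u00C0\u00C0</time_taken></user_data>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // throws on fatal errors, prints nothing
        return builder.parse(new ByteArrayInputStream(xml.getBytes(actual)))
                      .getDocumentElement()
                      .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Declaration and bytes agree: the accented text parses fine.
        System.out.println(parse("ISO-8859-1", StandardCharsets.ISO_8859_1));

        // Bytes are ISO-8859-1 (each À is the single byte 0xC0) but the
        // declaration claims UTF-8, so the parser rejects the byte sequence.
        try {
            parse("UTF-8", StandardCharsets.ISO_8859_1);
            System.out.println("unexpectedly parsed");
        } catch (SAXException | IOException e) {
            System.out.println("mismatch rejected: " + e.getMessage());
        }
    }
}
```

Declaring UTF-8 and sending UTF-8 bytes works just as well; what matters is that the two agree.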