Solved

Java XML parse error: special characters

Posted on 2008-11-02
Medium Priority
2,205 Views
Last Modified: 2015-01-05
I have had this issue lingering for a while. I have an XML document that contains special characters, and I run into serious problems when I try to parse it. Experts, please advise.

Here is the xml

<?xml version="1.0" encoding="UTF-8"?>
<user_data>
    <time_taken>ÀÀÀÀ</time_taken>  <!-- special characters -->
</user_data>

Here is my servlet which parses:

protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception
      {
            request.setCharacterEncoding("UTF-8");

            int contentLength = request.getContentLength();
            if ( contentLength == -1 ) {
                  // Content length must be known.
                  throw new ServletException( "Content-Length must be specified" );
            }
            String contentType = request.getContentType();
            System.out.println("request.getContentType(): " +request.getContentType() );
            System.out.println("request.getContentLength(): " +request.getContentLength() );

            boolean contentTypeIsOkay = false;
            // Content-Type must be specified.
            if ( contentType != null ) {
                  // The type must be XML text.
                  if ( contentType.startsWith( "text/xml" ) ) {
                        // And it must be UTF-8 encoded (or unspecified, in which case
                        // we assume that it's either UTF-8 or ASCII).
                        if ( contentType.indexOf( "charset=" ) == -1 ) {
                              contentTypeIsOkay = true;
                        } else if ( contentType.indexOf( "charset=utf-8" ) != -1 ) {
                              contentTypeIsOkay = true;
                        }
                  }
            }
            if ( !contentTypeIsOkay ) {
                  throw new ServletException(
                  "Content-Type must be 'text/xml' with 'charset=utf-8' (or unspecified charset)" );
            }
            InputStream in = request.getInputStream();
            //      InputStreamReader in = new InputStreamReader(request.getInputStream(), "UTF-8");
            String decoded = null;
            String pay = null;
            try {
                  byte[] payload = new byte[contentLength];
                  int offset = 0;
                  int len = contentLength;
                  int byteCount;
                  while ( offset < contentLength ) {
                        byteCount = in.read( payload, offset, len );
                        if ( byteCount == -1 ) {
                              throw new ServletException( "Client did not send " + contentLength + " bytes as expected" );
                        }
                        offset += byteCount;
                        len -= byteCount;
                  }
                  pay = new String( payload, "UTF-8" );
                  System.out.println("xml is : " +pay );

                  decoded = URLDecoder.decode(pay, "utf-8");
                  System.out.println("decoded : " +decoded );

            } finally {
                  if ( in != null ) {
                        in.close();
                  }
            }

            sun.io.ByteToCharConverter fromUnicode;
            String convertedStr = decoded;
            try {
                  fromUnicode = sun.io.ByteToCharConverter.getConverter("UTF-8");
                  fromUnicode.setSubstitutionMode(true);

                  char[] convertedChars;

                  convertedChars = fromUnicode.convertAll(convertedStr.getBytes());

                  convertedStr = new String(convertedChars);
                  System.out.println("convertedStr : " +convertedStr );
                  
            } catch (UnsupportedEncodingException e) {
                  e.printStackTrace();
            }
            
            
            InputStream inputStream = request.getInputStream();

            System.out.println("request.getCharacterEncoding()  : " + request.getCharacterEncoding() );
            SAXBuilder builder = new SAXBuilder();

            Document doc = builder.build(new java.io.ByteArrayInputStream(convertedStr.getBytes()));
            // ERROR: Illegal XML character: &#x4;

            Element user_data = doc.getRootElement();
   
Question by:istiaquem
2 Comments
 
LVL 61

Accepted Solution

by:
Kevin Cross earned 2000 total points
ID: 22866278
See if these help:

Deals with character encoding to handle accented characters like you are using:
http://www.javazoom.net/services/newsletter/xmlgeneration.html

Dealing with Unicode characters:
http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=34&t=003647

Reference on escaping other characters:
http://www.javapractices.com/topic/TopicAction.do?Id=96

Hopefully that helps. I think the first link will be what you are looking for; the others are for light reading.
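Since the reported error names an illegal XML character (&#x4;), here is a minimal sketch of checking text against the character range that XML 1.0 allows. The class and method names (XmlCharFilterSketch, isLegalXml10, stripIllegal) are made up for illustration and do not come from the question or the linked pages. Note that filtering only hides the symptom: it would not repair an encoding mismatch earlier in the pipeline.

```java
class XmlCharFilterSketch {

    // XML 1.0 "Char" production: tab, LF, CR, and the ranges below are legal.
    static boolean isLegalXml10(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the text with every code point outside that range dropped.
    static String stripIllegal(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (isLegalXml10(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0004 is a control character that XML 1.0 does not allow.
        String cleaned = stripIllegal("a\u0004b\u00C0");
        System.out.println(cleaned.length());
    }
}
```

Accented letters such as À (U+00C0) pass this check, so if a control character shows up where an accented letter was expected, the bytes were probably decoded with the wrong charset somewhere before parsing.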
 
LVL 61

Expert Comment

by:Kevin Cross
ID: 22866292
In case the links aren't working: the suggestion is to try the character encoding ISO-8859-1.
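To illustrate what the encoding choice changes, here is a rough sketch that hands the raw bytes straight to the parser, so the parser decodes them according to the encoding named in the XML declaration. It is an illustration under assumptions, not the asker's code: it uses the JDK's built-in DocumentBuilder instead of JDOM's SAXBuilder so that it runs without extra libraries, and the names EncodingParseSketch, sample, and timeTaken are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingParseSketch {

    // Builds the sample document, declaring encodingName and encoding the bytes with charset.
    static byte[] sample(String encodingName, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encodingName + "\"?>"
                + "<user_data><time_taken>\u00C0\u00C0\u00C0\u00C0</time_taken></user_data>";
        return xml.getBytes(charset);
    }

    // Parses the raw bytes; no intermediate String or default-charset conversion is involved.
    static String timeTaken(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement()
                .getElementsByTagName("time_taken")
                .item(0)
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Declaration and bytes agree: both calls return the four accented characters.
        System.out.println(timeTaken(sample("UTF-8", StandardCharsets.UTF_8)));
        System.out.println(timeTaken(sample("ISO-8859-1", StandardCharsets.ISO_8859_1)));

        // Declaration and bytes disagree: ISO-8859-1 bytes labelled as UTF-8.
        try {
            System.out.println(timeTaken(sample("UTF-8", StandardCharsets.ISO_8859_1)));
        } catch (Exception e) {
            System.out.println("Mismatch rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the declared encoding has to match the bytes actually sent. Calls such as getBytes() without a charset argument use the platform default, which can silently break that match, so passing the request bytes (or the request input stream) to the parser unchanged is usually the safer route.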
Question has a verified solution.
