  • Status: Solved

Exception while parsing XML file

I receive a String that contains an XML document.

String d = .... //variable d now contains an XML

I want to navigate through this XML and print some node values. My code is:

      byte b[] = d.getBytes();
      InputStream is = new ByteArrayInputStream(b);
      org.w3c.dom.Document doc12 = null;
      try {
          DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
          DocumentBuilder docb = dbf.newDocumentBuilder();
          doc12 = docb.parse(is);    //Exception is thrown here
          Element elmt = doc12.getDocumentElement();
          System.out.println("Root Node Name : " + elmt.getNodeName());
      } catch (Exception e) {
          e.printStackTrace();
      }
                      
The problem is that when it reaches the line "doc12 = docb.parse(is);" it throws an exception:

java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
      at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
      at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
      at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
      at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
      at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
      at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
      at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
      at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
      at com.wk.steuernetz.support.util.SteuernetzFormatter.sendTextEmail(SteuernetzFormatter.java:204)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:324)
      at com.wk.atlas.tracker.EmailManager.sendMail(EmailManager.java:98)
      at com.wk.atlas.tracker.TrackerEmailBean.onMessage(TrackerEmailBean.java:246)
      at weblogic.ejb20.internal.MDListener.execute(MDListener.java:382)
      at weblogic.ejb20.internal.MDListener.transactionalOnMessage(MDListener.java:316)
      at weblogic.ejb20.internal.MDListener.onMessage(MDListener.java:281)
      at weblogic.jms.client.JMSSession.onMessage(JMSSession.java:2596)
      at weblogic.jms.client.JMSSession.execute(JMSSession.java:2516)
      at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
      at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)


Can someone please tell me how I can get rid of this? The XML that is fetched into the string looks something like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE wkdic
  PUBLIC "-//D//DTD DIC Atlas compliant R1//DE" "dic.dtd">
<dic>
   <common-ident>
      <docid>DE-SN-DOC1271575393</docid>
      <wk-editor>VPW</wk-editor>
      <status>new</status>
   </common-ident>
.......
</dic>


 

Thanks
thomas908
CEHJ commented:
Your bytes are not UTF-8: d.getBytes() encodes with the platform default charset, while the XML declaration says UTF-8.

Use

builder.parse(new InputSource(new StringReader(d)));
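For reference, here is a minimal self-contained sketch of this approach, assuming a reasonably recent JDK. The class name ParseXmlFromString and the helper rootName are made up for illustration, and the sample XML is a shortened stand-in for the document in the question, with the DOCTYPE left out so it runs without dic.dtd. Because the parser reads characters rather than bytes, no byte decoding takes place and the encoding in the XML declaration is not used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseXmlFromString {

    // Hypothetical helper: parses XML held in a String and returns the root element name.
    static String rootName(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder docb = dbf.newDocumentBuilder();
        // Hand the parser a character stream, so no charset is involved at all.
        Document doc = docb.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample with a non-ASCII character (u-umlaut) in the content.
        String d = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                 + "<dic><common-ident><wk-editor>M\u00fcller</wk-editor></common-ident></dic>";
        System.out.println("Root Node Name : " + rootName(d));
    }
}
```

With the real document, the DOCTYPE would still make the parser look for dic.dtd, so that file has to be resolvable just as before.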
 
CEHJ commented:
'builder' in your case, of course, is 'docb'
 
Giant2 commented:
I believe it is not clear to you how parsing works.
See here for examples:
http://javaalmanac.com/egs/javax.xml.parsers/pkg.html

Bye, Giant.
 
arun_kuttz commented:
I think you have to set the encoding explicitly. This might work:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder docb = dbf.newDocumentBuilder();
InputSource inpSrc = new InputSource(is);
inpSrc.setEncoding("UTF-8");
doc12 = docb.parse(inpSrc);
Element elmt = doc12.getDocumentElement();

-KuTtZ

 
objects commented:
>      byte b[] = d.getBytes();

Try specifying the charset to use when extracting the bytes, so that it matches the encoding declared in the XML:

     byte b[] = d.getBytes("UTF8");
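To illustrate why the charset passed to getBytes matters, here is a small sketch, assuming Java 7 or later. The names CharsetMismatchDemo, XML and parses are made up for illustration, and ISO-8859-1 only stands in for whatever the platform default charset was on the machine in question, which the thread does not state. The same string is encoded two ways and handed to the parser as bytes; only the encoding that matches the XML declaration should parse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class CharsetMismatchDemo {

    // Declares UTF-8 and contains one non-ASCII character (u-umlaut).
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                            + "<dic><wk-editor>M\u00fcller</wk-editor></dic>";

    // Hypothetical helper: returns true if the bytes parse as XML, false otherwise.
    static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // An invalid UTF-8 byte sequence ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Bytes that do not match the declared encoding: the umlaut becomes the single byte 0xFC.
        byte[] latin1 = XML.getBytes(StandardCharsets.ISO_8859_1);
        // Bytes that match the declared encoding.
        byte[] utf8 = XML.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes parse: " + parses(latin1));
        System.out.println("UTF-8 bytes parse:      " + parses(utf8));
    }
}
```

A document containing only ASCII characters would parse either way, which is why this kind of problem often shows up only once real data with umlauts or accents arrives.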
 
Mayank S (Associate Director - Product Engineering) commented:
I prefer builder.parse() on an InputStream over a StringReader; perhaps it gives better performance.
 
CEHJ commented:
Starting with the contents *already* decoded is preferable for obvious reasons, quite apart from its being a one-liner
 
thomas908 (author) commented:
Thank you all for helping
 
CEHJ commented:
:-)