Solved

Parsing an XML string with DocumentBuilder

Posted on 2007-03-29
Last Modified: 2013-12-22
I'm using javax.xml.parsers.DocumentBuilder to parse an XML string. I'm getting unpredictable results and have no idea what is wrong. I'm passing a ByteArrayInputStream to the parse() call. Is this incorrect? The code below prints the following results:

07/03/29 14:41:49 survey_c length: 13  <--- This should be 1
07/03/29 14:41:49 list_multimedia_c length: 3 <-- this is correct
07/03/29 14:41:49 multimedia_c length: 0 <---this should only print once and should be 3
07/03/29 14:41:49 multimedia_c length: 7
07/03/29 14:41:49 multimedia_c length: 0

The xml string:
"<SURVEY>
  <LIST_MULTIMEDIA>
    <MULTIMEDIA>
      <MULT_TYPE>JPG</MULT_TYPE>
      <MULT_REF>test.jpg</MULT_REF>
      <MULT_DESC>test logo</MULT_DESC>
    </MULTIMEDIA>
  </LIST_MULTIMEDIA>
</SURVEY>"

//the code
            // xmlstring = "<SURVEY>......";
            db = dbf.newDocumentBuilder();
            doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();          
           
            NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
            System.out.println("survey_c length: " + survey_c.getLength());
            for (int i = 0; i < survey_c.getLength(); i++) {
                Node thisNode = survey_c.item(i);
                // get multimedia references
                if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                    NodeList list_multimedia_c = thisNode.getChildNodes();
                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                        Node multimedia = list_multimedia_c.item(j);
                        NodeList multimedia_c = multimedia.getChildNodes();
                        System.out.println("multimedia_c length: " + multimedia_c.getLength());
                        String type = "";
                        String ref = "";
                        String desc = "";
                        for (int k = 0; k < multimedia_c.getLength(); k++) {
                            Node mediaNode = multimedia_c.item(k);
                            if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                                type = mediaNode.getNodeValue();
                            } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                                ref = mediaNode.getNodeValue();
                            } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                                desc = mediaNode.getNodeValue();
                            }
                        }
                        siteReport.addMediaFile(new MediaFile(type, ref, desc));
                    }

                    // get survey coordinates
                } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                    // TODO
                }
            }            
            // add to site report vector
            siteReports.add(siteReport);

Question by:jstretch
LVL 30

Assisted Solution

by:Mayank S
Mayank S earned 200 total points
ID: 18819302
>> doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));

Try: doc = db.parse ( new StringReader (xmlstring ) ) ;
 
LVL 19

Accepted Solution

by:
Kuldeepchaturvedi earned 300 total points
ID: 18820538
doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();      

As Mayank says, use a StringReader; it's easier.

Call doc.normalize() just after parsing it (it should take out the unnecessary whitespace).

Also, XML DOES count whitespace as nodes/values; that might be throwing off the counts.
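
A minimal sketch of why the child counts come out higher than expected, assuming the default non-validating parser. The whitespace between tags becomes text nodes, and normalize() only merges adjacent text nodes and removes empty ones, so whitespace-only nodes stay in place. The class and method names here (WhitespaceNodeDemo, countAllChildren, countElementChildren) are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodeDemo {

    // Parses an XML string with the default (non-validating) parser.
    // Checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts every child node, including whitespace-only text nodes.
    static int countAllChildren(Node parent) {
        return parent.getChildNodes().getLength();
    }

    // Counts only the children that are elements.
    static int countElementChildren(Node parent) {
        NodeList children = parent.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA/>\n  </LIST_MULTIMEDIA>\n</SURVEY>";
        Document doc = parse(xml);
        doc.normalize(); // does not remove the whitespace-only text nodes
        Node survey = doc.getDocumentElement();
        System.out.println("all children: " + countAllChildren(survey));
        System.out.println("element children: " + countElementChildren(survey));
    }
}
```

With the indented input in main, SURVEY should report 3 child nodes in total (text, element, text) but only 1 element child, which matches the kind of inflated lengths shown in the question.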
 
LVL 6

Author Comment

by:jstretch
ID: 18820582
The parse() method requires an InputStream; using a StringReader won't compile.

I tried doc.normalize() but it still printed the same results. Perhaps this is an encoding issue? I am getting this file out of a zip file using the Java zip classes (ZipFile, ZipEntry, etc.).
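
One way to rule out encoding, sketched under the assumption that the XML has no encoding declaration (in which case the parser treats the bytes as UTF-8): getBytes() with no argument uses the platform default charset, so passing the charset explicitly keeps the bytes and the parser in agreement. ExplicitCharsetParse and rootText are hypothetical names, and StandardCharsets needs Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

public class ExplicitCharsetParse {

    // Encodes the string as UTF-8 explicitly, parses it, and returns the
    // text content of the root element.
    static String rootText(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is a non-ASCII character that a mismatched charset could corrupt.
        System.out.println(rootText("<MULT_DESC>caf\u00e9 logo</MULT_DESC>"));
    }
}
```

If the counts are still wrong after this change, encoding is probably not the cause.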

 
LVL 19

Expert Comment

by:Kuldeepchaturvedi
ID: 18820721
I tried running your code on my machine and it gives me correct results!
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/*
 * Created on Mar 30, 2007
 *
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */

/**
 * @author kchaturv
 *
 * To change the template for this generated type comment go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
public class TestCase {

public void TryIt()throws Exception
{
      String xmlstring="<SURVEY><LIST_MULTIMEDIA><MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE><MULT_REF>test.jpg</MULT_REF>              <MULT_DESC>test logo</MULT_DESC></MULTIMEDIA></LIST_MULTIMEDIA></SURVEY>";

//        the code
                        // xmlstring = "<SURVEY>......";
                        DocumentBuilderFactory dbf=DocumentBuilderFactory.newInstance();
                        DocumentBuilder db = dbf.newDocumentBuilder();
                        Document doc = db.parse(new InputSource(new StringReader(xmlstring)));
                        //SiteReport siteReport = new SiteReport();          
           
                        NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
                        System.out.println("survey_c length: " + survey_c.getLength());
                        for (int i = 0; i < survey_c.getLength(); i++) {
                              Node thisNode = survey_c.item(i);
                              // get multimedia references
                              if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                                    NodeList list_multimedia_c = thisNode.getChildNodes();
                                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                                          Node multimedia = list_multimedia_c.item(j);
                                          NodeList multimedia_c = multimedia.getChildNodes();
                                          System.out.println("multimedia_c length: " + multimedia_c.getLength());
                                          String type = "";
                                          String ref = "";
                                          String desc = "";
                                          for (int k = 0; k < multimedia_c.getLength(); k++) {
                                                Node mediaNode = multimedia_c.item(k);
                                                if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                                                      type = mediaNode.getNodeValue();
                                                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                                                      ref = mediaNode.getNodeValue();
                                                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                                                      desc = mediaNode.getNodeValue();
                                                }
                                          }
                                    //      siteReport.addMediaFile(new MediaFile(type, ref, desc));
                                    }

                                    // get survey coordinates
                              } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                                    // TODO
                              }
                        }            
                        // add to site report vector
                  //      siteReports.add(siteReport);

}
public static void main(String args[])
{
      try{
            new TestCase().TryIt();
      }
      catch(Exception e)
      {
            e.printStackTrace();
      }
}

}


These are the results I got:

survey_c length: 1
list_multimedia_c length: 1
multimedia_c length: 4

So most probably the XML string that this method receives is not what you think it is.
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820741
>> the parse() method requires an InputStream, using StringReader wont compile.

Sorry, it has an overload that takes an InputSource; you can use it the way Kuldeep has suggested.
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820744
(the InputSource being built from the StringReader)
 
LVL 6

Author Comment

by:jstretch
ID: 18820863
Well, that's working. I got better numbers, but they are still a little off. However, the XML I provided was only one node (for simplicity); some of the other nodes have special characters.

What special characters would blow up the parser? Apostrophe, comma? Should I use a regex to replace those characters? (Of course I don't see any < or >, which is obvious.)
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820878
The special characters have escape sequences available, e.g., &lt; for < and &gt; for >.
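
A small sketch of escaping text before it is placed inside an element, using the five entities predefined by XML. Only & and < strictly need escaping in element text; apostrophes and commas are fine as they are. XmlEscapeDemo, escapeXml and roundTrip are illustrative names, not part of any library.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEscapeDemo {

    // Replaces the characters that have predefined XML entities.
    // The ampersand must be handled first so that the entities
    // produced by the later replacements are not escaped again.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    // Wraps the escaped text in an element, parses it, and returns the
    // text the parser hands back.
    static String roundTrip(String text) {
        try {
            String xml = "<MULT_DESC>" + escapeXml(text) + "</MULT_DESC>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String original = "Tom's <logo> & co, v2";
        System.out.println(escapeXml(original));
        System.out.println(roundTrip(original).equals(original));
    }
}
```

The parser turns the entities back into the original characters, so code reading the DOM sees the unescaped text.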
 
LVL 6

Author Comment

by:jstretch
ID: 18820905
Yeah, it's just more whitespace. I had only removed the line breaks, but the spaces were messing it up too.
 
LVL 19

Expert Comment

by:Kuldeepchaturvedi
ID: 18820917
>>Yeah its just more spaces, I just removed line returns, but spaces was messing it up also.

As I said, the XML parser counts whitespace as valid nodes; that's why normalize should be used,

or a pre-parser that takes out the spaces and linefeeds from the source.
ASP.NET is much better in this regard. :-)
 
LVL 30

Expert Comment

by:Mayank S
ID: 18821002
I guess as per the DOM specification, it is actually supposed to count them :)
 
LVL 6

Author Comment

by:jstretch
ID: 18821037
Well, a simple regex replace on the whitespace characters should fix it.

normalize() was not removing the whitespace, at least with my implementation.

I was going to try Xerces, but it looks a little too bloated for what I need.
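
An alternative to stripping whitespace with a regex, offered as a sketch rather than the thread's accepted fix: leave the XML alone and skip everything that is not an element while walking the DOM. It also reads the values with getTextContent(), because getNodeValue() returns null for element nodes. MultimediaReader, childText and readMultimedia are hypothetical names; the tag names come from the XML in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MultimediaReader {

    // Returns the trimmed text of the first child element with the given
    // tag name, or "" if there is none. Text nodes are skipped.
    static String childText(Element parent, String tag) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(tag)) {
                return child.getTextContent().trim();
            }
        }
        return "";
    }

    // Returns one {type, ref, desc} array per MULTIMEDIA element.
    static List<String[]> readMultimedia(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getElementsByTagName only returns elements, so the cast is safe.
            NodeList items = doc.getElementsByTagName("MULTIMEDIA");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(new String[] {
                    childText(item, "MULT_TYPE"),
                    childText(item, "MULT_REF"),
                    childText(item, "MULT_DESC")
                });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA>\n"
                + "      <MULT_TYPE>JPG</MULT_TYPE>\n"
                + "      <MULT_REF>test.jpg</MULT_REF>\n"
                + "      <MULT_DESC>test logo</MULT_DESC>\n"
                + "    </MULTIMEDIA>\n  </LIST_MULTIMEDIA>\n</SURVEY>";
        for (String[] m : readMultimedia(xml)) {
            System.out.println(m[0] + " | " + m[1] + " | " + m[2]);
        }
    }
}
```

This keeps spaces inside values such as "test logo" intact, which a blanket whitespace regex could damage.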