Solved

Parsing an XML string with DocumentBuilder

Posted on 2007-03-29
Last Modified: 2013-12-22
I'm using javax.xml.parsers.DocumentBuilder to parse an XML string. I'm getting unpredictable results and have no idea what is wrong. I'm passing the parse() call a ByteArrayInputStream. Is this incorrect? The code below prints the following results:

07/03/29 14:41:49 survey_c length: 13  <--- This should be 1
07/03/29 14:41:49 list_multimedia_c length: 3 <-- this is correct
07/03/29 14:41:49 multimedia_c length: 0 <---this should only print once and should be 3
07/03/29 14:41:49 multimedia_c length: 7
07/03/29 14:41:49 multimedia_c length: 0

The xml string:
"<SURVEY>
  <LIST_MULTIMEDIA>
    <MULTIMEDIA>
      <MULT_TYPE>JPG</MULT_TYPE>
      <MULT_REF>test.jpg</MULT_REF>
      <MULT_DESC>test logo</MULT_DESC>
    </MULTIMEDIA>
  </LIST_MULTIMEDIA>
</SURVEY>"

//the code
            // xmlstring = "<SURVEY>......";
            db = dbf.newDocumentBuilder();
            doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();          
           
            NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
            System.out.println("survey_c length: " + survey_c.getLength());
            for (int i = 0; i < survey_c.getLength(); i++) {
                Node thisNode = survey_c.item(i);
                // get multimedia references
                if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                    NodeList list_multimedia_c = thisNode.getChildNodes();
                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                        Node multimedia = list_multimedia_c.item(j);
                        NodeList multimedia_c = multimedia.getChildNodes();
                        System.out.println("multimedia_c length: " + multimedia_c.getLength());
                        String type = "";
                        String ref = "";
                        String desc = "";
                        for (int k = 0; k < multimedia_c.getLength(); k++) {
                            Node mediaNode = multimedia_c.item(k);
                            if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                                type = mediaNode.getNodeValue();
                            } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                                ref = mediaNode.getNodeValue();
                            } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                                desc = mediaNode.getNodeValue();
                            }
                        }
                        siteReport.addMediaFile(new MediaFile(type, ref, desc));
                    }

                    // get survey coordinates
                } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                    // TODO
                }
            }            
            // add to site report vector
            siteReports.add(siteReport);

Question by:jstretch
12 Comments
 

Assisted Solution

by:mayankeagle
mayankeagle earned 200 total points
>> doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));

Try: doc = db.parse(new StringReader(xmlstring));

Accepted Solution

by:Kuldeepchaturvedi
Kuldeepchaturvedi earned 300 total points
>> doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
>> SiteReport siteReport = new SiteReport();

As mayank says, use a StringReader; it's easier.

Also call doc.normalize() right after parsing (it takes out all unnecessary whitespace).

And XML DOES count whitespace as nodes/values; that might be throwing off your counts.
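
To illustrate the whitespace point: a non-validating parser keeps the indentation between tags as text nodes, so getChildNodes() returns more than just the elements. Note that Node.normalize() only merges adjacent text nodes and drops empty ones; whitespace-only text nodes between elements stay in the tree. Below is a minimal sketch of filtering children by node type. The names ElementChildren, parse and elementChildren are made up for this example and are not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class ElementChildren {

    // Parses an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns only the element children, skipping text, comment and other node types.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA/>\n  </LIST_MULTIMEDIA>\n</SURVEY>";
        Document doc = parse(xml);
        doc.normalize(); // does not remove the whitespace-only text nodes
        Element root = doc.getDocumentElement();
        System.out.println("all children: " + root.getChildNodes().getLength());
        System.out.println("element children: " + elementChildren(root).size());
    }
}
```

Looping over elementChildren(node) instead of node.getChildNodes() keeps the counts independent of how the XML happens to be indented.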

Author Comment

by:jstretch
The parse() method requires an InputStream; using a StringReader won't compile.

I tried doc.normalize() but it still printed the same results. Perhaps this is an encoding issue? I am getting this file out of a zip file using Java objects (ZipFile, ZipItem, etc.).


Expert Comment

by:Kuldeepchaturvedi
I tried running your code on my machine and it gives me correct results!
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/*
 * Created on Mar 30, 2007
 *
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */

/**
 * @author kchaturv
 *
 * To change the template for this generated type comment go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
public class TestCase {

public void TryIt()throws Exception
{
      String xmlstring="<SURVEY><LIST_MULTIMEDIA><MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE><MULT_REF>test.jpg</MULT_REF>              <MULT_DESC>test logo</MULT_DESC></MULTIMEDIA></LIST_MULTIMEDIA></SURVEY>";

//        the code
                        // xmlstring = "<SURVEY>......";
                        DocumentBuilderFactory dbf=DocumentBuilderFactory.newInstance();
                        DocumentBuilder db = dbf.newDocumentBuilder();
                        Document doc = db.parse(new InputSource(new StringReader(xmlstring)));
                        //SiteReport siteReport = new SiteReport();          
           
                        NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
                        System.out.println("survey_c length: " + survey_c.getLength());
                        for (int i = 0; i < survey_c.getLength(); i++) {
                              Node thisNode = survey_c.item(i);
                              // get multimedia references
                              if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                                    NodeList list_multimedia_c = thisNode.getChildNodes();
                                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                                          Node multimedia = list_multimedia_c.item(j);
                                          NodeList multimedia_c = multimedia.getChildNodes();
                                          System.out.println("multimedia_c length: " + multimedia_c.getLength());
                                          String type = "";
                                          String ref = "";
                                          String desc = "";
                                          for (int k = 0; k < multimedia_c.getLength(); k++) {
                                                Node mediaNode = multimedia_c.item(k);
                                                if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                                                      type = mediaNode.getNodeValue();
                                                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                                                      ref = mediaNode.getNodeValue();
                                                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                                                      desc = mediaNode.getNodeValue();
                                                }
                                          }
                                    //      siteReport.addMediaFile(new MediaFile(type, ref, desc));
                                    }

                                    // get survey coordinates
                              } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                                    // TODO
                              }
                        }            
                        // add to site report vector
                  //      siteReports.add(siteReport);

}
public static void main(String args[])
{
      try{
            new TestCase().TryIt();
      }
      catch(Exception e)
      {
            e.printStackTrace();
      }
}

}


Following are the results that I got...

survey_c length: 1
list_multimedia_c length: 1
multimedia_c length: 4

So most probably the XML string this method receives is not what you think it is.
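
A note on these numbers: the 4 is the three elements plus one whitespace text node, because the test string has a run of spaces between </MULT_REF> and <MULT_DESC>. Separately, getNodeValue() returns null for element nodes, so the loop from the question would store null in type, ref and desc; the text sits in a child text node and can be read with getTextContent(). Here is a small sketch of that, using hypothetical names (ElementText, parseRoot, childText) that are not in the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class ElementText {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the text of the first descendant element with the given tag name,
    // or an empty string when there is none. Note that getElementsByTagName
    // searches all descendants, not only direct children.
    static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element media = parseRoot("<MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE>"
                + "<MULT_REF>test.jpg</MULT_REF>   <MULT_DESC>test logo</MULT_DESC></MULTIMEDIA>");
        NodeList children = media.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // For element nodes getNodeValue() is null; getTextContent() has the text.
            System.out.println(child.getNodeName()
                    + " value=" + child.getNodeValue()
                    + " text=[" + child.getTextContent() + "]");
        }
        System.out.println("type: " + childText(media, "MULT_TYPE"));
    }
}
```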

Expert Comment

by:mayankeagle
>> the parse() method requires an InputStream, using StringReader wont compile.

Sorry, it has an overload that takes an InputSource; you can use it the way kuldeep has suggested.

Expert Comment

by:mayankeagle
(the InputSource being built from the StringReader)
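
For comparison, here is a sketch of both ways to feed a String to DocumentBuilder.parse(). The Reader route hands the parser characters directly, so no byte encoding is involved. The stream route works too, but getBytes() with no argument uses the platform default charset, which may not match what the parser assumes; passing an explicit charset avoids that. The class and method names (ParseFromString, fromReader, fromBytes) are assumptions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class ParseFromString {

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // parse(InputSource) overload: the parser reads characters, not bytes.
    static Document fromReader(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // parse(InputStream) overload: encode explicitly as UTF-8, which is what
    // the parser assumes when the document has no encoding declaration.
    static Document fromBytes(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return newBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<MULT_DESC>caf\u00e9 logo</MULT_DESC>";
        System.out.println(fromReader(xml).getDocumentElement().getTextContent());
        System.out.println(fromBytes(xml).getDocumentElement().getTextContent());
    }
}
```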

Author Comment

by:jstretch
Well, that's working. I got better numbers, but they're still a little off. However, the XML I provided was only one node (for simplicity); some of the other nodes have special characters.

What special characters would blow up the parser? Apostrophe, comma? Should I use a regex to replace those characters? (Of course I don't see any < or >, which are the obvious ones.)

Expert Comment

by:mayankeagle
The special characters have escape sequences available, e.g., &lt; for < and &gt; for >
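
For reference, XML predefines five entities: &lt; &gt; &amp; &apos; and &quot;. In element content only < and & have to be escaped; apostrophes and commas are fine as they are. Below is a minimal sketch of an escaping helper for text that goes into an XML document. escapeXml is a hypothetical name, not a JDK method.

```java
// Hypothetical helper class, for illustration only.
final class XmlEscape {

    // Replaces the five characters that have predefined XML entities.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String desc = "Tom & Jerry's <logo>, v2";
        System.out.println("<MULT_DESC>" + escapeXml(desc) + "</MULT_DESC>");
    }
}
```

Escaping has to happen where the XML is produced, on the text values only; running a replace over the finished document would also mangle the markup itself.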

Author Comment

by:jstretch
Yeah, it's just more whitespace. I had only removed line returns, but the spaces were messing it up as well.

Expert Comment

by:Kuldeepchaturvedi
>> Yeah its just more spaces, I just removed line returns, but spaces was messing it up also.

As I said, the XML parser counts spaces as valid nodes; that's why normalize() should be used,

or a pre-parser that takes the spaces and linefeeds out of the source.
ASP.NET is much better in this regard. :-)

Expert Comment

by:mayankeagle
I guess that, as per the DOM specification, it is actually supposed to count them :)
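
Since the whitespace text nodes are a legitimate part of the tree, one option is to stop walking getChildNodes() and select the elements by path instead. Here is a sketch using the javax.xml.xpath API that ships with the JDK; the class and method names (MultimediaXPath, multimediaNodes, field) are assumptions for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class MultimediaXPath {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Selects only MULTIMEDIA elements; whitespace text nodes never match the path.
    static NodeList multimediaNodes(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                    "/SURVEY/LIST_MULTIMEDIA/MULTIMEDIA", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Returns the text of the named child element, or "" if it is absent.
    static String field(Node multimedia, String childName) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(childName, multimedia);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA>\n"
                + "      <MULT_TYPE>JPG</MULT_TYPE>\n      <MULT_REF>test.jpg</MULT_REF>\n"
                + "      <MULT_DESC>test logo</MULT_DESC>\n    </MULTIMEDIA>\n"
                + "  </LIST_MULTIMEDIA>\n</SURVEY>";
        NodeList items = multimediaNodes(parse(xml));
        System.out.println("multimedia count: " + items.getLength());
        for (int i = 0; i < items.getLength(); i++) {
            Node m = items.item(i);
            System.out.println(field(m, "MULT_TYPE") + " / " + field(m, "MULT_REF")
                    + " / " + field(m, "MULT_DESC"));
        }
    }
}
```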

Author Comment

by:jstretch
Well, a simple regex replace on the whitespace characters should fix it.

normalize() was not removing the whitespace, at least with my implementation.

I was going to try Xerces, but it looks a little too bloated for what I need.
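
One way the regex replace might look is sketched below. It removes whitespace that sits directly between a closing > and the next <, and leaves spaces inside text values alone. This assumes the documents have no mixed content, CDATA sections or comments where such whitespace matters; the names StripInterTagWhitespace and strip are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class StripInterTagWhitespace {

    // Drops runs of whitespace between tags, plus leading and trailing whitespace.
    static String strip(String xml) {
        return xml.replaceAll(">\\s+<", "><").trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA>\n"
                + "      <MULT_DESC>test logo</MULT_DESC>\n    </MULTIMEDIA>\n"
                + "  </LIST_MULTIMEDIA>\n</SURVEY>";
        String compact = strip(xml);
        System.out.println(compact);

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(compact)));
        // With the inter-tag whitespace gone, only element children remain.
        System.out.println("survey_c length: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Filtering on node type or selecting by name after parsing is usually the safer choice, since it does not depend on assumptions about the raw text.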