Solved

Parsing an XML string with DocumentBuilder

Posted on 2007-03-29
Last Modified: 2013-12-22
I'm using javax.xml.parsers.DocumentBuilder to parse an XML string, but I'm getting unpredictable results and have no idea what is wrong. I'm passing the parse() call a ByteArrayInputStream. Is this incorrect? The code below prints the following results:

07/03/29 14:41:49 survey_c length: 13  <--- This should be 1
07/03/29 14:41:49 list_multimedia_c length: 3 <-- this is correct
07/03/29 14:41:49 multimedia_c length: 0 <---this should only print once and should be 3
07/03/29 14:41:49 multimedia_c length: 7
07/03/29 14:41:49 multimedia_c length: 0

The xml string:
"<SURVEY>
  <LIST_MULTIMEDIA>
    <MULTIMEDIA>
      <MULT_TYPE>JPG</MULT_TYPE>
      <MULT_REF>test.jpg</MULT_REF>
      <MULT_DESC>test logo</MULT_DESC>
    </MULTIMEDIA>
  </LIST_MULTIMEDIA>
</SURVEY>"

//the code
            // xmlstring = "<SURVEY>......";
            db = dbf.newDocumentBuilder();
            doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();          
           
            NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
            System.out.println("survey_c length: " + survey_c.getLength());
            for (int i = 0; i < survey_c.getLength(); i++) {
                Node thisNode = survey_c.item(i);
                // get multimedia references
                if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                    NodeList list_multimedia_c = thisNode.getChildNodes();
                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                        Node multimedia = list_multimedia_c.item(j);
                        NodeList multimedia_c = multimedia.getChildNodes();
                        System.out.println("multimedia_c length: " + multimedia_c.getLength());
                        String type = "";
                        String ref = "";
                        String desc = "";
                        for (int k = 0; k < multimedia_c.getLength(); k++) {
                            Node mediaNode = multimedia_c.item(k);
                            if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                                type = mediaNode.getNodeValue();
                            } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                                ref = mediaNode.getNodeValue();
                            } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                                desc = mediaNode.getNodeValue();
                            }
                        }
                        siteReport.addMediaFile(new MediaFile(type, ref, desc));
                    }

                    // get survey coordinates
                } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                    // TODO
                }
            }            
            // add to site report vector
            siteReports.add(siteReport);

Question by:jstretch
12 Comments
 
LVL 30

Assisted Solution

by:Mayank S
Mayank S earned 200 total points
ID: 18819302
>> doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));

Try: doc = db.parse ( new StringReader (xmlstring ) ) ;
 
LVL 19

Accepted Solution

by:Kuldeepchaturvedi
Kuldeepchaturvedi earned 300 total points
ID: 18820538
doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();      

As Mayank says, use a StringReader; it's easier.

Call doc.normalize() just after parsing (it takes out unnecessary whitespace).

Also, XML DOES count whitespace as nodes/values; that might be throwing off your counts.
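
To illustrate the whitespace point: the indentation and line breaks between tags become text nodes in the DOM, so getChildNodes() returns more entries than there are elements. A minimal sketch, assuming you only care about element children; the class name ElementChildCount and the helper countElementChildren are made up for this example and are not part of the original code.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildCount {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: counts only the children that are elements,
    // skipping whitespace text nodes, comments and so on.
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the XML in the question, indentation included.
        String xml = "<SURVEY>\n"
                + "  <LIST_MULTIMEDIA>\n"
                + "    <MULTIMEDIA>\n"
                + "      <MULT_TYPE>JPG</MULT_TYPE>\n"
                + "      <MULT_REF>test.jpg</MULT_REF>\n"
                + "      <MULT_DESC>test logo</MULT_DESC>\n"
                + "    </MULTIMEDIA>\n"
                + "  </LIST_MULTIMEDIA>\n"
                + "</SURVEY>";
        Node survey = parse(xml).getDocumentElement();
        // The first number includes the whitespace text nodes around LIST_MULTIMEDIA.
        System.out.println("all children: " + survey.getChildNodes().getLength());
        System.out.println("element children: " + countElementChildren(survey));
    }
}
```

Checking getNodeType() inside the existing loops would have the same effect: the inner loops would then only run for real elements instead of also running for the whitespace text nodes, which is where the "multimedia_c length: 0" lines come from.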
 
LVL 6

Author Comment

by:jstretch
ID: 18820582
The parse() method requires an InputStream; using a StringReader won't compile.

I tried doc.normalize() but it still printed the same results. Perhaps this is an encoding issue? I am getting this file out of a zip file using Java objects (ZipFile, ZipEntry, etc.).
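
On the encoding question: xmlstring.getBytes() with no argument uses the platform default charset, while a parser reading bytes with no encoding declaration assumes UTF-8. That only matters for non-ASCII characters and would not explain the node counts, but if you stay with a ByteArrayInputStream it is safer to name the charset. A minimal sketch; the class name Utf8Parse and the helper parseUtf8 are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

class Utf8Parse {

    // Hypothetical helper: encodes the string as UTF-8 explicitly so the bytes
    // match what the parser assumes when there is no encoding declaration.
    static Document parseUtf8(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // \u00e9 is a non-ASCII character (e with acute accent).
        Document doc = parseUtf8("<MULT_DESC>caf\u00e9 logo</MULT_DESC>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```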

 
LVL 19

Expert Comment

by:Kuldeepchaturvedi
ID: 18820721
I tried running your code on my machine and it gives me correct results.
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/*
 * Created on Mar 30, 2007
 *
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */

/**
 * @author kchaturv
 *
 * To change the template for this generated type comment go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
public class TestCase {

public void TryIt()throws Exception
{
      String xmlstring="<SURVEY><LIST_MULTIMEDIA><MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE><MULT_REF>test.jpg</MULT_REF>              <MULT_DESC>test logo</MULT_DESC></MULTIMEDIA></LIST_MULTIMEDIA></SURVEY>";

//        the code
                        // xmlstring = "<SURVEY>......";
                        DocumentBuilderFactory dbf=DocumentBuilderFactory.newInstance();
                        DocumentBuilder db = dbf.newDocumentBuilder();
                        Document doc = db.parse(new InputSource(new StringReader(xmlstring)));
                        //SiteReport siteReport = new SiteReport();          
           
                        NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
                        System.out.println("survey_c length: " + survey_c.getLength());
                        for (int i = 0; i < survey_c.getLength(); i++) {
                              Node thisNode = survey_c.item(i);
                              // get multimedia references
                              if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                                    NodeList list_multimedia_c = thisNode.getChildNodes();
                                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                                          Node multimedia = list_multimedia_c.item(j);
                                          NodeList multimedia_c = multimedia.getChildNodes();
                                          System.out.println("multimedia_c length: " + multimedia_c.getLength());
                                          String type = "";
                                          String ref = "";
                                          String desc = "";
                                          for (int k = 0; k < multimedia_c.getLength(); k++) {
                                                Node mediaNode = multimedia_c.item(k);
                                                if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                                                      type = mediaNode.getNodeValue();
                                                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                                                      ref = mediaNode.getNodeValue();
                                                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                                                      desc = mediaNode.getNodeValue();
                                                }
                                          }
                                    //      siteReport.addMediaFile(new MediaFile(type, ref, desc));
                                    }

                                    // get survey coordinates
                              } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                                    // TODO
                              }
                        }            
                        // add to site report vector
                  //      siteReports.add(siteReport);

}
public static void main(String args[])
{
      try{
            new TestCase().TryIt();
      }
      catch(Exception e)
      {
            e.printStackTrace();
      }
}

}


These are the results I got:

survey_c length: 1
list_multimedia_c length: 1
multimedia_c length: 4

So most probably the XML string that this method receives is not what you think it is.
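
A separate point the counts do not show: in the DOM, getNodeValue() on an element node is null, so type, ref and desc in the loops above would not pick up "JPG", "test.jpg" or "test logo". The text lives in a child text node. A minimal sketch of reading it, assuming a DOM Level 3 parser (Java 5 or later) so that getTextContent() is available; the class name ElementText and the helper textOf are made up for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementText {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: returns the concatenated text inside an element.
    static String textOf(Node element) {
        return element.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE></MULTIMEDIA>");
        Node typeNode = doc.getDocumentElement().getFirstChild();

        System.out.println(typeNode.getNodeValue());                  // prints "null": element nodes have no value
        System.out.println(textOf(typeNode));                         // prints "JPG"
        System.out.println(typeNode.getFirstChild().getNodeValue());  // prints "JPG": the child text node
    }
}
```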
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820741
>> the parse() method requires an InputStream, using StringReader wont compile.

Sorry, it has an overload that takes an InputSource; you can use it the way Kuldeep has suggested.
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820744
(the input source being from the string reader)

 
LVL 6

Author Comment

by:jstretch
ID: 18820863
Well, that's working. I got better numbers, but they're still a little off. However, the XML I provided was only one node (for simplicity); some of the other nodes have special characters.

What special characters would blow up the parser? Apostrophe, comma? Should I use a regex to replace those characters? (Of course, I don't see any < or >, which are the obvious ones.)
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820878
The special characters have escape sequences available, e.g., &lt; for < and &gt; for > (note the trailing semicolon).
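
For reference, the only characters that must be escaped in element content are < and &; apostrophes and commas are fine as they are. A minimal sketch of an escaping routine for text that is about to be placed into XML, covering the five predefined entities; the class name XmlEscape and the method escape are made up for this example, and this is not something the parser requires you to run on a whole document.

```java
class XmlEscape {

    // Hypothetical helper: replaces the five characters that have predefined
    // XML entities. Apply it to text values only, never to markup.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String desc = "Tom's logo, <draft> & final";
        System.out.println("<MULT_DESC>" + escape(desc) + "</MULT_DESC>");
    }
}
```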
 
LVL 6

Author Comment

by:jstretch
ID: 18820905
Yeah, it's just more spaces. I had only removed the line returns, but the spaces were messing it up as well.
 
LVL 19

Expert Comment

by:Kuldeepchaturvedi
ID: 18820917
>> Yeah, it's just more spaces. I had only removed the line returns, but the spaces were messing it up as well.

As I said, the XML parser counts whitespace as valid nodes; that's why normalize() should be used,

or a pre-parser that takes the spaces and line feeds out of the source.
ASP.NET is much better in this regard. :-)
 
LVL 30

Expert Comment

by:Mayank S
ID: 18821002
I guess that, as per the DOM specification, it is actually supposed to count them :)
 
LVL 6

Author Comment

by:jstretch
ID: 18821037
Well, a simple regex replace on the whitespace characters should fix it.

normalize() was not removing the whitespace, at least with my implementation.

I was going to try Xerces, but it looks a little too bloated for what I need.
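
That matches how normalize() is specified: it merges adjacent text nodes and removes empty ones, but a text node containing only spaces or line breaks is not empty, so it stays. As an alternative to a regex on the raw string (which can also touch whitespace inside values), here is a minimal sketch that walks the parsed tree and drops whitespace-only text nodes. The class name StripWhitespace and the helper stripWhitespaceNodes are made up for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StripWhitespace {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: recursively removes text nodes that contain only whitespace.
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards: the NodeList is live and shrinks as children are removed.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA/>\n  </LIST_MULTIMEDIA>\n</SURVEY>");
        Node survey = doc.getDocumentElement();

        doc.normalize();
        System.out.println("after normalize(): " + survey.getChildNodes().getLength());

        stripWhitespaceNodes(survey);
        System.out.println("after stripping:   " + survey.getChildNodes().getLength());
    }
}
```

One caveat with this approach: an element whose entire content is whitespace, such as a description consisting of a single space, loses that content too, so it only suits documents where such values do not matter.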
