Solved

parsing xml string with DocumentBuilder

Posted on 2007-03-29
Last Modified: 2013-12-22
I'm using javax.xml.parsers.DocumentBuilder to parse an XML string. I'm getting unpredictable results and have no idea what is wrong. I'm passing the parse() call a ByteArrayInputStream. Is this incorrect? The code below prints the following results:

07/03/29 14:41:49 survey_c length: 13  <--- This should be 1
07/03/29 14:41:49 list_multimedia_c length: 3 <-- this is correct
07/03/29 14:41:49 multimedia_c length: 0 <---this should only print once and should be 3
07/03/29 14:41:49 multimedia_c length: 7
07/03/29 14:41:49 multimedia_c length: 0

The xml string:
"<SURVEY>
  <LIST_MULTIMEDIA>
    <MULTIMEDIA>
      <MULT_TYPE>JPG</MULT_TYPE>
      <MULT_REF>test.jpg</MULT_REF>
      <MULT_DESC>test logo</MULT_DESC>
    </MULTIMEDIA>
  </LIST_MULTIMEDIA>
</SURVEY>"

//the code
            // xmlstring = "<SURVEY>......";
            db = dbf.newDocumentBuilder();
            doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();          
           
            NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
            System.out.println("survey_c length: " + survey_c.getLength());
            for (int i = 0; i < survey_c.getLength(); i++) {
                Node thisNode = survey_c.item(i);
                // get multimedia references
                if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                    NodeList list_multimedia_c = thisNode.getChildNodes();
                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                        Node multimedia = list_multimedia_c.item(j);
                        NodeList multimedia_c = multimedia.getChildNodes();
                        System.out.println("multimedia_c length: " + multimedia_c.getLength());
                        String type = "";
                        String ref = "";
                        String desc = "";
                        for (int k = 0; k < multimedia_c.getLength(); k++) {
                            Node mediaNode = multimedia_c.item(k);
                            // getNodeValue() returns null for element nodes, so read the text with getTextContent()
                            if (mediaNode.getNodeName().equalsIgnoreCase("MULT_TYPE")) {
                                type = mediaNode.getTextContent();
                            } else if (mediaNode.getNodeName().equalsIgnoreCase("MULT_REF")) {
                                ref = mediaNode.getTextContent();
                            } else if (mediaNode.getNodeName().equalsIgnoreCase("MULT_DESC")) {
                                desc = mediaNode.getTextContent();
                            }
                        }
                        siteReport.addMediaFile(new MediaFile(type, ref, desc));
                    }

                    // get survey coordinates
                } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                    // TODO
                }
            }            
            // add to site report vector
            siteReports.add(siteReport);

Question by:jstretch
12 Comments
 
LVL 30

Assisted Solution

by:Mayank S
Mayank S earned 800 total points
ID: 18819302
>> doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));

Try: doc = db.parse ( new StringReader (xmlstring ) ) ;
 
LVL 19

Accepted Solution

by:
Kuldeepchaturvedi earned 1200 total points
ID: 18820538
doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
            SiteReport siteReport = new SiteReport();      

As Mayank says, use a StringReader; it's easier.

Call doc.normalize() just after parsing (it takes out all unnecessary white space).

Also, XML DOES count spaces as nodes/values; that might be throwing off the counts.
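To make the point about whitespace concrete, here is a minimal sketch that compares the raw child count with the number of element children for the indented XML from the question. The class name ChildCountDemo and the helper countElementChildren are made up for illustration, and it assumes the default non-validating parser that DocumentBuilderFactory hands out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildCountDemo {

    // Parses an XML string with a default (non-validating) DocumentBuilder.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Counts only the children that are elements, skipping text and comment nodes.
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Same document as in the question, including the indentation.
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA>\n"
                + "      <MULT_TYPE>JPG</MULT_TYPE>\n"
                + "      <MULT_REF>test.jpg</MULT_REF>\n"
                + "      <MULT_DESC>test logo</MULT_DESC>\n"
                + "    </MULTIMEDIA>\n  </LIST_MULTIMEDIA>\n</SURVEY>";
        Node survey = parse(xml).getDocumentElement();
        System.out.println("all children: " + survey.getChildNodes().getLength());
        System.out.println("element children: " + countElementChildren(survey));
    }
}
```

For this input the first count includes the whitespace text nodes on either side of LIST_MULTIMEDIA, while the second counts only the element itself. Checking getNodeType() inside the loops is usually simpler than trying to clean the input first.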
 
LVL 6

Author Comment

by:jstretch
ID: 18820582
The parse() method requires an InputStream; using a StringReader won't compile.

I tried doc.normalize() but it still printed the same results. Perhaps this is an encoding issue? I am getting this file out of a zip file using Java classes (ZipFile, ZipEntry, etc.).
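On the encoding question: xmlstring.getBytes() with no argument uses the platform default charset, which may not match the encoding the XML declares. One way to sidestep that, sketched below, is to hand the zip entry's byte stream straight to the parser and let it detect the encoding itself. The class name ZipXmlReader, the method parseEntry and its parameters are hypothetical; only ZipFile, ZipEntry and DocumentBuilder.parse(InputStream) come from the JDK.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ZipXmlReader {

    // Parses one entry of a zip archive as XML. The raw bytes go straight to
    // the parser, which picks the encoding from the BOM or the XML declaration,
    // so there is no intermediate String and no charset to guess.
    static Document parseEntry(String zipPath, String entryName) throws Exception {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            }
        }
    }
}
```

If the XML has to pass through a String anyway, naming the charset explicitly on both the decoding and the getBytes(...) side keeps the bytes consistent with the declaration. Note that a wrong encoding would typically garble characters or raise a parse error rather than change node counts.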

LVL 19

Expert Comment

by:Kuldeepchaturvedi
ID: 18820721
I tried running your code on my machine and it gives me correct results:
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/*
 * Created on Mar 30, 2007
 *
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */

/**
 * @author kchaturv
 *
 * To change the template for this generated type comment go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
public class TestCase {

public void TryIt()throws Exception
{
      String xmlstring="<SURVEY><LIST_MULTIMEDIA><MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE><MULT_REF>test.jpg</MULT_REF>              <MULT_DESC>test logo</MULT_DESC></MULTIMEDIA></LIST_MULTIMEDIA></SURVEY>";

//        the code
                        // xmlstring = "<SURVEY>......";
                        DocumentBuilderFactory dbf=DocumentBuilderFactory.newInstance();
                        DocumentBuilder db = dbf.newDocumentBuilder();
                        Document doc = db.parse(new InputSource(new StringReader(xmlstring)));
                        //SiteReport siteReport = new SiteReport();          
           
                        NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
                        System.out.println("survey_c length: " + survey_c.getLength());
                        for (int i = 0; i < survey_c.getLength(); i++) {
                              Node thisNode = survey_c.item(i);
                              // get multimedia references
                              if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                                    NodeList list_multimedia_c = thisNode.getChildNodes();
                                    System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                                    for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                                          Node multimedia = list_multimedia_c.item(j);
                                          NodeList multimedia_c = multimedia.getChildNodes();
                                          System.out.println("multimedia_c length: " + multimedia_c.getLength());
                                          String type = "";
                                          String ref = "";
                                          String desc = "";
                                          for (int k = 0; k < multimedia_c.getLength(); k++) {
                                                Node mediaNode = multimedia_c.item(k);
                                                // getNodeValue() returns null for element nodes, so read the text with getTextContent()
                                                if (mediaNode.getNodeName().equalsIgnoreCase("MULT_TYPE")) {
                                                      type = mediaNode.getTextContent();
                                                } else if (mediaNode.getNodeName().equalsIgnoreCase("MULT_REF")) {
                                                      ref = mediaNode.getTextContent();
                                                } else if (mediaNode.getNodeName().equalsIgnoreCase("MULT_DESC")) {
                                                      desc = mediaNode.getTextContent();
                                                }
                                          }
                                    //      siteReport.addMediaFile(new MediaFile(type, ref, desc));
                                    }

                                    // get survey coordinates
                              } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                                    // TODO
                              }
                        }            
                        // add to site report vector
                  //      siteReports.add(siteReport);

}
public static void main(String args[])
{
      try{
            new TestCase().TryIt();
      }
      catch(Exception e)
      {
            e.printStackTrace();
      }
}

}


Following are the results that I got...

survey_c length: 1
list_multimedia_c length: 1
multimedia_c length: 4

So most probably the XML string that you are receiving in this method is not what you think it should be.
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820741
>> The parse() method requires an InputStream; using a StringReader won't compile.

Sorry, it has an overload which takes an InputSource; you can use it the way Kuldeep has suggested.
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820744
(the input source being from the string reader)
 
LVL 6

Author Comment

by:jstretch
ID: 18820863
Well, that's working. I got better numbers, but they are still a little off. However, the XML I provided was only one node (for simplicity); some of the other nodes have special characters.

What special characters would blow up the parser? Apostrophe, comma? Should I use a regex to replace those characters? (Of course I don't see any < or >, which is obvious.)
 
LVL 30

Expert Comment

by:Mayank S
ID: 18820878
The special characters have escape sequences available, e.g., &lt; for < and &gt; for >.
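As a rough illustration of those escape sequences, the sketch below replaces the five characters that have predefined XML entities. The class XmlEscape and its escape method are invented for this example and are not part of the JDK. In element text only < and & strictly need escaping; commas and apostrophes are fine as they are, and quotes only matter inside attribute values.

```java
class XmlEscape {

    // Replaces the five characters that have predefined XML entities.
    // Apply this to text values before embedding them in markup,
    // not to a document that is already well-formed XML.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: Tom &amp; Jerry&apos;s &lt;logo&gt;
        System.out.println(escape("Tom & Jerry's <logo>"));
    }
}
```

The parser turns the entities back into the original characters, so getTextContent() on the parsed element returns the unescaped text.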
 
LVL 6

Author Comment

by:jstretch
ID: 18820905
Yeah, it's just more spaces. I had removed only the line returns, but the spaces were messing it up as well.
 
LVL 19

Expert Comment

by:Kuldeepchaturvedi
ID: 18820917
>> Yeah, it's just more spaces. I had removed only the line returns, but the spaces were messing it up as well.

As I said, the XML parser counts spaces as valid nodes... that's why normalize should be used,

or a pre-parser which takes out the spaces & linefeeds from the source.
ASP.NET is much better in this regard. :-)
 
LVL 30

Expert Comment

by:Mayank S
ID: 18821002
I guess as per the DOM specification, it is actually supposed to count them :)
 
LVL 6

Author Comment

by:jstretch
ID: 18821037
Well, a simple regex replace on the whitespace characters should fix it.

normalize() was not removing the white space, at least with my implementation.

I was going to try Xerces, but it looks a little too bloated for what I need.
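That normalize() leaves the whitespace alone matches the DOM specification: it merges adjacent Text nodes and removes empty ones, but a text node holding only spaces or line breaks is not empty. As an alternative to a regex on the raw string (which can also touch whitespace inside CDATA sections or comments), the following sketch removes whitespace-only text nodes after parsing. WhitespaceStripper and stripWhitespace are hypothetical names, and the approach assumes no element carries meaningful whitespace-only content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember it before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Parses an XML string with a default (non-validating) DocumentBuilder.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULTIMEDIA>\n"
                + "      <MULT_TYPE>JPG</MULT_TYPE>\n"
                + "      <MULT_REF>test.jpg</MULT_REF>\n"
                + "      <MULT_DESC>test logo</MULT_DESC>\n"
                + "    </MULTIMEDIA>\n  </LIST_MULTIMEDIA>\n</SURVEY>";
        Document doc = parse(xml);
        stripWhitespace(doc.getDocumentElement());
        Node list = doc.getDocumentElement().getFirstChild();
        System.out.println("survey_c length: " + doc.getDocumentElement().getChildNodes().getLength());
        System.out.println("list_multimedia_c length: " + list.getChildNodes().getLength());
        System.out.println("multimedia_c length: " + list.getFirstChild().getChildNodes().getLength());
    }
}
```

After stripping, the three counts for the sample document come out as 1, 1 and 3, and text such as "test logo" is kept because it is not whitespace-only.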