jstretch
asked on
parsing xml string with DocumentBuilder
I'm using javax.xml.parsers.DocumentBuilder to parse an XML string. I'm getting unpredictable results and have no idea what is wrong. I'm passing the parse() call a ByteArrayInputStream. Is this incorrect? The code below prints the following results:
07/03/29 14:41:49 survey_c length: 13 <--- This should be 1
07/03/29 14:41:49 list_multimedia_c length: 3 <-- this is correct
07/03/29 14:41:49 multimedia_c length: 0 <---this should only print once and should be 3
07/03/29 14:41:49 multimedia_c length: 7
07/03/29 14:41:49 multimedia_c length: 0
The xml string:
"<SURVEY>
<LIST_MULTIMEDIA>
<MULTIMEDIA>
<MULT_TYPE>JPG</MULT_TYPE>
<MULT_REF>test.jpg</MULT_REF>
<MULT_DESC>test logo</MULT_DESC>
</MULTIMEDIA>
</LIST_MULTIMEDIA>
</SURVEY>"
// the code
// xmlstring = "<SURVEY>......";
db = dbf.newDocumentBuilder();
doc = db.parse(new ByteArrayInputStream(xmlstring.getBytes()));
SiteReport siteReport = new SiteReport();
NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
System.out.println("survey_c length: " + survey_c.getLength());
for (int i = 0; i < survey_c.getLength(); i++) {
    Node thisNode = survey_c.item(i);
    // get multimedia references
    if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
        NodeList list_multimedia_c = thisNode.getChildNodes();
        System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
        for (int j = 0; j < list_multimedia_c.getLength(); j++) {
            Node multimedia = list_multimedia_c.item(j);
            NodeList multimedia_c = multimedia.getChildNodes();
            System.out.println("multimedia_c length: " + multimedia_c.getLength());
            String type = "";
            String ref = "";
            String desc = "";
            for (int k = 0; k < multimedia_c.getLength(); k++) {
                Node mediaNode = multimedia_c.item(k);
                if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                    type = mediaNode.getNodeValue();
                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                    ref = mediaNode.getNodeValue();
                } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                    desc = mediaNode.getNodeValue();
                }
            }
            siteReport.addMediaFile(new MediaFile(type, ref, desc));
        }
    // get survey coordinates
    } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
        // TODO
    }
}
// add to site report vector
siteReports.add(siteReport);
I tried running your code on my machine and it gives me correct results:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
/*
 * Created on Mar 30, 2007
 *
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
/**
 * @author kchaturv
 *
 * To change the template for this generated type comment go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
public class TestCase {
    public void TryIt() throws Exception {
        String xmlstring = "<SURVEY><LIST_MULTIMEDIA><MULTIMEDIA><MULT_TYPE>JPG</MULT_TYPE><MULT_REF>test.jpg</MULT_REF> <MULT_DESC>test logo</MULT_DESC></MULTIMEDIA></LIST_MULTIMEDIA></SURVEY>";
        // the code
        // xmlstring = "<SURVEY>......";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xmlstring)));
        //SiteReport siteReport = new SiteReport();
        NodeList survey_c = doc.getChildNodes().item(0).getChildNodes();
        System.out.println("survey_c length: " + survey_c.getLength());
        for (int i = 0; i < survey_c.getLength(); i++) {
            Node thisNode = survey_c.item(i);
            // get multimedia references
            if (thisNode.getNodeName().equalsIgnoreCase("LIST_MULTIMEDIA")) {
                NodeList list_multimedia_c = thisNode.getChildNodes();
                System.out.println("list_multimedia_c length: " + list_multimedia_c.getLength());
                for (int j = 0; j < list_multimedia_c.getLength(); j++) {
                    Node multimedia = list_multimedia_c.item(j);
                    NodeList multimedia_c = multimedia.getChildNodes();
                    System.out.println("multimedia_c length: " + multimedia_c.getLength());
                    String type = "";
                    String ref = "";
                    String desc = "";
                    for (int k = 0; k < multimedia_c.getLength(); k++) {
                        Node mediaNode = multimedia_c.item(k);
                        if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_TYPE")) {
                            type = mediaNode.getNodeValue();
                        } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_REF")) {
                            ref = mediaNode.getNodeValue();
                        } else if (mediaNode.getNodeName().toUpperCase().equalsIgnoreCase("MULT_DESC")) {
                            desc = mediaNode.getNodeValue();
                        }
                    }
                    // siteReport.addMediaFile(new MediaFile(type, ref, desc));
                }
            // get survey coordinates
            } else if (thisNode.getNodeName().equalsIgnoreCase("")) {
                // TODO
            }
        }
        // add to site report vector
        // siteReports.add(siteReport);
    }
    public static void main(String args[]) {
        try {
            new TestCase().TryIt();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Following are the results that I got...
survey_c length: 1
list_multimedia_c length: 1
multimedia_c length: 4
So most probably the XML string that you are receiving in this method is not what you think it is...
>> the parse() method requires an InputStream, using StringReader wont compile.
Sorry, it has an overload which takes an InputSource - you can use it the way kuldeep has suggested
(the input source being built from the string reader).
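For reference, here is a minimal sketch of both ways to hand an in-memory string to parse(). The class and method names are made up for illustration. One thing worth noting: getBytes() without an argument uses the platform default charset, so the byte-stream variant below names the charset explicitly. That may or may not matter for the data in this question.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original code.
class ParseFromString {
    // Variant 1: character-based input, so no byte encoding is involved.
    static Document viaReader(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Variant 2: byte-based input; encode explicitly instead of relying
    // on the platform default charset used by the no-arg getBytes().
    static Document viaBytes(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SURVEY><LIST_MULTIMEDIA/></SURVEY>";
        System.out.println(viaReader(xml).getDocumentElement().getNodeName());
        System.out.println(viaBytes(xml).getDocumentElement().getNodeName());
    }
}
```

Both variants build the same tree for plain ASCII input, so switching between them should not by itself change the node counts.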
ASKER
Well, that's working. I got better numbers, but they are still a little off. However, the XML I provided was only one node (for simplicity); some of the other nodes have special characters.
What special characters would blow up the parser? Apostrophe, comma? Should I use a regex to replace those characters? (Of course I don't see any < or >, which are the obvious ones.)
The special characters have escape sequences available, e.g., &lt; for < and &gt; for >
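As a rough illustration of those escape sequences, the sketch below replaces the five characters that XML reserves with their predefined entities. The class and method names are hypothetical. In element text only < and & strictly need escaping, so apostrophes and commas should not upset the parser on their own.

```java
// Hypothetical helper, shown only to illustrate the predefined XML entities.
class XmlEscape {
    // Replace the five reserved characters with their predefined entities.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape only the text content, never the surrounding tags.
        System.out.println("<MULT_DESC>" + escape("Tom & Jerry's <logo>") + "</MULT_DESC>");
    }
}
```

The escaping has to be applied to the text values when the XML is generated, not to the whole document afterwards, otherwise the tags themselves would be escaped too.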
ASKER
Yeah, it's just more spaces. I had removed the line returns, but the spaces were messing it up as well.
>> Yeah, it's just more spaces. I had removed the line returns, but the spaces were messing it up as well.
As I said, the XML parser counts spaces as valid nodes... that's why normalize should be used,
or a pre-parser which takes out the spaces and linefeeds from the source.
ASP.NET is much better in this regard..:-)
I guess as per the DOM specification, it is actually supposed to count them :)
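To make that concrete, here is a small sketch (class and helper names are my own, not from the thread) that parses the pretty-printed XML from the question and walks only the element children, so the whitespace text nodes between tags no longer affect the counts. It also reads the text with getTextContent(), because getNodeValue() on an element node is defined to return null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class ElementChildren {
    // Return only the element children, skipping text nodes, comments, etc.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SURVEY>\n<LIST_MULTIMEDIA>\n<MULTIMEDIA>\n"
                + "<MULT_TYPE>JPG</MULT_TYPE>\n<MULT_REF>test.jpg</MULT_REF>\n"
                + "<MULT_DESC>test logo</MULT_DESC>\n"
                + "</MULTIMEDIA>\n</LIST_MULTIMEDIA>\n</SURVEY>";
        Element survey = parse(xml).getDocumentElement();
        // The raw count includes the whitespace text nodes around each child element.
        System.out.println("raw children: " + survey.getChildNodes().getLength());
        System.out.println("element children: " + elementChildren(survey).size());
        for (Element list : elementChildren(survey)) {
            for (Element multimedia : elementChildren(list)) {
                for (Element field : elementChildren(multimedia)) {
                    // getTextContent() returns the text inside the element.
                    System.out.println(field.getNodeName() + " = " + field.getTextContent());
                }
            }
        }
    }
}
```

Filtering on the node type leaves the input untouched, which may be preferable to rewriting the XML string before parsing.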
ASKER
Well, a simple regex replace on the whitespace characters should fix it.
normalize() was not removing the whitespace... at least with my implementation.
I was going to try Xerces but it looks a little too bloated for what I need.
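One way such a regex replace might look is sketched below. The class name is hypothetical, and the pattern is only an assumption about what was meant: it removes whitespace that sits between a closing > and the next <, and leaves spaces inside text values alone.

```java
// Hypothetical pre-processing step, not the asker's actual code.
class StripInterTagWhitespace {
    // Remove runs of whitespace between '>' and the following '<'.
    // Caveat: this would also change CDATA sections or mixed content
    // that happen to contain the same pattern.
    static String strip(String xml) {
        return xml.replaceAll(">\\s+<", "><").trim();
    }

    public static void main(String[] args) {
        String xml = "<SURVEY>\n  <LIST_MULTIMEDIA>\n    <MULT_DESC>test logo</MULT_DESC>\n  </LIST_MULTIMEDIA>\n</SURVEY>\n";
        System.out.println(strip(xml));
    }
}
```

The space in "test logo" survives because it is not directly between two tags.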
ASKER
Tried doc.normalize() but it still printed out the same results. Perhaps this is an encoding issue? I am getting this file out of a zip file using Java objects (ZipFile, ZipItem, etc.).
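If encoding is a worry, one option is to skip the intermediate String and give the parser the raw bytes of the zip entry, so that it can pick the encoding from the XML declaration or byte order mark. The sketch below is only an illustration under that assumption: the class, the entry name survey.xml and the sample content are all invented, and it says nothing about whether encoding is actually the cause here.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class; names and sample data are assumptions.
class ParseFromZip {
    // Parse one entry straight from the archive, without building a String first.
    static Document parseEntry(String zipPath, String entryName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("no such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // The parser reads the encoding from the XML declaration.
                return db.parse(in);
            }
        }
    }

    // Write a small sample archive so the example can run on its own.
    static File writeSampleZip() throws Exception {
        File tmp = File.createTempFile("survey", ".zip");
        tmp.deleteOnExit();
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<SURVEY><MULT_DESC>caf\u00e9</MULT_DESC></SURVEY>";
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(tmp))) {
            out.putNextEntry(new ZipEntry("survey.xml"));
            out.write(xml.getBytes(StandardCharsets.ISO_8859_1));
            out.closeEntry();
        }
        return tmp;
    }

    public static void main(String[] args) throws Exception {
        File zip = writeSampleZip();
        Document doc = parseEntry(zip.getPath(), "survey.xml");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the bytes are first turned into a String with a guessed charset, non-ASCII characters can be damaged before the parser ever sees them; passing the stream avoids that step.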