Hi,
I am new to Java and XML. I have been given two XML files and have to report the differences between them. Both files are encoded in UTF-8.
Here are some general questions that I have:
1. In Java, both SAX and DOM are available. Which one should be used?
2. Is there any way to make these parsers ignore the DTD?
Thanks,
coder-rl
This question has been solved and verified by the asker.
Use DOM and a custom entity resolver to ignore the DTD:
http://forums.sun.com/thre
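As a rough illustration of that suggestion, here is a minimal sketch of a DOM parse whose entity resolver hands back empty content for every external entity, so the DTD named in the DOCTYPE is never fetched. The class name IgnoreDtdExample, the helper parseIgnoringDtd, and the DTD file name are made up for this example; the lambda needs Java 8 or later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreDtdExample {

    // Hypothetical helper: parses XML text with a non-validating DOM parser.
    // The entity resolver answers every external entity request (including
    // the external DTD) with an empty document instead of loading it.
    static Document parseIgnoringDtd(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The DTD file named here does not exist; the resolver keeps the
        // parser from trying to open it.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE PHONEBOOK SYSTEM \"missing-phonebook.dtd\">"
                + "<PHONEBOOK><PERSON><NAME>Karol</NAME></PERSON></PHONEBOOK>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Note that skipping the DTD also skips anything it would have declared, such as custom entities or default attribute values, so this only suits documents that do not rely on those.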
I suggest the StAX API instead of SAX or DOM. Both SAX and DOM make you build some structure anyway.
With StAX you can traverse the two XML files step by step and compare them at each step.
For a quick start, take a look at
http://www.vogella.de/arti
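To give an idea of what that step-by-step traversal could look like, here is a small sketch that walks two documents in parallel with StAX and reports the first point where they disagree. The class and method names are invented for the example, and it is deliberately simplified: it looks only at element names and non-blank text, and ignores attributes, comments and namespaces.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCompareExample {

    // Advances to the next element start, element end, or non-blank text
    // event and returns a short description of it, or null at end of input.
    static String nextSignificant(XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    return "<" + r.getLocalName() + ">";
                case XMLStreamConstants.END_ELEMENT:
                    return "</" + r.getLocalName() + ">";
                case XMLStreamConstants.CHARACTERS:
                    if (!r.isWhiteSpace()) {
                        return "text:" + r.getText().trim();
                    }
                    break;
                default:
                    break;
            }
        }
        return null;
    }

    // Returns a description of the first differing step, or null if the two
    // documents produce the same sequence of significant events.
    static String firstDifference(String xmlA, String xmlB) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each run of text as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            int step = 0;
            while (true) {
                String ea = nextSignificant(a);
                String eb = nextSignificant(b);
                step++;
                if (ea == null && eb == null) {
                    return null;
                }
                if (!Objects.equals(ea, eb)) {
                    return "step " + step + ": " + ea + " vs " + eb;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String a = "<PHONEBOOK><PERSON><NAME>Joe PREPARACI\u00D3N Yin</NAME></PERSON></PHONEBOOK>";
        String b = "<PHONEBOOK><PERSON><NAME>Joe Yin</NAME></PERSON></PHONEBOOK>";
        System.out.println(firstDifference(a, b));
    }
}
```

Because neither document is held in memory as a tree, this style suits large files, but it only finds differences in document order; reordered elements show up as mismatches.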
Hi CEHJ, objects, and pesmerg,
I have downloaded XMLDIFF from IBM as instructed and will try it fairly soon. At the moment, I am taking on the challenge of developing my own. Since I am new to both Java and XML, I know I will appreciate what XMLDIFF can do.
I currently have some questions about the Java DocumentBuilder, as indicated in the code below. Please help.
Thanks,
coder-rl
The question:
I am reading an XML file encoded in UTF-8; note the Spanish word PREPARACIÓN (preparation).
When I parse the file directly with doc = builder.parse(<the file>), I get the "Invalid byte 2 of 2-byte UTF-8 sequence" error.
When I pass in an InputSource built from an InputStreamReader over a FileInputStream and specify UTF-8, my output shows an unreadable character in the Spanish word.
When I pass in the same kind of InputSource without specifying UTF-8, my output shows the Spanish word with readable characters.
Why is that?
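One possible explanation, which the post itself does not confirm, is that the file was saved in a single-byte encoding such as ISO-8859-1 or Windows-1252 even though its declaration says utf-8. In ISO-8859-1 the letter Ó is the single byte 0xD3, which a UTF-8 decoder reads as the first byte of a two-byte sequence; the N that follows is not a valid second byte. A strict decoder (the parser) reports an error, a lenient one (InputStreamReader with UTF-8) substitutes a replacement character, and a reader with no charset given falls back to the platform default, which may happen to match the file. The sketch below reproduces the three behaviours on the word alone; the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Strict check: a fresh CharsetDecoder reports malformed input instead
    // of replacing it, so invalid UTF-8 raises CharacterCodingException.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient decoding: malformed bytes become the replacement character
    // U+FFFD, similar to what InputStreamReader does.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String word = "PREPARACI\u00D3N";
        // Assumption: these bytes stand in for how the file is stored on disk.
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true

        // Wrong charset: the accented letter is lost.
        System.out.println(decode(latin1, StandardCharsets.UTF_8));
        // Matching charset: the word survives.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

If a check like isValidUtf8 fails on the real file's bytes, re-saving the file as UTF-8 or correcting the encoding in its XML declaration would be the things to try.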
The data file content:
<?xml version="1.0" encoding="utf-8"?>
<PHONEBOOK>
  <PERSON>
    <NAME >Joe PREPARACIÓN Yin</NAME>
    <EMAIL property="working">joe@your</EMAIL>
    <TELEPHONE>202-999-9999</TELEPHONE>
    <WEB>www.java2s.com</WEB>
  </PERSON>
  <PERSON>
    <NAME>Karol</NAME>
    <EMAIL>karol@yourserver.com</EMAIL>
    <TELEPHONE>306-999-9999</TELEPHONE>
    <WEB>www.java2s.com</WEB>
  </PERSON>
  <PERSON>
    <NAME>Green</NAME>
    <EMAIL>green@yourserver.com</EMAIL>
    <TELEPHONE>202-414-9999</TELEPHONE>
    <WEB>www.java2s.com</WEB>
  </PERSON>
</PHONEBOOK>
The code:
import java.io.IOException;
import java.io.StringReader;
import java.io.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XMLReader {
    public static void main(String[] args) {
        String encodingFormat = "UTF8";
        String logfileName1 = "check.log";
        boolean validate = false;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(validate);
        dbf.setNamespaceAware(true);
        // The argument was cut off in the original post; true is assumed here.
        dbf.setIgnoringElementContentWhitespace(true);
        Document doc = null;
        try {
            // ==========================
            // Read the input file into a DOM.
            // ==========================
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // These parse the test strings instead of a file.
            // doc = builder.parse(new InputSource(new StringReader(xmlString_A)));
            // doc = builder.parse(new InputSource(new StringReader(xmlString_B)));
            // This fails with the "Invalid byte 2 of 2-byte UTF-8 sequence" error.
            // doc = builder.parse(args[0]);
            // This gives a strange character in the output, whether or not
            // the BufferedWriter specifies UTF-8.
            doc = builder.parse(new InputSource(new InputStreamReader(
                    new FileInputStream(args[0]), encodingFormat)));
            // doc = builder.parse(new InputSource(new InputStreamReader(
            //         new FileInputStream(args[0]))));
            // ==========================
            // Write the output file with a BufferedWriter.
            // ==========================
            // The end of this call was cut off in the original post; passing
            // logfileName1 and encodingFormat is assumed.
            Writer out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(logfileName1), encodingFormat));
            // Writer out = new BufferedWriter(new OutputStreamWriter(
            //         new FileOutputStream(logfileName1)));
            out.write("Reporting Started!\r\n");
            TreeDumper td = new TreeDumper();
            td.dump(doc, out);
            out.write("Reporting Done!");
            // Flush and close the BufferedWriter.
            out.flush();
            out.close();
        } catch (SAXException e) {
            // Print the parse error instead of exiting silently.
            System.err.println(e);
            System.exit(1);
        } catch (ParserConfigurationException e) {
            System.err.println(e);
            System.exit(1);
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }
        // TreeDumper td = new TreeDumper();
        // td.dump(doc, out);
    }
    static String xmlString_A = "<PHONEBOOK>" +
            " <PERSON>" +
            " <NAME >Joe PREPARACIÓN Yin</NAME>" +
            " <EMAIL property=\"working\">joe@yo</EMAIL>" +
            " <TELEPHONE>202-999-9999</TELEPHONE>" +
            " <WEB>www.java2s.com</WEB>" +
            " </PERSON>" +
            " <PERSON> " +
            "<NAME>Karol</NAME>" +
            " <EMAIL>karol@yourserver.com</EMAIL>" +
            " <TELEPHONE>306-999-9999</TELEPHONE>" +
            " <WEB>www.java2s.com</WEB>" +
            " </PERSON>" +
            " <PERSON>" +
            " <NAME>Green</NAME>" +
            " <EMAIL>green@yourserver.com</EMAIL>" +
            " <TELEPHONE>202-414-9999</TELEPHONE>" +
            " <WEB>www.java2s.com</WEB>" +
            " </PERSON>" +
            " </PHONEBOOK>";
    static String xmlString_B = "<PHONEBOOK>" +
            " <PERSON>" +
            " <NAME >Joe Yin</NAME>" +
            " <EMAIL property=\"working\">joe@yo</EMAIL>" +
            " <TELEPHONE>202-999-9999</TELEPHONE>" +
            " <WEB>www.java2s.com</WEB>" +
            " </PERSON>" +
            " <PERSON> " +
            "<NAME>Karol</NAME>" +
            " <EMAIL>karol@yourserver.com</EMAIL>" +
            " <TELEPHONE>306-999-9999</TELEPHONE>" +
            " <WEB>www.java2s.com</WEB>" +
            " </PERSON>" +
            " <PERSON>" +
            " <NAME>Green</NAME>" +
            " <EMAIL>green@yourserver.com</EMAIL>" +
            " <TELEPHONE>202-414-9999</TELEPHONE>" +
            " <WEB>www.java2s.com</WEB>" +
            " </PERSON>" +
            " </PHONEBOOK>";
}
class TreeDumper {
    // Prints the root element name, then walks the whole tree.
    public void dump(Document doc, Writer out) {
        try {
            // Both calls were cut off after "getDocumentElement().g" in the
            // original post; getNodeName() is assumed.
            System.out.println("Root element = " + doc.getDocumentElement().getNodeName());
            out.write("Root element = " + doc.getDocumentElement().getNodeName() + "\r\n");
            dumpLoop(doc, "", out);
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }
    }

    // Reports the type of this node, then recurses into its children with a
    // deeper indent. Only element and text nodes are written to the log.
    private void dumpLoop(Node node, String indent, Writer out) {
        try {
            switch (node.getNodeType()) {
                case Node.CDATA_SECTION_NODE:
                    System.out.println(indent + "CDATA_SECTION_NODE");
                    break;
                case Node.COMMENT_NODE:
                    System.out.println(indent + "COMMENT_NODE");
                    break;
                case Node.DOCUMENT_FRAGMENT_NODE:
                    System.out.println(indent + "DOCUMENT_FRAGMENT_NODE");
                    break;
                case Node.DOCUMENT_NODE:
                    System.out.println(indent + "DOCUMENT_NODE" + " --- [node name] = " + node.getNodeName());
                    break;
                case Node.DOCUMENT_TYPE_NODE:
                    System.out.println(indent + "DOCUMENT_TYPE_NODE");
                    break;
                case Node.ELEMENT_NODE:
                    System.out.println(indent + "ELEMENT_NODE" + " --- [node name] = " + node.getNodeName());
                    out.write(indent + "ELEMENT_NODE" + " --- [node name] = " + node.getNodeName() + "\r\n");
                    break;
                case Node.ENTITY_NODE:
                    System.out.println(indent + "ENTITY_NODE");
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                    System.out.println(indent + "ENTITY_REFERENCE_NODE");
                    break;
                case Node.NOTATION_NODE:
                    System.out.println(indent + "NOTATION_NODE");
                    break;
                case Node.PROCESSING_INSTRUCTION_NODE:
                    System.out.println(indent + "PROCESSING_INSTRUCTION_NODE");
                    break;
                case Node.TEXT_NODE:
                    // System.out.println(indent + "TEXT_NODE" + " = " + node.getNodeValue());
                    System.out.println(indent + "TEXT_NODE" + " --- [node name] = " + node.getNodeName() + " [node value] = " + node.getTextContent());
                    out.write(indent + "TEXT_NODE" + " --- [node name] = " + node.getNodeName() + " [node value] = " + node.getTextContent() + "\r\n");
                    break;
                default:
                    System.out.println(indent + "Unknown node");
                    break;
            }
            NodeList list = node.getChildNodes();
            for (int i = 0; i < list.getLength(); i++) {
                dumpLoop(list.item(i), indent + " ", out);
            }
        } catch (IOException e) {
            System.err.println(e);
            System.exit(1);
        }
    }
}
by: CEHJ, posted on 2008-08-08 at 13:39:52, ID: 22193462
You need an XML diff app. Try
http://www.alphaworks.ibm.com/tech/xmldiffmerge
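For anyone who, like the asker, wants to try a hand-rolled comparison first, here is a minimal DOM-based sketch that returns the path of the first node where two trees differ. It is not how XMLDIFF or any other diff tool works; the class and method names are invented, and it compares only node types, node names and trimmed text, ignoring attributes and assuming both documents use the same whitespace layout.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomDiffSketch {

    // Hypothetical helper: parses XML text into a DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            doc.normalize(); // merge adjacent text nodes
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the path of the first node at which the two subtrees differ,
    // or null if no difference is found by this (simplified) comparison.
    static String firstDifference(Node a, Node b, String path) {
        String here = path + "/" + a.getNodeName();
        if (a.getNodeType() != b.getNodeType()
                || !a.getNodeName().equals(b.getNodeName())) {
            return here;
        }
        if (a.getNodeType() == Node.TEXT_NODE
                && !a.getNodeValue().trim().equals(b.getNodeValue().trim())) {
            return here;
        }
        NodeList childrenA = a.getChildNodes();
        NodeList childrenB = b.getChildNodes();
        if (childrenA.getLength() != childrenB.getLength()) {
            return here;
        }
        for (int i = 0; i < childrenA.getLength(); i++) {
            String diff = firstDifference(childrenA.item(i), childrenB.item(i), here);
            if (diff != null) {
                return diff;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document a = parse("<PHONEBOOK><PERSON><NAME>Joe PREPARACI\u00D3N Yin</NAME></PERSON></PHONEBOOK>");
        Document b = parse("<PHONEBOOK><PERSON><NAME>Joe Yin</NAME></PERSON></PHONEBOOK>");
        // prints /PHONEBOOK/PERSON/NAME/#text
        System.out.println(firstDifference(a.getDocumentElement(), b.getDocumentElement(), ""));
    }
}
```

A real diff tool also has to cope with reordered, inserted and deleted elements, which is where a positional walk like this one stops being useful.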