mikechen
asked on
Validate an XML document using a DTD
Hi,
I have a question about using DTD to validate an XML data feed. Here is what I need to do.
1. I need to retrieve an XML file from a website, say http://ABCD.COM/sample.xml.
2. This XML is formatted according to a DTD. The DTD is defined externally at the same website. Here is a sample of the XML file:
<?xml version="1.0" encoding="ISO8859-1"?>
<!DOCTYPE index SYSTEM "/dtds/format1.dtd">
<Foo>
<Foo1>
</Foo1>
</Foo>
3. When I retrieve the XML file (or after I retrieve it), I need to check whether it is valid per the DTD file.
Any idea ? Sample code would be really appreciated.
Thanks.
First of all:
Use the URL class in the java.net package.
It allows you to get an InputStream for your resource (remember to wrap it in a BufferedInputStream for good measure).
Now to parse the XML:
Use the javax.xml.parsers package (JAXP) to have a factory create a validating parser for you. If all you want is to know whether there are any errors in the document, simply create a SAXParser and pass it a subclass of the SAX2 helper class DefaultHandler that overrides the error, fatalError and warning methods to collect the error information. If that doesn't do enough and you aren't comfortable with SAX, I suggest you get hold of a DOM parser, which is far easier to use (it represents your document as a tree of nodes) but is also expensive in memory and processing compared with SAX.
According to the sample XML on the page, your document contains a DOCTYPE with a SYSTEM reference. This requires that the DTD is present in the parsing system (filesystem). If possible, you should change this to a PUBLIC "http://abcd.com/dtds/format1.dtd" reference, which will allow the parser to load the DTD from the server (in fact, some parsers are now smart enough to cache these DTDs for later reuse).
Hope it was helpful.
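For what it's worth, the SAX approach described above could look roughly like the sketch below. It is only an illustration: the class name DtdCheck, the validate method and the inline test documents are made up for this example; only the JAXP/SAX calls themselves are standard. The handler collects warnings and validity errors and rethrows fatal (well-formedness) errors.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {   // hypothetical name, not from the thread

    /** Parses the source with DTD validation turned on and returns the problems reported. */
    static List<String> validate(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        final List<String> problems = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                // Validity errors arrive here; they are recoverable, so just record them.
                problems.add("error: " + e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // The document is not well-formed; stop parsing.
                throw e;
            }
        };
        parser.parse(source, handler);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Inline DTD so the example runs without network access (illustrative data only).
        String dtd = "<!DOCTYPE Foo [<!ELEMENT Foo (Foo1)><!ELEMENT Foo1 EMPTY>]>";
        System.out.println(validate(new InputSource(new StringReader(dtd + "<Foo><Foo1/></Foo>"))));
        System.out.println(validate(new InputSource(new StringReader(dtd + "<Foo><Bar/></Foo>"))));
    }
}
```

For a remote feed you could pass new InputSource("http://abcd.com/sample.xml") instead. If you open the stream yourself with the URL class, also call setSystemId on the InputSource with the document's URL so the parser has a base for resolving a relative DTD reference.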
Mikael, please do not propose answers, as this locks the question and makes it difficult for other people to see it and add their comments. Post comments instead, as comments can still be accepted as answers.
ASKER
Thanks for the responses. I guess that is the data feed from a certain website and I could not change it.
So is there a way to achieve what I want?
Could somebody provide some sample code ?
Thanks.
Mike,
Yes, there is probably a way to achieve what you want, and it's not too difficult. However, it depends on properly defining the path to the DTD in the XML document. The external DTD subset ("/dtds/format1.dtd") is resolved in the context of the document entity ("http://abcd.com/sample.xml"), so your DTD must be accessible at http://abcd.com/dtds/format1.dtd.
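To make the resolution rule concrete, here is a small sketch that applies the same relative-reference rules with java.net.URI. The class name ResolveDtd and the second example URL are made up for illustration; the first pair of values is the one from this question.

```java
import java.net.URI;

public class ResolveDtd {   // hypothetical name, not from the thread

    /** Resolves a system identifier against the URL of the document that declares it. */
    static URI resolve(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId);
    }

    public static void main(String[] args) {
        // A path starting with "/" replaces the whole path of the document URL.
        System.out.println(resolve("http://abcd.com/sample.xml", "/dtds/format1.dtd"));
        // prints http://abcd.com/dtds/format1.dtd

        // A path without a leading "/" is taken relative to the document's directory.
        System.out.println(resolve("http://abcd.com/feeds/sample.xml", "format1.dtd"));
        // prints http://abcd.com/feeds/format1.dtd
    }
}
```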
Here's some sample code using JAXP. Note, however, that Crimson (the default parser with JDK 1.4) has a bug causing it to incorrectly resolve the DTD URL. You'll need to get a different validating parser such as Xerces (http://xml.apache.org/xerces2-j).
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class validate {
    static public void main(String[] args) {
        try {
            SAXParserFactory saxfactory = SAXParserFactory.newInstance();
            saxfactory.setValidating(true);
            SAXParser saxparser = saxfactory.newSAXParser();
            if (args.length < 1) {
                System.err.println(
                    "Usage: java validate http://abcd.com/sample.xml");
                System.exit(1);
            }
            // Rethrow validity errors. With a null handler the parser does not
            // stop on them, so an invalid file would still be reported as valid.
            saxparser.parse(args[0], new DefaultHandler() {
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            System.out.println("File is valid");
        }
        catch (SAXException e) {
            System.out.println("File is not valid: " + e.getMessage());
        }
        catch (Exception e) {
            System.out.println("Error parsing document:");
            e.printStackTrace();
        }
    }
}
ASKER
Hi, here is what I plan to do.
1. Get the XML file.
2. Replace <!DOCTYPE index SYSTEM "/dtds/format1.dtd"> with <!DOCTYPE index SYSTEM "http://abcd.com/dtds/format1.dtd">
3. Parse it.
Do you think this is good enough ?
But here is where I need help, since I am still mostly a C++/C# programmer.
1. To get XML file. Here is what I did.
<<
DocumentBuilderFactory docBuilderFactory;
docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setValidating(false);
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document xmlDoc = docBuilder.parse(uri);
...
>>
But I got error like this
<<
Exception in thread "main" java.lang.InternalError
at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2636)
at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)
at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)
at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:179)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:134)
>>
Why did I get this error?
2. Once I have the xmlDoc, I assume I can manipulate it and change the DTD reference. What is the best way?
3. After I change the DTD reference to an absolute URL, how should I parse the document again?
Thanks a lot.
ASKER
BTW, I am using JDK 1.4.
Mike, your code isn't working because it's trying to parse the DTD before you change it. I can tell you how to fix it, but it shouldn't be necessary. As I stated in my previous comment, your XML file and the system identifier for the DTD external subset are valid. You're probably having problems just because of the Crimson bug. Just use a different parser and try the sample code I posted. Let us know if it doesn't work.
ASKER
Hi, One question here.
When I call docBuilderFactory.setValidating(false);
is it still going to validate the XML against the DTD?
If not, then why did I still get that error ?
Thanks.
No it should not validate it...
Getting this error means that there might be a problem with your parser, or, even with your Servlet engine. Tomcat 4.0 is known to have such problems. What Servlet Engine are you using?
If you have copied your XML document elsewhere, then the problem is that the DTD external subset is specified but doesn't exist. That's not only a validation error, that's a well-formedness error too. Parsers check for well-formedness even when validation is off.
If you really need to save the document elsewhere and rewrite the DTD, consider using SAX2. The SAX2 API has two features, "external-general-entities" and "external-parameter-entities", that allow you to skip external entities. You can use the default Crimson parser, but you'll lose the comments in your document. If you want to preserve comments you should use a parser that supports the SAX2 extensions. Piccolo (http://piccolo.sourceforge.net), Xerces, and a few others (check http://www.saxproject.org/?selected=links) will work fine.
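As a rough sketch of how those two SAX2 features are switched off: the feature names are the standard SAX2 URIs, while the class name SkipExternalEntities and the tiny inline document are made up for illustration. Note the assumptions: not every parser lets you change these features (setFeature may throw SAXNotSupportedException), and whether a given parser also skips the external DTD subset when they are off varies between implementations, so test with the parser you actually use.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalEntities {   // hypothetical name, not from the thread

    // Standard SAX2 feature identifiers.
    static final String GENERAL =
        "http://xml.org/sax/features/external-general-entities";
    static final String PARAMETER =
        "http://xml.org/sax/features/external-parameter-entities";

    /** Creates a non-validating reader that is asked not to load external entities. */
    static XMLReader newReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(GENERAL, false);
        reader.setFeature(PARAMETER, false);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler());
        // Illustrative inline document; a real program would pass the saved file here.
        reader.parse(new InputSource(new StringReader("<Foo><Foo1/></Foo>")));
        System.out.println("external general entities enabled: " + reader.getFeature(GENERAL));
        System.out.println("external parameter entities enabled: " + reader.getFeature(PARAMETER));
    }
}
```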
ASKER
Mike,
One of the great things about JAXP and SAX is that you can switch parsers without changing any code. You can use the code listed in my previous comment.
To have the program use Xerces instead of the default Crimson parser, you'll need to:
1. Download Xerces and place the .jar files in your [JAVA_HOME]/jre/lib/ext directory.
2. Tell Java to use Xerces as the default parser by creating the file [JAVA_HOME]/jre/lib/jaxp.properties and putting in these two lines:
javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
javax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
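If you are not sure which parser JAXP actually picked up after making these changes, a small check like the one below may help. It is only a diagnostic sketch; the class name WhichParser is made up, and the output depends entirely on your JDK and classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {   // hypothetical name, not from the thread

    /** Name of the SAX factory implementation JAXP selected. */
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Name of the DOM factory implementation JAXP selected. */
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means no system property override was given to the JVM.
        System.out.println("System property: "
            + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("SAX factory in use: " + saxFactoryClass());
        System.out.println("DOM factory in use: " + domFactoryClass());
    }
}
```

If the printed class names still start with org.apache.crimson, the properties file or the jar files are not being found.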
ASKER
Hi, Yoren,
I followed what you said, but I still got the error like
It seems it is still using crimson.
Any idea ?
Thanks.
<<
at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2636)
at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)
at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:346)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:232)
at validate.main(validate.java:18)
>>
Try specifying the parser on the command line:
java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl validate http://abcd.com/sample.xml
ASKER
I got an error
File is not valid: The encoding "ISO8859-1" is not supported.
Ah, I didn't see that typo before. That's not a valid encoding. Instead, it should be "ISO-8859-1".
ASKER
I put a space between -D and javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl and then I got an error:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/parsers/SAXParserFactory=org/apache/xerces/jaxp/SAXParserFactoryImpl
Did you ever hear the joke about the guy who walks into a doctor's office and said "it hurts when I raise my arm like this?" Well, like the doctor told him,
Don't do that.
>Did you ever hear the joke about the guy who walks into a doctor's office and said "it hurts when I
>raise my arm like this?" Well, like the doctor told him,
>
>Don't do that.
LOL. :-)
mikechen, the java.lang.NoClassDefFoundError means that the VM cannot find the class you want. Make sure that the path to this class (or the jar file) is in the classpath. If you have put the jar file inside the /ext folder then the VM should pick it up automatically. If you still get the NoClassDefFoundError, then make sure that the class you are looking for is in the correct jar file.
Sometimes you even have to restart your server.
Hope it helps :-)
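A quick way to check whether a given class is visible to the VM is to try loading it by name. The sketch below is only an illustration; the class name ClasspathCheck and the isLoadable helper are made up, and the Xerces class name is the one from the command line earlier in this thread.

```java
public class ClasspathCheck {   // hypothetical name, not from the thread

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // The class is not on the classpath.
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked or initialized.
            return false;
        }
    }

    public static void main(String[] args) {
        String xerces = "org.apache.xerces.jaxp.SAXParserFactoryImpl";
        System.out.println(xerces + " loadable: " + isLoadable(xerces));
    }
}
```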
The NoClassDefFoundError is occurring because of the space after the -D flag. It thinks you're trying to run a program called "javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl" instead of setting the system property. Remove the space to fix the problem.
Aha, I see. To be honest, I have never needed to set a new system property dynamically. Thanks for the information.
ASKER
OK, I am back from some chaos.
Hi, Yoren,
How do I fix
"File is not valid: The encoding "ISO8859-1" is not supported." ?
Change this: "ISO8859-1" to this: "ISO-8859-1"
Hope it helps.
ASKER
Thanks for your help, MikaelHK.
At the bottom of the page it has links to various things you can do with java and XML.
Hope it helps.