Solved

Validate an XML document using a DTD

Posted on 2002-05-22
32
645 Views
Last Modified: 2013-11-23
Hi,

I have a question about using DTD to validate an XML data feed. Here is what I need to do.

1. I need to retrieve an XML file from a website, say http://ABCD.COM/sample.xml.

2. This XML is supposed to be valid per a DTD, which is defined externally at the same website. Here is a sample of the XML file:

<?xml version="1.0" encoding="ISO8859-1"?>
<!DOCTYPE index SYSTEM "/dtds/format1.dtd">
<Foo>
    <Foo1>
    </Foo1>
</Foo>

3. When I retrieve the XML file (or after I retrieve it), I need to check whether it is valid or not per the DTD file.


Any idea ? Sample code would be really appreciated.

Thanks.


Question by:mikechen
32 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 7026521
 Take a look here: http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/index.html

  At the bottom of the page it has links to various things you can do with java and XML.

  Hope it helps.
0
 

Expert Comment

by:MikaelHK
ID: 7026650
First of all:

Use the URL class in the java.net package.
It allows you to get an InputStream for your resource (remember to wrap it in a BufferedInputStream for good measure).

Now to parse the xml:

Use the javax.xml.parsers package (JAXP) to have a factory create a validating parser for you. If all you want to know is whether the document contains any errors, simply create a SAXParser and pass it a subclass of the SAX2 class DefaultHandler that overrides the error, fatalError and warning methods to collect the error information. If that doesn't do enough and you aren't comfortable with SAX, I suggest you get hold of a DOM parser, which is far easier to use (it represents your document as a tree of nodes) but is also expensive in memory and processing compared with SAX.

According to the sample XML you posted, your document contains a DOCTYPE with a SYSTEM reference. This requires that the DTD is present in the parsing system (filesystem). If possible, you should change this to a PUBLIC "http://abcd.com/dtds/format1.dtd", which will allow the parser to go and load the DTD from the server (in fact, some parsers are now smart enough to cache these DTDs for later reuse).
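The URL-plus-SAX approach described above could be sketched roughly as follows. This is only an illustration written against a current JDK: the class name FeedValidator and the validate helper are made up for the example, and the feed URL is taken from the command line. Setting the system ID on the InputSource gives the parser a base URI, which may let it resolve a relative DTD reference.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP or SAX.
class FeedValidator {

    /** Parses the source with DTD validation on and returns every problem reported. */
    static List<String> validate(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        final List<String> problems = new ArrayList<String>();
        // Validity errors and warnings only reach the handler, so collect them here.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
        };
        try {
            parser.parse(source, handler);
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, e.g. for a malformed document.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL(args[0]); // e.g. http://abcd.com/sample.xml
        try (InputStream in = new BufferedInputStream(url.openStream())) {
            InputSource source = new InputSource(in);
            source.setSystemId(url.toString()); // base URI for the DTD reference
            List<String> problems = validate(source);
            if (problems.isEmpty()) {
                System.out.println("File is valid");
            } else {
                for (String problem : problems) {
                    System.out.println(problem);
                }
            }
        }
    }
}
```

An empty list means the parser reported nothing; anything else is worth printing or logging before the feed is used.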

Hope it was helpful.
0
 
LVL 35

Expert Comment

by:girionis
ID: 7026661
 Mikael, please do not propose answers, as this locks the question and makes it difficult for other people to see it and add their comments. Post comments instead, as comments can still be accepted as answers.
0
 

Author Comment

by:mikechen
ID: 7027617
Thanks for the responses. I guess that is the data feed from a certain website and I could not change it.

So is there way to achieve what I want ?

Could somebody provide some sample code ?

Thanks.
0
 
LVL 7

Expert Comment

by:yoren
ID: 7038288
Mike,

Yes, there is probably a way to achieve what you want, and it's not too difficult. However, it depends on properly defining the path to the DTD in the XML document. The external DTD subset ("/dtds/format1.dtd") is resolved in the context of the document entity ("http://abcd.com/sample.xml"), so your DTD must be accessible at http://abcd.com/dtds/format1.dtd.
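The resolution rule described above can be checked with java.net.URI, which applies the same relative-reference rules. This is just a small illustration; the class name DtdLocation is made up, and the URLs are the placeholder ones from the question.

```java
import java.net.URI;

// Hypothetical helper, only to illustrate how a system identifier is resolved.
class DtdLocation {

    /** Resolves a DOCTYPE system identifier against the URI of the document. */
    static URI resolve(String documentUri, String systemId) {
        return URI.create(documentUri).resolve(systemId);
    }

    public static void main(String[] args) {
        // prints http://abcd.com/dtds/format1.dtd
        System.out.println(resolve("http://abcd.com/sample.xml", "/dtds/format1.dtd"));
    }
}
```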

Here's some sample code using JAXP. Note, however, that Crimson (the default parser with JDK 1.4) has a bug causing it to incorrectly resolve the DTD URL. You'll need to get a different validating parser such as Xerces (http://xml.apache.org/xerces2-j).

import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class validate {
    static public void main(String[] args) {
        // Check the arguments before doing any work
        if (args.length < 1) {
            System.err.println(
                "Usage: java validate http://abcd.com/sample.xml");
            System.exit(1);
        }

        try {
            // Ask JAXP for a parser that validates against the DTD
            SAXParserFactory saxfactory = SAXParserFactory.newInstance();
            saxfactory.setValidating(true);
            SAXParser saxparser = saxfactory.newSAXParser();

            // Validity errors are only reported to the error handler, so
            // rethrow them; with a null handler they would go unnoticed
            DefaultHandler handler = new DefaultHandler() {
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            };

            saxparser.parse(args[0], handler);
            System.out.println("File is valid");
        }
        catch (SAXException e) {
            System.out.println("File is not valid: " + e.getMessage());
        }
        catch (Exception e) {
            System.out.println("Error parsing document:");
            e.printStackTrace();
        }
    }
}
0
 

Author Comment

by:mikechen
ID: 7058715
Hi, here is what I plan to do.

1. Get the XML file.
2. Replace <!DOCTYPE index SYSTEM "/dtds/format1.dtd"> with <!DOCTYPE index SYSTEM "http://abcd.com/dtds/format1.dtd">
3. Parse it.

Do you think this is good enough ?

But here is what I need help with, since I am still a C++/C# programmer.

1. To get XML file. Here is what I did.

<<
DocumentBuilderFactory docBuilderFactory;
docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setValidating(false);

DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document xmlDoc = docBuilder.parse(uri);
...
>>



But I got error like this
<<
Exception in thread "main" java.lang.InternalError
        at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2636)
        at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
        at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)

        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)

        at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:179)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:134)
>>

Why did I get this error ?

2. Since I got the xmlDoc, I assume I can manipulate it and change the DTD. What is the best way ?

3. After I change the DTD to absolute path, how should I parse it again ?


Thanks a lot.


0
 

Author Comment

by:mikechen
ID: 7058722
BTW, I am using JDK 1.4.
0
 
LVL 7

Expert Comment

by:yoren
ID: 7058760
Mike, your code isn't working because it's trying to parse the DTD before you change it. I can tell you how to fix it, but it shouldn't be necessary. As I stated in my previous comment, your XML file and the system identifier for the DTD external subset are valid. You're probably having problems just because of the Crimson bug. Just use a different parser and try the sample code I posted. Let us know if it doesn't work.
0
 

Author Comment

by:mikechen
ID: 7059706
Hi, One question here.

When I call docBuilderFactory.setValidating(false);
is it still going to validate the xml against the dtd ?

If not, then why did I still get that error ?

Thanks.
0
 
LVL 35

Expert Comment

by:girionis
ID: 7059748
 No it should not validate it...

  Getting this error means that there might be a problem with your parser, or, even with your Servlet engine. Tomcat 4.0 is known to have such problems. What Servlet Engine are you using?
0
 
LVL 7

Expert Comment

by:yoren
ID: 7060139
If you have copied your XML document elsewhere, then the problem is that the DTD external subset is specified but doesn't exist. That's not only a validation error, that's a well-formedness error too. Parsers check for well-formedness even when validation is off.

If you really need to save the document elsewhere and rewrite the DTD, consider using SAX2. The SAX2 API has two features, "external-general-entities" and "external-parameter-entities" that allow you to skip external entities. You can use the default Crimson parser, but you'll lose the comments in your document. If you want to preserve comments you should use a parser that supports the SAX2 extensions. Piccolo (http://piccolo.sourceforge.net), Xerces, and a few others (check http://www.saxproject.org/?selected=links) will work fine.
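As a rough sketch of the idea above: the two SAX2 feature URIs can be switched off through the JAXP factory. Whether a given parser then really skips the external DTD subset is parser-dependent, so this example also installs an EntityResolver that hands back an empty DTD as a fallback. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP or SAX.
class SkipExternalDtd {

    /** Parses without fetching the external DTD and returns the root element name. */
    static String rootElementOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        String[] features = {
            "http://xml.org/sax/features/external-general-entities",
            "http://xml.org/sax/features/external-parameter-entities"
        };
        for (String feature : features) {
            try {
                factory.setFeature(feature, false);
            } catch (Exception e) {
                // This parser cannot switch the feature off; rely on the resolver below.
            }
        }
        SAXParser parser = factory.newSAXParser();

        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Fallback: substitute an empty entity instead of fetching anything.
                return new InputSource(new StringReader(""));
            }
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE Foo SYSTEM \"/dtds/format1.dtd\">"
            + "<Foo><Foo1></Foo1></Foo>";
        System.out.println(rootElementOf(xml));
    }
}
```

Note that a document parsed this way has only been checked for well-formedness, not validated.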
0
 

Author Comment

by:mikechen
ID: 7061203
HI, Yoren,

Can you post the sample code using Xerces (http://xml.apache.org/xerces2-j)?

Thanks.
0
 
LVL 7

Expert Comment

by:yoren
ID: 7061252
Mike,

One of the great things about JAXP and SAX is that you can switch parsers without changing any code. You can use the code listed in my previous comment.

To have the program use Xerces instead of the default Crimson parser, you'll need to:

1. Download Xerces and place the .jar files in your [JAVA_HOME]/jre/lib/ext directory.

2. Tell Java to use Xerces as the default parser by creating the file, [JAVA_HOME]/jre/lib/jaxp.properties, and putting in these two lines:

javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
javax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
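To confirm that the setting has taken effect, it may help to print which factory classes JAXP actually hands back. A minimal check could look like this (the class name WhichParser is made up for the example):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic class.
class WhichParser {

    /** Class name of the SAX factory that JAXP selected. */
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Class name of the DOM factory that JAXP selected. */
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means no -D system property override was given
        System.out.println("SAX property: "
            + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("SAX factory:  " + saxFactoryClass());
        System.out.println("DOM factory:  " + domFactoryClass());
    }
}
```

If the printed class names still start with org.apache.crimson, the properties file or system property has not been picked up.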
0
 

Author Comment

by:mikechen
ID: 7061273
Hi, Yoren,

I followed what you said, but I still got the error below.
It seems it is still using Crimson.

Any idea ?

Thanks.

<<
        at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2636)
        at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
        at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)

        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)

        at javax.xml.parsers.SAXParser.parse(SAXParser.java:346)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:232)
        at validate.main(validate.java:18)
>>
0
 

 
LVL 7

Expert Comment

by:yoren
ID: 7061308
Try specifying the parser on the command line:

java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl  validate http://abcd.com/sample.xml
0

 

Author Comment

by:mikechen
ID: 7061336
I got an error
File is not valid: The encoding "ISO8859-1" is not supported.
0
 

 
LVL 7

Expert Comment

by:yoren
ID: 7061343
Ah, I didn't see that typo before. That's not a valid encoding. Instead, it should be "ISO-8859-1".
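If the feed itself cannot be corrected, one possible workaround is to decode the bytes yourself and hand the parser a character stream: the SAX InputSource documentation says a parser reads a character stream directly and disregards the encoding declaration. The sketch below assumes the feed really is ISO-8859-1; the class and method names are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP or SAX.
class EncodingWorkaround {

    /** Wraps the raw bytes in a Reader so the declared encoding name is not consulted. */
    static InputSource asLatin1(InputStream bytes, String systemId) {
        // Assumption: the document is actually encoded as ISO-8859-1.
        Reader reader = new InputStreamReader(bytes, StandardCharsets.ISO_8859_1);
        InputSource source = new InputSource(reader);
        source.setSystemId(systemId); // keeps relative DTD references resolvable
        return source;
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL(args[0]); // e.g. http://abcd.com/sample.xml
        try (InputStream in = new BufferedInputStream(url.openStream())) {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(asLatin1(in, url.toString()), new DefaultHandler());
            System.out.println("Parsed without an encoding error");
        }
    }
}
```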
0
 

Author Comment

by:mikechen
ID: 7061344
I put a space between -D and javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl  then I got an error

Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/parsers/SAXParserFactory=org/apache/xerces/jaxp/SAXParserFactoryImpl


0
 

 
LVL 7

Expert Comment

by:yoren
ID: 7061359
Did you ever hear the joke about the guy who walks into a doctor's office and said "it hurts when I raise my arm like this?" Well, like the doctor told him,

Don't do that.
0
 
LVL 7

Accepted Solution

by:
yoren earned 300 total points
ID: 7061360
P.S. After we get this working for you, please reject the pending answer and accept one of my comments as the answer. Thanks!
0
 
LVL 35

Expert Comment

by:girionis
ID: 7061382
>Did you ever hear the joke about the guy who walks into a doctor's office and said "it hurts when I
>raise my arm like this?" Well, like the doctor told him,
>
>Don't do that.

  LOL. :-)

  mikechen the java.lang.NoClassDefFoundError means that the VM cannot find the class you want. Make sure that the path to this class (or the jar file) is in the classpath. If you have put the jar file inside the /ext folder then the VM should pick it up automatically. If you still have the NoClassDefFoundError then make sure that the class you are looking for is in the correct jar file.

  Sometimes you even have to restart your server.

  Hope it helps :-)
0
 
LVL 7

Expert Comment

by:yoren
ID: 7061419
The NoClassDefFoundError is occurring because of the space after the -D flag. It thinks you're trying to run a program called "javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl " instead of setting the system property. Remove the space to fix the problem.
0
 
LVL 35

Expert Comment

by:girionis
ID: 7061447
 Aha, I see. To be honest, I have never needed to set a system property dynamically. Thanks for the information.
0
 

Author Comment

by:mikechen
ID: 7065463
OK, I am back from some chaos.

Hi, Yoren,
How do I fix
"File is not valid: The encoding "ISO8859-1" is not supported." ?

0
 
LVL 35

Expert Comment

by:girionis
ID: 7065506
 Change this: "ISO8859-1" to this: "ISO-8859-1"

  Hope it helps.
0
 

Author Comment

by:mikechen
ID: 7075955
Thanks for your help, MikaelHK.
0
