Solved

Validate an XML document using a DTD

Posted on 2002-05-22
32
Medium Priority
672 Views
Last Modified: 2013-11-23
Hi,

I have a question about using DTD to validate an XML data feed. Here is what I need to do.

1. I need to retrieve an XML file from a website, say http://ABCD.COM/sample.xml.

2. This XML is structured according to a DTD. The DTD is defined externally on the same website. Here is a sample of the XML file:

<?xml version="1.0" encoding="ISO8859-1"?>
<!DOCTYPE index SYSTEM "/dtds/format1.dtd">
<Foo>
    <Foo1>
    </Foo1>
</Foo>

3. When I retrieve the XML file (or after I retrieve it), I need to check whether it is valid per the DTD file.


Any ideas? Sample code would be really appreciated.

Thanks.


Question by:mikechen
32 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 7026521
 Take a look here: http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/index.html

  At the bottom of the page it has links to various things you can do with java and XML.

  Hope it helps.
 

Expert Comment

by:MikaelHK
ID: 7026650
First of all:

Use the URL class in the java.net package.
It allows you to get an InputStream for your resource (remember to wrap it in a BufferedInputStream for good measure).

Now to parse the xml:

Use the javax.xml.parsers package (JAXP) to have a factory create a validating parser for you. If all you want is to know whether there are any errors in the document, simply create a SAXParser and pass it a subclass of the SAX2 helper class DefaultHandler that overrides the error, fatalError and warning methods to collect the error information. If that doesn't do enough and you aren't comfortable with SAX, I suggest you get hold of a DOM parser, which is far easier to use (it represents your document as a tree of nodes) but is also expensive in memory and processing compared to SAX.
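The approach described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name CollectingValidator and its validate method are hypothetical, and the URL is whatever is passed on the command line. It wraps the stream in a BufferedInputStream, asks JAXP for a validating SAX parser, and collects whatever the parser reports through warning, error and fatalError.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are illustrative only.
final class CollectingValidator {

    // Parses the stream with a validating SAX parser and returns one message
    // per warning, error, or fatal error. systemId is the document's own URL,
    // used to resolve a relative DTD reference; it may be null.
    static List<String> validate(InputStream in, String systemId) throws Exception {
        final List<String> problems = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                problems.add("fatal: " + e.getMessage());
                throw e; // parsing cannot continue after a fatal error
            }
        };

        InputSource source = new InputSource(new BufferedInputStream(in));
        if (systemId != null) {
            source.setSystemId(systemId);
        }
        try {
            parser.parse(source, handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError() above.
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: java CollectingValidator <url>");
            return;
        }
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            for (String problem : validate(in, args[0])) {
                System.out.println(problem);
            }
        }
    }
}
```

An empty list means the parser reported no problems. Passing the document URL as the system ID matters when the DOCTYPE uses a relative reference, since the parser resolves it against that URL.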

According to the sample XML you posted, your document contains a DOCTYPE with a SYSTEM reference. This requires that the DTD be present on the parsing system (filesystem). If possible, you should change this to a PUBLIC "http://abcd.com/dtds/format1.dtd" reference, which will allow the parser to go and load the DTD from the server (in fact, some parsers are now smart enough to cache these DTDs for later reuse).

Hope it was helpful.
 
LVL 35

Expert Comment

by:girionis
ID: 7026661
 Mikael, please do not propose answers, as this locks the question and makes it difficult for other people to see it and add their comments. Post comments instead, as comments can still be accepted as answers.

 

Author Comment

by:mikechen
ID: 7027617
Thanks for the responses. The XML is a data feed from a third-party website, so I cannot change it.

So is there a way to achieve what I want?

Could somebody provide some sample code ?

Thanks.
 
LVL 7

Expert Comment

by:yoren
ID: 7038288
Mike,

Yes, there is probably a way to achieve what you want, and it's not too difficult. However, it depends on properly defining the path to the DTD in the XML document. The external DTD subset ("/dtds/format1.dtd") is resolved in the context of the document entity ("http://abcd.com/sample.xml"), so your DTD must be accessible at http://abcd.com/dtds/format1.dtd.
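The resolution rule can be checked with java.net.URI, which applies the same relative-reference rules. This is only a small sketch; the class name ResolveSystemId is hypothetical, and the URLs are the placeholder ones used in this thread.

```java
import java.net.URI;

// Hypothetical helper; the class name is illustrative only.
final class ResolveSystemId {

    // Resolves a DOCTYPE system identifier against the URL of the document
    // that contains it.
    static String resolve(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // prints http://abcd.com/dtds/format1.dtd
        System.out.println(resolve("http://abcd.com/sample.xml", "/dtds/format1.dtd"));
    }
}
```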

Here's some sample code using JAXP. Note, however, that Crimson (the default parser with JDK 1.4) has a bug causing it to incorrectly resolve the DTD URL. You'll need to get a different validating parser such as Xerces (http://xml.apache.org/xerces2-j).

import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class validate {
    static public void main(String[] args) {
        try {
            SAXParserFactory saxfactory = SAXParserFactory.newInstance();
            saxfactory.setValidating(true);
            SAXParser saxparser = saxfactory.newSAXParser();
   
            if (args.length < 1) {
                System.err.println(
                    "Usage: java validate http://abcd.com/sample.xml");
                System.exit(1);
            }

            // Validity errors are reported through error(), which
            // DefaultHandler ignores by default, so rethrow them here.
            // Fatal (well-formedness) errors already throw by default.
            DefaultHandler handler = new DefaultHandler() {
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            };

            saxparser.parse(args[0], handler);
            System.out.println("File is valid");
        }
        catch (SAXException e) {
            System.out.println("File is not valid: " + e.getMessage());
        }
        catch (Exception e) {
            System.out.println("Error parsing document:");
            e.printStackTrace();
        }
    }
}
 

Author Comment

by:mikechen
ID: 7058715
Hi, here is what I plan to do.

1. Get the XML file.
2. Replace <!DOCTYPE index SYSTEM "/dtds/format1.dtd"> with <!DOCTYPE index SYSTEM "http://abcd.com/dtds/format1.dtd">
3. Parse it.

Do you think this is good enough ?

But here is where I need help, since I am mainly a C++/C# programmer.

1. To get the XML file, here is what I did:

<<
DocumentBuilderFactory docBuilderFactory;
docBuilderFactory = DocumentBuilderFactory.newInstance();
docBuilderFactory.setValidating(false);

DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document xmlDoc = docBuilder.parse(uri);
...
>>



But I got an error like this:
<<
Exception in thread "main" java.lang.InternalError
        at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2636)
        at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
        at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)

        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)

        at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:179)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:134)
>>

Why did I get this error?

2. Once I have the xmlDoc, I assume I can manipulate it and change the DTD reference. What is the best way?

3. After I change the DTD reference to an absolute URL, how should I parse it again?


Thanks a lot.


 

Author Comment

by:mikechen
ID: 7058722
BTW, I am using JDK 1.4.
 
LVL 7

Expert Comment

by:yoren
ID: 7058760
Mike, your code isn't working because it's trying to parse the DTD before you change it. I can tell you how to fix it, but it shouldn't be necessary. As I stated in my previous comment, your XML file and the system identifier for the DTD external subset are valid. You're probably having problems just because of the Crimson bug. Just use a different parser and try the sample code I posted. Let us know if it doesn't work.
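For cases where the DTD reference really cannot be resolved as written, one common alternative to editing the DOCTYPE line is a SAX EntityResolver, which lets the application hand the DTD to the parser itself. The sketch below is an illustration under stated assumptions, not something proposed in this thread: the class name LocalDtdValidator, the suffix match, and supplying the DTD as an in-memory string are all hypothetical choices. In practice the resolver could just as well return an InputSource pointing at an absolute URL or a local file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are illustrative only.
final class LocalDtdValidator {

    // Validates xml against dtdText. The DTD text is handed to the parser
    // whenever it asks for an external entity whose system ID ends with
    // dtdSuffix, so the DOCTYPE line itself is left untouched.
    static List<String> validate(String xml, final String dtdSuffix,
                                 final String dtdText) throws Exception {
        final List<String> errors = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                if (systemId != null && systemId.endsWith(dtdSuffix)) {
                    return new InputSource(new StringReader(dtdText));
                }
                return null; // fall back to the parser's default resolution
            }
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validity errors arrive here
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return errors;
    }
}
```

Well-formedness problems still surface as exceptions from parse, because DefaultHandler rethrows fatal errors by default.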
 

Author Comment

by:mikechen
ID: 7059706
Hi, One question here.

When I call docBuilderFactory.setValidating(false);
is it still going to validate the xml against the dtd ?

If not, then why did I still get that error ?

Thanks.
 
LVL 35

Expert Comment

by:girionis
ID: 7059748
 No it should not validate it...

  Getting this error means that there might be a problem with your parser, or, even with your Servlet engine. Tomcat 4.0 is known to have such problems. What Servlet Engine are you using?
 
LVL 7

Expert Comment

by:yoren
ID: 7060139
If you have copied your XML document elsewhere, then the problem is that the DTD external subset is specified but doesn't exist. That's not only a validation error, that's a well-formedness error too. Parsers check for well-formedness even when validation is off.

If you really need to save the document elsewhere and rewrite the DTD, consider using SAX2. The SAX2 API has two features, "external-general-entities" and "external-parameter-entities" that allow you to skip external entities. You can use the default Crimson parser, but you'll lose the comments in your document. If you want to preserve comments you should use a parser that supports the SAX2 extensions. Piccolo (http://piccolo.sourceforge.net), Xerces, and a few others (check http://www.saxproject.org/?selected=links) will work fine.
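Through JAXP, the two SAX2 features named above can be set on the factory before the parser is created. This is a minimal sketch; the class name SkipExternalEntities and the countElements helper are hypothetical. How much a particular parser actually skips when these features are off is implementation-dependent, so treat the behaviour for the external DTD subset as something to verify with the parser you use.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are illustrative only.
final class SkipExternalEntities {

    // Standard SAX2 feature names.
    static final String GENERAL =
        "http://xml.org/sax/features/external-general-entities";
    static final String PARAMETER =
        "http://xml.org/sax/features/external-parameter-entities";

    // Builds a non-validating factory with both features switched off.
    static SAXParserFactory newFactory() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(GENERAL, false);
        factory.setFeature(PARAMETER, false);
        return factory;
    }

    // Parses the text with such a parser and counts the elements it sees.
    static int countElements(String xml) throws Exception {
        final int[] count = {0};
        SAXParser parser = newFactory().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```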
 

Author Comment

by:mikechen
ID: 7061203
Hi, Yoren,

Can you post the sample code using Xerces (http://xml.apache.org/xerces2-j)?

Thanks.
 
LVL 7

Expert Comment

by:yoren
ID: 7061252
Mike,

One of the great things about JAXP and SAX is that you can switch parsers without changing any code. You can use the code listed in my previous comment.

To have the program use Xerces instead of the default Crimson parser, you'll need to:

1. Download Xerces and place the .jar files in your [JAVA_HOME]/jre/lib/ext directory.

2. Tell Java to use Xerces as the default parser by creating the file, [JAVA_HOME]/jre/lib/jaxp.properties, and putting in these two lines:

javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
javax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
 

Author Comment

by:mikechen
ID: 7061273
Hi, Yoren,

I followed what you said, but I still got the error below.
It seems it is still using Crimson.

Any idea ?

Thanks.

<<
        at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2636)
        at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
        at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)

        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)

        at javax.xml.parsers.SAXParser.parse(SAXParser.java:346)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:232)
        at validate.main(validate.java:18)
>>
 
LVL 7

Expert Comment

by:yoren
ID: 7061308
Try specifying the parser on the command line:

java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl  validate http://abcd.com/sample.xml
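To confirm which implementation JAXP actually picked up, whether through jaxp.properties or the -D flag, a small diagnostic like the following can help. It is only a sketch and the class name WhichParser is hypothetical. It prints the value of the system property, if any, and the class names of the factories that newInstance() returns.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic; the class name is illustrative only.
final class WhichParser {

    static final String SAX_KEY = "javax.xml.parsers.SAXParserFactory";

    // Class name of the SAX factory that JAXP selects in this JVM.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Class name of the DOM factory that JAXP selects in this JVM.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means no -D override reached the JVM
        System.out.println("System property: " + System.getProperty(SAX_KEY));
        System.out.println("SAX factory:     " + saxFactoryClass());
        System.out.println("DOM factory:     " + domFactoryClass());
    }
}
```

Run it with the same -D flag as the validate program. If the SAX factory line still names a Crimson class, the override did not take effect.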
 

Author Comment

by:mikechen
ID: 7061336
I got an error
File is not valid: The encoding "ISO8859-1" is not supported.
 
LVL 7

Expert Comment

by:yoren
ID: 7061343
Ah, I didn't see that typo before. That's not a valid encoding. Instead, it should be "ISO-8859-1".
 

Author Comment

by:mikechen
ID: 7061344
I put a space between -D and javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl  then I got an error

Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/parsers/SAXParserFactory=org/apache/xerces/jaxp/SAXParserFactoryImpl


 
LVL 7

Expert Comment

by:yoren
ID: 7061359
Did you ever hear the joke about the guy who walks into a doctor's office and said "it hurts when I raise my arm like this?" Well, like the doctor told him,

Don't do that.
 
LVL 7

Accepted Solution

by:
yoren earned 1200 total points
ID: 7061360
P.S. After we get this working for you, please reject the pending answer and accept one of my comments as the answer. Thanks!
 
LVL 35

Expert Comment

by:girionis
ID: 7061382
>Did you ever hear the joke about the guy who walks into a doctor's office and said "it hurts when I
>raise my arm like this?" Well, like the doctor told him,
>
>Don't do that.

  LOL. :-)

  mikechen the java.lang.NoClassDefFoundError means that the VM cannot find the class you want. Make sure that the path to this class (or the jar file) is in the classpath. If you have put the jar file inside the /ext folder then the VM should pick it up automatically. If you still have the NoClassDefFoundError then make sure that the class you are looking for is in the correct jar file.

  Sometimes you even have to restart your server.

  Hope it helps :-)
 
LVL 7

Expert Comment

by:yoren
ID: 7061419
The NoClassDefFoundError is occurring because of the space after the -D flag. It thinks you're trying to run a program called "javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl " instead of setting the system property. Remove the space to fix the problem.
 
LVL 35

Expert Comment

by:girionis
ID: 7061447
 Aha, I see. To be honest, I have never needed to set a new system property dynamically. Thanks for the information.
 

Author Comment

by:mikechen
ID: 7065463
OK, I am back from some chaos.

Hi, Yoren,
How do I fix
"File is not valid: The encoding "ISO8859-1" is not supported." ?

 
LVL 35

Expert Comment

by:girionis
ID: 7065506
 Change this: "ISO8859-1" to this: "ISO-8859-1"

  Hope it helps.
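If the feed itself cannot be edited, one possible workaround, which is an alternative and not something suggested in this thread, is to decode the bytes yourself and give the parser a character stream. According to the SAX InputSource documentation, a parser that is handed a Reader reads it directly and disregards the encoding declaration in the document. The sketch below assumes the feed really is ISO-8859-1; the class name ForcedEncoding is hypothetical.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; the class and method names are illustrative only.
final class ForcedEncoding {

    // Decodes the bytes as ISO-8859-1 (an assumption about the feed) and
    // passes the parser a Reader, so the encoding named in the XML
    // declaration is not used to decode the document.
    static Document parseLatin1(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }
}
```

The same InputSource can be passed to a validating SAXParser instead of a DocumentBuilder.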
 

Author Comment

by:mikechen
ID: 7075955
Thanks for your help, MikaelHK.