Solved

Problem converting DTD to XML schema

Posted on 2004-10-29
Last Modified: 2013-11-19
Hi gurus,

I am having a hard time converting the following DTD definitions to XML Schema. First, the DTD (only the portion I have problems with):

<!ELEMENT OtherServer (URL | MSISDN | (URL, MSISDN))>
<!ELEMENT URL (#PCDATA)>
<!ELEMENT MSISDN (#PCDATA)>

Below is what XMLSpy or XMLWriter gives me when I use them to convert the DTD to XML Schema (only the portion I have problems with):

<xsd:element name="OtherServer">
    <xsd:complexType>
        <xsd:choice>
            <xsd:element ref="URL"/>
            <xsd:element ref="MSISDN"/>
            <xsd:sequence>
                <xsd:element ref="URL"/>
                <xsd:element ref="MSISDN"/>
            </xsd:sequence>
        </xsd:choice>
    </xsd:complexType>
</xsd:element>
<xsd:element name="MSISDN" type="xsd:string"/>
<xsd:element name="URL" type="xsd:string"/>

This looks simple enough. However, in actual use it fails. My test code looks like this:

import java.io.StringReader;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;

// Validating SAX parser backed by Xerces (the second argument turns validation on)
SAXBuilder xmlParser = new SAXBuilder("org.apache.xerces.parsers.SAXParser", true);
xmlParser.setFeature("http://apache.org/xml/features/validation/schema", true);
xmlParser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "WV-VERDISC.xsd");
// xmlStr holds the XML data
StringReader rdr = new StringReader(xmlStr);
Document xmlDoc = xmlParser.build(rdr);

Sample XML (only the portion I have problems with):

   <OtherServer>
      <URL>http://www.blah.com/XMLTest</URL>
      <MSISDN>+1234567890</MSISDN>
   </OtherServer>
   <OtherServer>
      <URL>http://www.blah.com/XMLTest</URL>
   </OtherServer>
   <OtherServer>
      <MSISDN>+1234567890</MSISDN>
   </OtherServer>

So, according to the DTD, all three OtherServer elements above are valid. But with the converted XML Schema, the parser reports OtherServer as NOT valid whenever it has both URL and MSISDN child elements. After much trial and error, I am wondering whether this DTD definition can be converted to XML Schema at all. I tried group declarations and they did not work either. I am using JDOM 1.0 and Xerces 2.6.2. Any clues are greatly appreciated! TIA!
Question by:Andyfungkl
    8 Comments
     
    LVL 3

    Expert Comment

    by:DitmarBehn
    Hi,

    It seems that xsd:choice is not the right way to handle this situation. A choice is really an either/or (XOR); it does not allow several or all of the child nodes to be present.

    Try it with the following xsd snippet and see if it fits your need:

    <xs:sequence>
      <xs:element name="OtherServer" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="URL" type="xs:anyURI" minOccurs="0"/>
            <xs:element name="MSISDN" type="xs:int" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>

    Regards, Ditmar
     
    LVL 26

    Expert Comment

    by:rdcpro
    Hmmm... I don't see anything at all wrong with the OP's schema. It validated fine for me, but I'm not using JDOM or Xerces. Essentially it says a choice of either A, or B, or both.

    The problem with using an xs:sequence is that it allows this:

    <OtherServer>
    </OtherServer>

    Which is not allowed with the DTD.  If an empty OtherServer is allowed, then this works.

    I'm wondering, too, if there is a need to constrain the order of the URL and MSISDN elements.  If you don't want to constrain the order of the child elements of OtherServer, then use the xs:all declaration:

      <xs:element name="OtherServer" maxOccurs="unbounded">
        <xs:complexType>
          <xs:all>
            <xs:element name="URL" type="xs:anyURI" minOccurs="0"/>
            <xs:element name="MSISDN" type="xs:int" minOccurs="0"/>
          </xs:all>
        </xs:complexType>
      </xs:element>

    But again, this allows an empty OtherServer element.

    Regards,
    Mike Sharp
     

    Author Comment

    by:Andyfungkl
    Hi Mike,

    Just out of curiosity, which XML parser are you using? I suspect it is a problem with JDOM + Xerces too, since I don't see anything wrong either.

    Regards,
     Andy
     
    LVL 26

    Expert Comment

    by:rdcpro
    I did the validation using XML Spy, Ver 5, Release 4.  While I have found obscure validation bugs in XML Spy, this seems like a pretty common content model to me.  In fact, it's weird that JDOM and Xerces would have problems with it.  I wonder if something else might be the problem...

    Regards,
    Mike Sharp
     
    LVL 60

    Accepted Solution

    by:
    Hi there,

    just to add my two cents.

    The problem is that, basically, both the schema and the DTD are invalid.
    Old SGML parsers that have been turned into XML parsers would spot that the DTD content model is ambiguous.
    To explain this simply: when the parser sees a <URL> start tag, it cannot tell whether it is in the branch <URL> or the branch <URL><MSISDN> without looking ahead.
    A parser built as a finite state machine would then need lookahead, which you want to avoid because it is slow.

    The problem with DTDs is that parsers are not obliged to report this... most modern DTD validating parsers DON'T.

    If you rewrite your DTD like this:
    <!ELEMENT OtherServer ( MSISDN | (URL, MSISDN?))>
    it is no longer ambiguous, and it is equivalent to what you mean.

    The schema as originally posted is a faithful translation of the DTD snippet that was also posted.
    On <xs:element>, the minOccurs attribute defaults to 1,
    so in my mind the story about <OtherServer></OtherServer> being allowed is not correct.
    (Please note that XML Spy doesn't always do the right thing with schema validation; Xerces usually does.)

    If I parse the original snippet, my parser says (correctly) that it breaks the Unique Particle Attribution principle
    (http://www.w3.org/TR/xmlschema-1/#cos-nonambig)
    Schema validating parsers HAVE to report this. This is why you never found the problem with the DTD, only with the schema
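
As a minimal sketch of how one might check this independently (this is not the setup used in the thread), the JDK's built-in javax.xml.validation API can be asked to compile each content model and hand back whatever error the processor raises at schema-compile time. The class name UpaCheck and its helper names are made up for illustration. Whether a given processor reports the ambiguity, and with what wording, depends on the implementation and its settings, so no output is shown here.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original thread.
class UpaCheck {
    // Content model generated from the original DTD: URL can start two branches.
    static final String AMBIGUOUS =
        "<xs:choice>"
      + "<xs:element ref='URL'/>"
      + "<xs:element ref='MSISDN'/>"
      + "<xs:sequence><xs:element ref='URL'/><xs:element ref='MSISDN'/></xs:sequence>"
      + "</xs:choice>";

    // Content model for MSISDN | (URL, MSISDN?): the first child selects the branch.
    static final String REWRITTEN =
        "<xs:choice>"
      + "<xs:element ref='MSISDN'/>"
      + "<xs:sequence><xs:element ref='URL'/><xs:element ref='MSISDN' minOccurs='0'/></xs:sequence>"
      + "</xs:choice>";

    // Wraps a content model for OtherServer in a minimal schema document.
    static String schemaFor(String contentModel) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='OtherServer'><xs:complexType>"
             + contentModel
             + "</xs:complexType></xs:element>"
             + "<xs:element name='URL' type='xs:string'/>"
             + "<xs:element name='MSISDN' type='xs:string'/>"
             + "</xs:schema>";
    }

    // Returns null if the schema compiles, otherwise the processor's error message.
    static String compileError(String schemaText) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(schemaText)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("generated: " + compileError(schemaFor(AMBIGUOUS)));
        System.out.println("rewritten: " + compileError(schemaFor(REWRITTEN)));
    }
}
```

With the Xerces SAX parser used in the question, this class of check is controlled by the http://apache.org/xml/features/validation/schema-full-checking feature, which the Xerces documentation lists as off by default. That may explain why the schema was accepted and the failure only showed up at validation time.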

    My non-ambiguous DTD alternative translates into the following schema:

     <xs:element name="OtherServer">
        <xs:complexType>
          <xs:choice>
            <xs:element ref="MSISDN"/>
            <xs:sequence>
              <xs:element ref="URL"/>
              <xs:element minOccurs="0" ref="MSISDN"/>
            </xs:sequence>
          </xs:choice>
        </xs:complexType>
      </xs:element>
      <xs:element name="URL" type="xs:string"/>
      <xs:element name="MSISDN" type="xs:string"/>

    This schema works with your examples.
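
The claim can be checked with a short sketch along the following lines. It uses the JDK's javax.xml.validation API rather than the JDOM + Xerces setup from the question, and it embeds the schema as a string instead of loading WV-VERDISC.xsd. The class name OtherServerValidation and the method isValid are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original thread.
class OtherServerValidation {
    // The non-ambiguous schema from this answer, wrapped in an xs:schema root.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='OtherServer'><xs:complexType><xs:choice>"
      + "<xs:element ref='MSISDN'/>"
      + "<xs:sequence>"
      + "<xs:element ref='URL'/>"
      + "<xs:element minOccurs='0' ref='MSISDN'/>"
      + "</xs:sequence>"
      + "</xs:choice></xs:complexType></xs:element>"
      + "<xs:element name='URL' type='xs:string'/>"
      + "<xs:element name='MSISDN' type='xs:string'/>"
      + "</xs:schema>";

    // Compiles the schema; a failure here is a bug in SCHEMA, not in the instance.
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if the given document validates against SCHEMA.
    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error or malformed instance
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = "<URL>http://www.blah.com/XMLTest</URL>";
        String msisdn = "<MSISDN>+1234567890</MSISDN>";
        System.out.println(isValid("<OtherServer>" + url + msisdn + "</OtherServer>")); // true
        System.out.println(isValid("<OtherServer>" + url + "</OtherServer>"));          // true
        System.out.println(isValid("<OtherServer>" + msisdn + "</OtherServer>"));       // true
        System.out.println(isValid("<OtherServer></OtherServer>"));                     // false
    }
}
```

The three sample elements from the question are accepted, while an empty OtherServer is rejected, which matches the intent of the DTD.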
    I guess this closes the subject :-)
    Have a nice evening

    Gertone
     
    LVL 60

    Expert Comment

    by:Geert Bormans
    well,
    If it helps you to make a recommendation...
    I am convinced that the question was answered correctly and completely :-)
    I don't care about the points, but I would hate the solution being removed from the archive.

    To quote my parser on the DTD:
    Markup Error (0004) on line 2 in file Markup Stream:
    A content model must not be ambiguous.
    For the declared element "OtherServer", the element "URL" is ambiguous
    in the content model.

    Gertone
     
    LVL 26

    Expert Comment

    by:rdcpro
    I agree.  Points to Gertone.

    Regards,
    Mike Sharp
     

    Author Comment

    by:Andyfungkl
    Sorry for not checking Experts Exchange for a while. Anyway, Gertone really nailed the problem. Thanks to both Gertone and rdcpro!
