Problem converting DTD to XML schema

Hi gurus,

I am having a hard time converting the following DTD definitions to XML Schema. First, the DTD (only the portion I have problems with):

<!ELEMENT OtherServer (URL | MSISDN | (URL, MSISDN))>
<!ELEMENT URL (#PCDATA)>
<!ELEMENT MSISDN (#PCDATA)>

Below is what XMLSpy or XMLWriter gives me when I use them to convert to XML Schema (only the portion I have problems with):

<xsd:element name="OtherServer">
  <xsd:complexType>
    <xsd:choice>
      <xsd:element ref="URL"/>
      <xsd:element ref="MSISDN"/>
      <xsd:sequence>
        <xsd:element ref="URL"/>
        <xsd:element ref="MSISDN"/>
      </xsd:sequence>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
<xsd:element name="MSISDN" type="xsd:string"/>
<xsd:element name="URL" type="xsd:string"/>

This sounds simple enough. However, in actual usage, this fails. My test code is like the following:

import java.io.StringReader;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;

// Validating SAX builder backed by Xerces, with XML Schema validation switched on
SAXBuilder xmlParser = new SAXBuilder("org.apache.xerces.parsers.SAXParser", true);
xmlParser.setFeature("http://apache.org/xml/features/validation/schema", true);
xmlParser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "WV-VERDISC.xsd");
// xmlStr holds the XML data
StringReader rdr = new StringReader(xmlStr);
Document xmlDoc = xmlParser.build(rdr);

Sample XML (only the portion I have problems with):

   <OtherServer>
      <URL>http://www.blah.com/XMLTest</URL>
      <MSISDN>+1234567890</MSISDN>
   </OtherServer>
   <OtherServer>
      <URL>http://www.blah.com/XMLTest</URL>
   </OtherServer>
   <OtherServer>
      <MSISDN>+1234567890</MSISDN>
   </OtherServer>

So, looking at the DTD, the above 3 OtherServer XML elements are all valid. But with the converted XML Schema, if OtherServer has both URL and MSISDN child elements, the parser says during validation that it is NOT valid. After much trial and error, I am wondering whether the DTD definition can be converted to XML Schema at all. I tried group declarations and they don't work either. I am using JDOM 1.0 and Xerces 2.6.2. Any clues are greatly appreciated! TIA!
Asked by: Andyfungkl
 
Geert Bormans (Information Architect) commented:
Hi there,

just to add my two cents.

The problem is that basically both the schema and the DTD are invalid.
Old SGML parsers that were turned into XML parsers would spot that the DTD content model is ambiguous.
To explain this simply: when the parser sees an opening <URL> element, it cannot tell whether it is in the branch <URL> or the branch <URL><MSISDN> without looking ahead.
If you build your parser as a finite state machine, it would need lookahead, which you don't want because it is slow.

The problem with DTDs is that parsers are not obliged to report this... most modern DTD validating parsers DON'T.

If you rewrite your DTD like this:
<!ELEMENT OtherServer ( MSISDN | (URL, MSISDN?))>
It is no longer ambiguous and it is equivalent to what you mean.
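
The equivalence can be sanity-checked by treating both content models as regular expressions over child-element names. The following is only a minimal sketch under a made-up encoding (U stands for <URL>, M for <MSISDN>); the class name ContentModelCheck and its methods are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ContentModelCheck {
    // Child sequences are encoded as strings: U = <URL>, M = <MSISDN> (illustrative encoding).
    static final Pattern ORIGINAL = Pattern.compile("U|M|UM");  // (URL | MSISDN | (URL, MSISDN))
    static final Pattern REWRITTEN = Pattern.compile("M|UM?");  // (MSISDN | (URL, MSISDN?))

    /** All strings over {U, M} with length 0..maxLen. */
    static List<String> sequences(int maxLen) {
        List<String> all = new ArrayList<>();
        all.add("");
        int start = 0;
        for (int len = 0; len < maxLen; len++) {
            int end = all.size();
            // Extend every string of the current length by one more child element.
            for (int i = start; i < end; i++) {
                all.add(all.get(i) + "U");
                all.add(all.get(i) + "M");
            }
            start = end;
        }
        return all;
    }

    /** True if both models accept exactly the same child sequences up to maxLen. */
    static boolean sameLanguage(int maxLen) {
        for (String s : sequences(maxLen)) {
            if (ORIGINAL.matcher(s).matches() != REWRITTEN.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameLanguage(4)); // prints "true"
    }
}
```

Neither model accepts more than two children, so checking all sequences up to length 4 covers every case. The check says nothing about determinism; it only shows that the two models accept the same documents.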

The schema conversion as originally posted is equivalent to the DTD snippet that was also posted.
On <xs:element>, the minOccurs attribute defaults to "1".
So in my mind the story about <OtherServer></OtherServer> being allowed is not correct
(please note that XML Spy doesn't always do the right thing with schema validation; Xerces usually does).

If I parse the original snippet, my parser says (correctly) that it breaks the Unique Particle Attribution principle
(http://www.w3.org/TR/xmlschema-1/#cos-nonambig)
Schema validating parsers HAVE to report this. This is why you never found the problem with the DTD, only with the schema
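
One way to see the schema itself being rejected, independent of any instance document, is to compile it on its own. This is a minimal sketch assuming a JDK that ships the standard javax.xml.validation API; the class name UpaCheck and the method compileError are hypothetical names chosen for illustration, and the schema text is the originally posted fragment wrapped in an xsd:schema element.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class UpaCheck {
    // The originally posted (ambiguous) content model, wrapped in a schema element.
    static final String AMBIGUOUS =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='OtherServer'><xsd:complexType><xsd:choice>"
        + "<xsd:element ref='URL'/><xsd:element ref='MSISDN'/>"
        + "<xsd:sequence><xsd:element ref='URL'/><xsd:element ref='MSISDN'/></xsd:sequence>"
        + "</xsd:choice></xsd:complexType></xsd:element>"
        + "<xsd:element name='MSISDN' type='xsd:string'/>"
        + "<xsd:element name='URL' type='xsd:string'/>"
        + "</xsd:schema>";

    /** Returns null if the schema compiles, otherwise the processor's error message. */
    static String compileError(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // With no ErrorHandler set, schema errors are thrown as SAXException.
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(compileError(AMBIGUOUS));
    }
}
```

The exact wording of the message depends on the processor. As far as I know, the Xerces parser feature used in the question only performs this check when http://apache.org/xml/features/validation/schema-full-checking is also enabled, which may be why that setup complained about the document rather than the schema; treat that as an assumption to verify against the Xerces documentation.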

My non-ambiguous DTD alternative translates into the following schema:

 <xs:element name="OtherServer">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="MSISDN"/>
        <xs:sequence>
          <xs:element ref="URL"/>
          <xs:element minOccurs="0" ref="MSISDN"/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="URL" type="xs:string"/>
  <xs:element name="MSISDN" type="xs:string"/>

This schema works with your examples.
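
For anyone who wants to reproduce that, here is a minimal sketch that validates the three sample elements against the schema above, again assuming the standard javax.xml.validation API rather than the JDOM setup from the question. The class name FixedSchemaCheck and the method isValid are hypothetical; the schema string is the fragment above wrapped in an xs:schema element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class FixedSchemaCheck {
    // The non-ambiguous content model: MSISDN alone, or URL optionally followed by MSISDN.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='OtherServer'><xs:complexType><xs:choice>"
        + "<xs:element ref='MSISDN'/>"
        + "<xs:sequence><xs:element ref='URL'/>"
        + "<xs:element minOccurs='0' ref='MSISDN'/></xs:sequence>"
        + "</xs:choice></xs:complexType></xs:element>"
        + "<xs:element name='URL' type='xs:string'/>"
        + "<xs:element name='MSISDN' type='xs:string'/>"
        + "</xs:schema>";

    /** True if the given document validates against XSD, false on any schema or validation error. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = "<URL>http://www.blah.com/XMLTest</URL>";
        String msisdn = "<MSISDN>+1234567890</MSISDN>";
        System.out.println(isValid("<OtherServer>" + url + msisdn + "</OtherServer>"));
        System.out.println(isValid("<OtherServer>" + url + "</OtherServer>"));
        System.out.println(isValid("<OtherServer>" + msisdn + "</OtherServer>"));
        System.out.println(isValid("<OtherServer/>"));
    }
}
```

The last call checks the empty element, which this content model should reject, just as the DTD does.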
I guess this closes the subject :-)
Have a nice evening

Gertone
 
DitmarBehn commented:
Hi,

It seems that xsd:choice is not the right way to handle this situation. A choice is strictly either/or (XOR); it does not allow several or all of the child nodes to appear together.

Try it with the following xsd snippet and see if it fits your need:

<xs:sequence>
  <xs:element name="OtherServer" maxOccurs="unbounded">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="URL" type="xs:anyURI" minOccurs="0"/>
        <xs:element name="MSISDN" type="xs:int" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:sequence>

Regards, Ditmar
 
rdcpro commented:
Hmmm... I don't see anything at all wrong with the OP's schema. It validated fine for me, but I'm not using JDOM or Xerces. Essentially it says a choice of either A, or B, or both.

The problem with using an xs:sequence is that it allows this:

<OtherServer>
</OtherServer>

Which is not allowed with the DTD.  If an empty OtherServer is allowed, then this works.

I'm wondering, too, if there is a need to constrain the order of the URL and MSISDN elements.  If you don't want to constrain the order of the child elements of OtherServer, then use the xs:all declaration:

  <xs:element name="OtherServer" maxOccurs="unbounded">
    <xs:complexType>
      <xs:all>
        <xs:element name="URL" type="xs:anyURI" minOccurs="0"/>
        <xs:element name="MSISDN" type="xs:int" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

But again, this allows an empty OtherServer element.

Regards,
Mike Sharp
 
Andyfungkl (author) commented:
Hi Mike,

Which XML parser are you using, just out of curiosity? I suspect it is a problem with JDOM + Xerces too, since I don't see anything wrong either.

Regards,
 Andy
 
rdcpro commented:
I did the validation using XML Spy, Ver 5, Release 4.  While I have found obscure validation bugs in XML Spy, this seems like a pretty common content model to me.  In fact, it's weird that JDOM and Xerces would have problems with it.  I wonder if something else might be the problem...

Regards,
Mike Sharp
 
Geert Bormans (Information Architect) commented:
well,
If it helps you to make a recommendation...
I am convinced that the question was answered correctly and completely :-)
I don't care about the points, but I would hate the solution being removed from the archive.

To quote my parser on the DTD:
Markup Error (0004) on line 2 in file Markup Stream:
A content model must not be ambiguous.
For the declared element "OtherServer", the element "URL" is ambiguous
in the content model.

Gertone
 
rdcpro commented:
I agree.  Points to Gertone.

Regards,
Mike Sharp
 
Andyfungkl (author) commented:
Sorry for not checking Experts Exchange for a while. Anyway, Gertone really nailed the problem. Thanks to both of you, Gertone and rdcpro!