

Xerces C++ API - Parser problems

Posted on 2003-11-04
Medium Priority
Last Modified: 2013-11-19

Hello everybody,

I have a problem with the parser and I am not sure whether it is a bug or I am doing something wrong.

The XML Schema defines a field like this:

<xsd:element name="some-field" type="xsd:positiveInteger" />

In the XML file that I need to parse, this field is sometimes present and sometimes absent.
When the field is completely absent, the parser raises an error. However, when I have an empty tag like this
<image-number/> the parser does not raise an error. It parses the message as if everything were fine. This causes problems for me because, according to the schema, I expect a value there.

Now, in the schema (as it can be seen) there is no provision for the field to be nillable.

For that, the element should have been defined like this:
<xsd:element name="some-field" type="xsd:positiveInteger" nillable="true"/>

Is this a bug, or is that the way the parser is supposed to behave? I would have expected the parser to catch the case where the value of the field is not present.

Any help would be very much appreciated.
Question by: Mensana
Accepted Solution

savalou earned 1000 total points
ID: 9691239
The schema says that you need to have a "some-field" element.  If you could omit it, it would have a minOccurs="0" attribute.  

So when you have the element, even if its content is empty, the parser is happy.  

If the schema declaration has a nillable="true" attribute, then you can explicitly set xsi:nil="true" on the element in the instance document to specify that you have a null value.  Look here for more info on that:

So I don't think the parser is misbehaving, sorry to say.

Author Comment

ID: 9693781


I understand what you're saying, and I'll give you the points for taking the time to reply. What I don't understand is the point of declaring an element as "nillable". Why would anyone even bother declaring the "some-field" element nillable when you can always write it like this:


instead of


and you'll achieve the same thing without the parser complaining about "nil-ness" that does not conform to the XML Schema.

Thanks again,

Expert Comment

ID: 9693901
Sometimes there's a difference between a blank and a null value.  A null could mean that the value was never assigned, whereas a blank could mean that the empty string is the actual value.  I wish I had a good example for you but I can't think of anything meaningful.  But I've encountered the situation, and in an IT world it makes sense to be able to distinguish between the two.
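
One way to picture the distinction in application code is an optional string. This is only a sketch under assumptions: it needs C++17 for std::optional, and the names FieldValue and Describe are made up for illustration, not part of Xerces. A free-text element such as a middle name is the kind of field where "known to be empty" and "not recorded" can mean different things.

```cpp
#include <optional>
#include <string>

// Hypothetical model of an element's content (not a Xerces type):
//   std::nullopt -> the element carried xsi:nil="true" (no value at all)
//   ""           -> the element was present with empty content (a blank value)
using FieldValue = std::optional<std::string>;

// Returns a human-readable description of the three possible states.
std::string Describe( const FieldValue &value )
{
    if( !value )
        return "null: no value was ever assigned";
    if( value->empty() )
        return "blank: the value is the empty string";
    return "value: " + *value;
}
```

Code that receives a FieldValue is forced to handle the null case explicitly, which is the same guarantee nillable="true" is meant to express in the schema.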
Author Comment

ID: 9694099

What you say makes sense. My problem is that I now have to check, for every element, whether it has a value or not. Previously I relied on the parser to find errors in the XML message, but it appears that I cannot rely on it for this.
To understand what I am saying, here is a short example:

// before I had it like this

// get the document
DOMDocument *pXMLDomDocument = pDOMParser->getDocument();

// get the root of the document
DOMElement *pRoot = pXMLDomDocument->getDocumentElement();

// ... navigate to the node I wanted to read
DOMNode *pNode = MyFunction2FindANode( pRoot, "Label_of_Node" );

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = pTextNode->getData();

I would only check to see whether the pTextNode pointer is NULL for those elements that were "nillable":

// ... same as above

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = "";
if( pTextNode )
   strElementValue = pTextNode->getData();

Now I need to do this everywhere because the parser will not trigger an exception. Since I would have to add this check in a lot of places (which will considerably slow down the process), I was hoping there would be another solution.
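
If a manual check turns out to be unavoidable, it can at least live in one helper instead of being repeated at every call site. The sketch below is an assumption-laden fallback, not a replacement for schema validation: IsPositiveInteger is a hypothetical name, it takes a std::string (converting from XMLCh or CString is left out), and it only checks the lexical form of xsd:positiveInteger (surrounding whitespace ignored, optional '+', digits only, value greater than zero).

```cpp
#include <string>

// Hypothetical helper: true if `text` looks like a valid xsd:positiveInteger.
bool IsPositiveInteger( const std::string &text )
{
    // Ignore surrounding XML whitespace (space, tab, CR, LF).
    const char *ws = " \t\r\n";
    std::string::size_type first = text.find_first_not_of( ws );
    if( first == std::string::npos )
        return false;                       // empty or whitespace only
    std::string::size_type last = text.find_last_not_of( ws );
    std::string body = text.substr( first, last - first + 1 );

    // An optional leading '+' is allowed; '-' can never yield a positive value.
    std::string::size_type pos = ( body[0] == '+' ) ? 1 : 0;
    if( pos == body.size() )
        return false;                       // a lone "+"

    bool sawNonZero = false;
    for( ; pos < body.size(); ++pos )
    {
        if( body[pos] < '0' || body[pos] > '9' )
            return false;                   // not a decimal digit
        if( body[pos] != '0' )
            sawNonZero = true;
    }
    return sawNonZero;                      // "0" and "000" are not positive
}
```

An empty element such as <some-field/> yields an empty string here, so the helper rejects it just as a validating parser is expected to.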

Anyway, here are the points for you. Thanks again for taking the time to answer my question.

Expert Comment

ID: 9695225
I take it back.  I wrote without testing.  The parser should make a noise.  Are you sure you've got validation on?  I know more about the Java parser than C++, but I did a test using the SAXPrint executable that ships with Xerces C++ v2.3 and when I have an element with empty content but which is declared
   <xsd:element name="zip"    type="xsd:positiveInteger"/>
in the schema, it says:

  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value '' does not match regular expression facet '[+\-]?[0-9]+'.

Maybe you can run your instance document through this and see what happens?

Author Comment

ID: 9695336
OK, your message is a "heads-up" one for me. Here is the function that creates the parser:

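// Creates a DOM parser that validates against the given no-namespace schema.
// The caller owns the returned parser and must delete it.
XercesDOMParser *CreateValidatingParser( const XMLCh *schema )
{
    XercesDOMParser *parser = new XercesDOMParser;
    parser->setValidationScheme( XercesDOMParser::Val_Always ); // always validate
    parser->setDoNamespaces( true );   // schema validation needs namespace processing
    parser->setDoSchema( true );
    parser->setValidationSchemaFullChecking( false );
    parser->setExternalNoNamespaceSchemaLocation( schema );
    return parser;
}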

The only thing that could affect the behavior would be to set the full checking:

parser->setValidationSchemaFullChecking( true );

The documentation (http://xml.apache.org/xerces-c/apiDocs/classAbstractDOMParser.html#z491_7) says:

This method allows the user to turn full Schema constraint checking on/off.

Only takes effect if Schema validation is enabled. If turned off, partial constraint checking is done.

Full schema constraint checking includes those checking that may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.

The parser's default state is: false.

Do you suppose this would help? I am going to try it.
Thanks again.

Author Comment

ID: 9702550
Hey, I tried setting the Full Checking flag and it didn't help. The parser went ahead and processed the whole file without errors. I am still searching...

Expert Comment

ID: 9962578
In C++ you should use a custom ErrorHandler class (see the DOMCount sample),
or, if you are not interested in what the error is, you can simply use the
getErrorCount() method of the XercesDOMParser class (in case of errors
it should return a value greater than 0).
Hope this helps.
Sorry for my bEd Anglish :)
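
A rough sketch of both suggestions follows. The likely explanation for the silent parse is that Xerces-C++ reports validation errors to the installed ErrorHandler rather than throwing, so with no handler installed they can go unnoticed. ReportingErrorHandler and ParseAndValidate are hypothetical names chosen for this illustration; the Xerces calls used (setErrorHandler, parse, getErrorCount, XMLString::transcode and release) are from the library's public API, but treat the whole thing as a starting point rather than a tested implementation. It assumes XMLPlatformUtils::Initialize() has already been called and that the parser was configured for validation as in CreateValidatingParser above.

```cpp
#include <iostream>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/util/XMLString.hpp>

using namespace xercesc;

// Hypothetical handler: prints each problem and remembers whether an error occurred.
class ReportingErrorHandler : public ErrorHandler
{
public:
    ReportingErrorHandler() : m_sawErrors( false ) {}

    void warning( const SAXParseException &e )    { report( "Warning", e ); }
    void error( const SAXParseException &e )      { m_sawErrors = true; report( "Error", e ); }
    void fatalError( const SAXParseException &e ) { m_sawErrors = true; report( "Fatal error", e ); }
    void resetErrors()                            { m_sawErrors = false; }

    bool sawErrors() const { return m_sawErrors; }

private:
    void report( const char *kind, const SAXParseException &e )
    {
        // Convert the XMLCh message to the local code page for printing.
        char *msg = XMLString::transcode( e.getMessage() );
        std::cerr << kind << " at line " << e.getLineNumber()
                  << ", column " << e.getColumnNumber() << ": " << msg << std::endl;
        XMLString::release( &msg );
    }

    bool m_sawErrors;
};

// Returns true only if the file parsed and validated without errors.
bool ParseAndValidate( XercesDOMParser &parser, const char *xmlFile )
{
    ReportingErrorHandler handler;
    parser.setErrorHandler( &handler );
    parser.parse( xmlFile );               // may still throw XMLException, e.g. for I/O problems
    bool ok = !handler.sawErrors() && parser.getErrorCount() == 0;
    parser.setErrorHandler( 0 );           // the handler is about to go out of scope
    return ok;
}
```

With something like this in place, the DOM only needs to be walked when ParseAndValidate returns true, so the per-element NULL checks can stay limited to the nillable elements.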

Question has a verified solution.