Solved

Xerces C++ API - Parser problems

Posted on 2003-11-04
8
737 Views
Last Modified: 2013-11-19

Hello everybody,

I have a problem with the parser and I am not sure whether it is a bug or I am doing something wrong.

The XML Schema defines a field like this:

<xsd:element name="some-field" type="xsd:positiveInteger" />

Inside the XML file that I need to parse, there are cases when this field is present or is absent.
When the field is completely absent, the parser raises an error. However, when I have a tag like this
<image-number/> the parser doesn't raise the error. It parses the message as if everything is fine. This causes me problems because according to the schema, I expect there a value.

Now, in the schema (as it can be seen) there is no provision for the field to be nillable.

For that, the element should have been defined like this:
<xsd:element name="some-field" type="xsd:positiveInteger" nillable="true"/>

Is this a bug or that is the way the parser is suppose to behave? I would have expected the parser to catch the case when the value of the field is not present.

Any help would be very much appreciated.
TIA
0
Comment
Question by:Mensana
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
8 Comments
 
LVL 3

Accepted Solution

by:
savalou earned 250 total points
ID: 9691239
The schema says that you need to have a "some-field" element.  If you could omit it, it would have a minOccurs="0" attribute.  

So when you have the element, even if it's content is empty, the parser is happy.  

If you have a nillable="true" attribute in the schema definition, then you can explicitly set a nill flag to specify that you have a null value.  Look here for more info on that:
http://www.w3.org/TR/xmlschema-0/#Nils

So I don't think the parser is misbehaving, sorry to say.
0
 
LVL 1

Author Comment

by:Mensana
ID: 9693781

Hi,

I understand what you're saying and I'll give you the points for your effort to reply me. What I don't understand is the meaning of declaring an element as "nillable". Why would even bother saying that "some-field" element is nillable when you always can put it like this:

<some-field/>

instead of

<some-field>its-value</some-field>

and you'll achieve the same thing without having the parser complain about "nill-ness" that doesn't verify the XML Schema.

Thanks again,
Eddie
0
 
LVL 3

Expert Comment

by:savalou
ID: 9693901
Sometimes there's a difference between a blank and a null value.  A null could mean that the value was never assigned but a blank could mean that is the value.  I wish I had a good example for you but I can't think of anything meaningful.  But I've encountered the situation and in an IT world it makes sense to be able to distinguish between the two.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 1

Author Comment

by:Mensana
ID: 9694099

What you say makes sense. My problem comes from the fact that now I have to check for every element whether it has a value or not. Before I relied on the parser to find errors in the XML message, but it appears that this is not reliable at all.
To understand what I am saying, here is a short example:

// before I had it like this

// get the document
DOMDocument *pXMLDomDocument = pDOMParser->getDocument();

// get the root of the document
DOMElement *pRoot = pXMLDomDocument->getDocumentElement();

// ... navigate to the node I wanted to read
DOMNode *pNode = MyFunction2FindANode( pRoot, "Label_of_Node" );

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = pTextNode->getData();

I would only check to see whether the pTextNode pointer is NULL for those elements that were "nillable":

// ... same as above

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = "";
if( pTextNode )
   strElementValue = pTextNode->getData();

Now I need to do it everywhere because the parser will not trigger an exception. Since I would have to do this check in a lot of places (which will considerably slow down the process), I thought that there is another solution.

Anyway, here are the points for you. Thanks again for taking the time to answer my question.
Eddie
0
 
LVL 3

Expert Comment

by:savalou
ID: 9695225
I take it back.  I wrote without testing.  The parser should make a noise.  Are you sure you've got validation on?  I know more about the Java parser than C++, but I did a test using the SAXPrint executable that ships with Xerces C++ v2.3 and when I have an element with empty content but which is declared
   <xsd:element name="zip"    type="xsd:positiveInteger"/>
in the schema, it says:

  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value ''
does not match regular expression facet '[+\-]?[0-9]+'.

Maybe you can run your instance document through this and see what happens?
0
 
LVL 1

Author Comment

by:Mensana
ID: 9695336
OK, your message is a "heads-up" one for me. Here is the function that creates the parser:

XercesDOMParser *CreateValidatingParser( const XMLCh *schema )
{
    XercesDOMParser *parser = new XercesDOMParser;
    parser->setValidationScheme( XercesDOMParser::Val_Always );
    parser->setDoNamespaces( true );
    parser->setDoSchema( true );
    parser->setValidationSchemaFullChecking( false );
    parser->setExternalNoNamespaceSchemaLocation( schema );
    return parser;
}

The only thing that could affect the behavior would be to set the full checking:

parser->setValidationSchemaFullChecking( true );

In the documentation (http://xml.apache.org/xerces-c/apiDocs/classAbstractDOMParser.html#z491_7) is said that:

This method allows the user to turn full Schema constraint checking on/off.

Only takes effect if Schema validation is enabled. If turned off, partial constraint checking is done.

Full schema constraint checking includes those checking that may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation resriction checking are controlled by this option.

The parser's default state is: false.

Do you suppose this would help? I am going to try it.
Thanks again.
0
 
LVL 1

Author Comment

by:Mensana
ID: 9702550
Hey, I tried to set the Full Checking flag and it didn't help. The parser went ahead and processed the whole file wothout errors. I am still searching...
0
 

Expert Comment

by:alex_kamenev
ID: 9962578
Hi
In C++ you should use the custom ErrorHandler class (see DOMCount sample),
or if you are not interested in what the error is, you can simply use the
getErrorCount() method of the XercesDOMParser class (in case of errors
it should return value greater than 0)
Hope this helps.
Sorry for my bEd Anglish :)
0

Featured Post

What Is Transaction Monitoring and who needs it?

Synthetic Transaction Monitoring that you need for the day to day, which ensures your business website keeps running optimally, and that there is no downtime to impact your customer experience.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Preface This article introduces an authentication and authorization system for a website.  It is understood by the author and the project contributors that there is no such thing as a "one size fits all" system.  That being said, there is a certa…
The Confluence of Individual Knowledge and the Collective Intelligence At this writing (summer 2013) the term API (http://dictionary.reference.com/browse/API?s=t) has made its way into the popular lexicon of the English language.  A few years ago, …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
The viewer will receive an overview of the basics of CSS showing inline styles. In the head tags set up your style tags: (CODE) Reference the nav tag and set your properties.: (CODE) Set the reference for the UL element and styles for it to ensu…

696 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question