Xerces C++ API - Parser problems


Hello everybody,

I have a problem with the parser, and I am not sure whether it is a bug or whether I am doing something wrong.

The XML Schema defines a field like this:

<xsd:element name="some-field" type="xsd:positiveInteger" />

In the XML files that I need to parse, this field is sometimes present and sometimes absent.
When the field is completely absent, the parser raises an error. However, when I have an empty tag like this
<image-number/> the parser doesn't raise an error. It parses the message as if everything were fine. This causes me problems because, according to the schema, I expect a value there.

Now, as can be seen, the schema makes no provision for the field to be nillable.

For that, the element should have been defined like this:
<xsd:element name="some-field" type="xsd:positiveInteger" nillable="true"/>

Is this a bug, or is that the way the parser is supposed to behave? I would have expected the parser to catch the case where the field has no value.

Any help would be very much appreciated.
TIA
Asked by: Mensana
savalou commented:
The schema says that you need to have a "some-field" element.  If you could omit it, it would have a minOccurs="0" attribute.  

So when you have the element, even if its content is empty, the parser is happy.

If you have a nillable="true" attribute in the schema definition, then you can explicitly set xsi:nil="true" on the element in the instance document to say that it has a null value.  Look here for more info on that:
http://www.w3.org/TR/xmlschema-0/#Nils

So I don't think the parser is misbehaving, sorry to say.
Mensana (author) commented:

Hi,

I understand what you're saying, and I'll give you the points for taking the time to reply. What I don't understand is the point of declaring an element as "nillable". Why would you even bother saying that the "some-field" element is nillable when you can always write it like this:

<some-field/>

instead of

<some-field>its-value</some-field>

and achieve the same thing without the parser complaining about a nil value that the XML Schema does not allow.

Thanks again,
Eddie
savalou commented:
Sometimes there's a difference between a blank and a null value.  A null could mean that the value was never assigned, but a blank could mean that the empty string is the value.  I wish I had a good example for you but I can't think of anything meaningful.  But I've encountered the situation, and in an IT world it makes sense to be able to distinguish between the two.
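
One way to picture the distinction is the minimal C++ sketch below. The struct, enum, and function names are hypothetical and are not part of Xerces or of the schema in this thread. It models an element that is either explicitly nil (the instance carried xsi:nil="true"), present but blank, or present with text. Think of a middle-name field where blank could mean "has no middle name" and nil could mean "not known".

```cpp
#include <string>

// Hypothetical model of one element's content after parsing.
struct FieldValue
{
    bool isNil;        // true when the instance carried xsi:nil="true"
    std::string text;  // character content of the element; may be empty
};

enum class FieldState { Nil, Blank, HasText };

// Distinguish "no value was supplied" from "the value is the empty string".
FieldState classify(const FieldValue& field)
{
    if (field.isNil)
        return FieldState::Nil;
    if (field.text.empty())
        return FieldState::Blank;
    return FieldState::HasText;
}
```

The blank state is only meaningful for types such as xsd:string that accept empty content. An empty string is not a valid xsd:positiveInteger, so for the field in this question only the nil and has-text states could ever validate.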
Mensana (author) commented:

What you say makes sense. My problem is that I now have to check, for every element, whether it has a value or not. Before, I relied on the parser to find errors in the XML message, but it appears that this is not reliable.
To show what I mean, here is a short example:

// before I had it like this

// get the document
DOMDocument *pXMLDomDocument = pDOMParser->getDocument();

// get the root of the document
DOMElement *pRoot = pXMLDomDocument->getDocumentElement();

// ... navigate to the node I wanted to read
DOMNode *pNode = MyFunction2FindANode( pRoot, "Label_of_Node" );

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = pTextNode->getData();

I would only check to see whether the pTextNode pointer is NULL for those elements that were "nillable":

// ... same as above

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = "";
if( pTextNode )
   strElementValue = pTextNode->getData();

Now I need to do it everywhere because the parser will not trigger an exception. Since I would have to do this check in a lot of places (which will considerably slow down the process), I was hoping that there is another solution.

Anyway, here are the points for you. Thanks again for taking the time to answer my question.
Eddie
savalou commented:
I take it back.  I wrote without testing.  The parser should make a noise.  Are you sure you've got validation on?  I know more about the Java parser than the C++ one, but I ran a test using the SAXPrint executable that ships with Xerces C++ v2.3. When I have an element with empty content that is declared as
   <xsd:element name="zip"    type="xsd:positiveInteger"/>
in the schema, it says:

  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value '' does not match regular expression facet '[+\-]?[0-9]+'.

Maybe you can run your instance document through this and see what happens?
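
The facet named in that message can be tried in isolation. The sketch below uses std::regex from the C++ standard library rather than the Xerces validator, so it only illustrates why an empty value is rejected. The function names are hypothetical, and the code assumes the value has already been whitespace-normalized.

```cpp
#include <regex>
#include <string>

// Lexical pattern quoted in the SAXPrint message above.
// Assumption: the whole value must match, as with a schema pattern facet.
bool matchesIntegerPattern(const std::string& value)
{
    static const std::regex pattern("[+\\-]?[0-9]+");
    return std::regex_match(value, pattern);
}

// Hypothetical helper: the lexical check plus the "greater than zero"
// rule of xsd:positiveInteger. It looks for a non-zero digit instead of
// converting to a number, so arbitrarily long values do not overflow.
bool looksLikePositiveInteger(const std::string& value)
{
    if (!matchesIntegerPattern(value) || value[0] == '-')
        return false;
    return value.find_first_of("123456789") != std::string::npos;
}
```

With these helpers, matchesIntegerPattern("") is false because the pattern requires at least one digit, which mirrors the error SAXPrint reported for the empty element.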
Mensana (author) commented:
OK, your message is a real heads-up for me. Here is the function that creates the parser:

XercesDOMParser *CreateValidatingParser( const XMLCh *schema )
{
    XercesDOMParser *parser = new XercesDOMParser;
    parser->setValidationScheme( XercesDOMParser::Val_Always );
    parser->setDoNamespaces( true );
    parser->setDoSchema( true );
    parser->setValidationSchemaFullChecking( false );
    parser->setExternalNoNamespaceSchemaLocation( schema );
    return parser;
}

The only setting I can see that could affect the behavior is full schema checking:

parser->setValidationSchemaFullChecking( true );

The documentation (http://xml.apache.org/xerces-c/apiDocs/classAbstractDOMParser.html#z491_7) says:

This method allows the user to turn full Schema constraint checking on/off.

Only takes effect if Schema validation is enabled. If turned off, partial constraint checking is done.

Full schema constraint checking includes those checking that may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.

The parser's default state is: false.

Do you suppose this would help? I am going to try it.
Thanks again.
Mensana (author) commented:
Hey, I tried setting the full checking flag and it didn't help. The parser went ahead and processed the whole file without errors. I am still searching...
alex_kamenev commented:
Hi
In C++ you should install a custom ErrorHandler class (see the DOMCount sample),
or, if you are not interested in what the error is, you can simply call the
getErrorCount() method of the XercesDOMParser class (if there were errors,
it should return a value greater than 0).
Hope this helps.
Sorry for my bEd Anglish :)