Solved

Xerces C++ API - Parser problems

Posted on 2003-11-04
8
736 Views
Last Modified: 2013-11-19

Hello everybody,

I have a problem with the parser and I am not sure whether it is a bug or I am doing something wrong.

The XML Schema defines a field like this:

<xsd:element name="some-field" type="xsd:positiveInteger" />

Inside the XML file that I need to parse, there are cases when this field is present or is absent.
When the field is completely absent, the parser raises an error. However, when I have a tag like this
<image-number/> the parser doesn't raise the error. It parses the message as if everything is fine. This causes me problems because according to the schema, I expect there a value.

Now, in the schema (as it can be seen) there is no provision for the field to be nillable.

For that, the element should have been defined like this:
<xsd:element name="some-field" type="xsd:positiveInteger" nillable="true"/>

Is this a bug or that is the way the parser is suppose to behave? I would have expected the parser to catch the case when the value of the field is not present.

Any help would be very much appreciated.
TIA
0
Comment
Question by:Mensana
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
8 Comments
 
LVL 3

Accepted Solution

by:
savalou earned 250 total points
ID: 9691239
The schema says that you need to have a "some-field" element.  If you could omit it, it would have a minOccurs="0" attribute.  

So when you have the element, even if it's content is empty, the parser is happy.  

If you have a nillable="true" attribute in the schema definition, then you can explicitly set a nill flag to specify that you have a null value.  Look here for more info on that:
http://www.w3.org/TR/xmlschema-0/#Nils

So I don't think the parser is misbehaving, sorry to say.
0
 
LVL 1

Author Comment

by:Mensana
ID: 9693781

Hi,

I understand what you're saying and I'll give you the points for your effort to reply me. What I don't understand is the meaning of declaring an element as "nillable". Why would even bother saying that "some-field" element is nillable when you always can put it like this:

<some-field/>

instead of

<some-field>its-value</some-field>

and you'll achieve the same thing without having the parser complain about "nill-ness" that doesn't verify the XML Schema.

Thanks again,
Eddie
0
 
LVL 3

Expert Comment

by:savalou
ID: 9693901
Sometimes there's a difference between a blank and a null value.  A null could mean that the value was never assigned but a blank could mean that is the value.  I wish I had a good example for you but I can't think of anything meaningful.  But I've encountered the situation and in an IT world it makes sense to be able to distinguish between the two.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 1

Author Comment

by:Mensana
ID: 9694099

What you say makes sense. My problem comes from the fact that now I have to check for every element whether it has a value or not. Before I relied on the parser to find errors in the XML message, but it appears that this is not reliable at all.
To understand what I am saying, here is a short example:

// before I had it like this

// get the document
DOMDocument *pXMLDomDocument = pDOMParser->getDocument();

// get the root of the document
DOMElement *pRoot = pXMLDomDocument->getDocumentElement();

// ... navigate to the node I wanted to read
DOMNode *pNode = MyFunction2FindANode( pRoot, "Label_of_Node" );

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = pTextNode->getData();

I would only check to see whether the pTextNode pointer is NULL for those elements that were "nillable":

// ... same as above

// get its value
DOMText *pTextNode = MyFunction2FindAValue( pNode );

// extract the value and do something with it
CString strElementValue = "";
if( pTextNode )
   strElementValue = pTextNode->getData();

Now I need to do it everywhere because the parser will not trigger an exception. Since I would have to do this check in a lot of places (which will considerably slow down the process), I thought that there is another solution.

Anyway, here are the points for you. Thanks again for taking the time to answer my question.
Eddie
0
 
LVL 3

Expert Comment

by:savalou
ID: 9695225
I take it back.  I wrote without testing.  The parser should make a noise.  Are you sure you've got validation on?  I know more about the Java parser than C++, but I did a test using the SAXPrint executable that ships with Xerces C++ v2.3 and when I have an element with empty content but which is declared
   <xsd:element name="zip"    type="xsd:positiveInteger"/>
in the schema, it says:

  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value ''
does not match regular expression facet '[+\-]?[0-9]+'.

Maybe you can run your instance document through this and see what happens?
0
 
LVL 1

Author Comment

by:Mensana
ID: 9695336
OK, your message is a "heads-up" one for me. Here is the function that creates the parser:

XercesDOMParser *CreateValidatingParser( const XMLCh *schema )
{
    XercesDOMParser *parser = new XercesDOMParser;
    parser->setValidationScheme( XercesDOMParser::Val_Always );
    parser->setDoNamespaces( true );
    parser->setDoSchema( true );
    parser->setValidationSchemaFullChecking( false );
    parser->setExternalNoNamespaceSchemaLocation( schema );
    return parser;
}

The only thing that could affect the behavior would be to set the full checking:

parser->setValidationSchemaFullChecking( true );

In the documentation (http://xml.apache.org/xerces-c/apiDocs/classAbstractDOMParser.html#z491_7) is said that:

This method allows the user to turn full Schema constraint checking on/off.

Only takes effect if Schema validation is enabled. If turned off, partial constraint checking is done.

Full schema constraint checking includes those checking that may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation resriction checking are controlled by this option.

The parser's default state is: false.

Do you suppose this would help? I am going to try it.
Thanks again.
0
 
LVL 1

Author Comment

by:Mensana
ID: 9702550
Hey, I tried to set the Full Checking flag and it didn't help. The parser went ahead and processed the whole file wothout errors. I am still searching...
0
 

Expert Comment

by:alex_kamenev
ID: 9962578
Hi
In C++ you should use the custom ErrorHandler class (see DOMCount sample),
or if you are not interested in what the error is, you can simply use the
getErrorCount() method of the XercesDOMParser class (in case of errors
it should return value greater than 0)
Hope this helps.
Sorry for my bEd Anglish :)
0

Featured Post

Space-Age Communications Transitions to DevOps

ViaSat, a global provider of satellite and wireless communications, securely connects businesses, governments, and organizations to the Internet. Learn how ViaSat’s Network Solutions Engineer, drove the transition from a traditional network support to a DevOps-centric model.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Re-position the objects 7 125
XML XSL Choose example 3 38
JSON  parse help 8 52
How to configure empty element in XML Document parser? 15 45
Browsing the questions asked to the Experts of this forum, you will be amazed to see how many times people are headaching about monster regular expressions (regex) to select that specific part of some HTML or XML file they want to extract. The examp…
I found this questions asking how to do this in many different forums, so I will describe here how to implement a solution using PHP and AJAX. The logical flow for the problem should be: Write an event handler for the first drop down box to get …
Viewers will learn about arithmetic and Boolean expressions in Java and the logical operators used to create Boolean expressions. We will cover the symbols used for arithmetic expressions and define each logical operator and how to use them in Boole…
The viewer will learn the basics of jQuery including how to code hide show and toggles. Reference your jQuery libraries: (CODE) Include your new external js/jQuery file: (CODE) Write your first lines of code to setup your site for jQuery…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question