Solved

Inconsistent XML Validation Error in SAXParser

Posted on 2004-10-01
Last Modified: 2013-11-19
We process several thousand XML documents a day with a web server that receives XML documents, parses them, and updates a database. A small percentage of them (an estimated 1 in 5,000) fail XML validation, but when resubmitted unchanged, they pass.

I've got some data from the logs. I think the error is coming from the Apache SAXParser. When the failure occurs, the parser appears to treat an element as required even though the schema does not define it as required. (The schema defines more fields than we are currently receiving, but most are minOccurs=0.)

If I have 2 different failures, I can look in a log and see that the xml was in the same format for both failures, but the failure is on a different field.  

This is an excerpt of an error & matching xml:
cvc-complex-type.2.4.a: Invalid content starting with element 'TerminationDate'. The content must match '(("":EmployeeNumber),("":LastName){0-1},("":FirstName){0-1},("":Department){0-1},("":WelderSymbol){0-1},("":EmployeeRate),("":TerminationDate){0-1},("":TelephoneNumber),("":Sex){0-1},("":PhoneExtension){0-1},("":WorkTelephone){0-1},("":TimeReportGroup){0-1},("":EmpExemptInd){0-1},("":PayCycleType){0-1},
...
<EmployeeNumber>203777</EmployeeNumber>
<LastName>SMITH</LastName>
<FirstName>AL</FirstName>
<Department/>
<TerminationDate>        </TerminationDate>
<TelephoneNumber/>
<Sex>M</Sex>
<PhoneExtension/>
<WorkTelephone/>
<PayCycleType>WKLY</PayCycleType>
...
---------------
This is an excerpt of another error & matching xml:
cvc-complex-type.2.4.a: Invalid content starting with element 'PayCycleType'. The content must match ("":EmployeeNumber),("":LastName){0-1},("":FirstName){0-1},("":Department){0-1},("":WelderSymbol){0-1},("":EmployeeRate){0-1},("":TerminationDate){0-1},("":TelephoneNumber){0-1},("":Sex){0-1},("":PhoneExtension){0-1},("":WorkTelephone){0-1},("":TimeReportGroup),("":EmpExemptInd){0-1},("":PayCycleType){0-1}....
...
<EmployeeNumber>203688</EmployeeNumber>
<LastName>JONAS</LastName>
<FirstName>PAUL</FirstName>
<Department/>
<TerminationDate>        </TerminationDate>
<TelephoneNumber/>
<Sex>M</Sex>
<PhoneExtension/>
<WorkTelephone/>
<PayCycleType>WKLY</PayCycleType>
...
From the Schema
<xs:element ref="EmployeeNumber"/>
<xs:element ref="LastName" minOccurs="0"/>
<xs:element ref="FirstName" minOccurs="0"/>
<xs:element ref="Department" minOccurs="0"/>
<xs:element ref="WelderSymbol" minOccurs="0"/>
<xs:element ref="EmployeeRate" minOccurs="0"/>
<xs:element ref="TerminationDate" minOccurs="0"/>
<xs:element ref="TelephoneNumber" minOccurs="0"/>
<xs:element ref="Sex" minOccurs="0"/>
<xs:element ref="PhoneExtension" minOccurs="0"/>
<xs:element ref="WorkTelephone" minOccurs="0"/>
<xs:element ref="TimeReportGroup" minOccurs="0"/>
<xs:element ref="EmpExemptInd" minOccurs="0"/>
<xs:element ref="PayCycleType" minOccurs="0"/>

Neither XML document has a tag for EmployeeRate. The first error message lists it as required, ("":EmployeeRate), while the second does not: ("":EmployeeRate){0-1}. The second example failed because of TimeReportGroup, which the second error message lists as required but the first does not.

The xml docs were processed within minutes of each other, with several identical (in form) xml docs processing successfully before, after and in between.   Both were successfully resubmitted and did not get the error.  The schema has not been changed for several months.

The parser is called like this:
        try {
            // Instantiate a parser (the class name must be passed as a String)
            XMLReader parser =
                XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

            // Register the content handler
            parser.setContentHandler(contentHandler);

            // Register the error handler
            parser.setErrorHandler(errorHandler);

            // Turn on validation
            parser.setFeature("http://xml.org/sax/features/validation", true);

            // Turn on XML Schema validation
            parser.setFeature("http://apache.org/xml/features/validation/schema", true);

            // Parse the document; sr is a StringReader
            InputSource is = new InputSource(sr);
            is.setSystemId(systemId);
            parser.parse(is);
            // ... (remainder of the method omitted)

This seems completely random to me...   Does anyone know how to stop this error?  

Thanks
Question by:BrentTemple
10 Comments
 
LVL 7

Expert Comment

by:J_Mak
ID: 12206407
With regard to the elements defined using the 'ref' attribute, is their parent element the <xs:schema> element? I'm just curious... they're probably not, but I just want to make sure, because they cannot be direct children of the <xs:schema> element like so:

<xs:schema>
    <xs:element ref="EmployeeNumber"/>
    <xs:element ref="LastName" minOccurs="0"/>
    <xs:element ref="FirstName" minOccurs="0"/>
    <xs:element ref="Department" minOccurs="0"/>
    <xs:element ref="WelderSymbol" minOccurs="0"/>
    .........
</xs:schema>

What does it mean by invalid content sharing? Also, where and how are the above elements defined elsewhere in the schema? I realise that they are being referenced in the above example. Cheers.
 
LVL 3

Author Comment

by:BrentTemple
ID: 12207346

More detail on how the schema works:  
The header of the Employee Schema includes a 'dictionary' type of schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xs:include schemaLocation="docs/TheDictionary.xsd"/>
      <xs:element name="EmployeeDoc">
                .... all the refs in my original post reside within this structure.  (I've omitted some of the structure within EmployeeDoc)
      </xs:element>
</xs:schema>

This is the layout of the dictionary schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
...
      <xs:element name="EmployeeNumber">
            <xs:simpleType>
                  <xs:restriction base="xs:string">
                        <xs:maxLength value="12"/>
                  </xs:restriction>
            </xs:simpleType>
      </xs:element>
...
</xs:schema>

I don't see 'sharing', it's 'starting'.  (cvc-complex-type.2.4.a: Invalid content starting with element 'PayCycleType'. ) The error is saying that the tag PayCycleType is out of order, because I didn't provide TimeReportGroup first.   But TimeReportGroup isn't defined as required in the schema, nor is it required when validating 5000 other documents with the same tags.

Thanks
 
LVL 7

Expert Comment

by:J_Mak
ID: 12210149
Can I ask you what schema element your references are under? That is, what is their parent node? Is it <xs:choice> or <xs:sequence>?

I'm assuming that it is <xs:sequence>, in which case the elements must appear in the declared order regardless of whether they contain any content; only those declared with minOccurs="0" may be left out.

Thanks.
 
LVL 3

Author Comment

by:BrentTemple
ID: 12210188
It is <xs:sequence>.   If I switched this for <xs:choice> would it have any other side-effects?  

Thanks

 
LVL 7

Expert Comment

by:J_Mak
ID: 12210455
If you switched it to <xs:choice> you must only have one of the elements present, only one. I was just asking for completeness.

I also noticed that you consistently use 'minOccurs=0'. I'm assuming that you can have only ONE FirstName element and ONE LastName element in an EmployeeDoc element, correct? Does that go for the other elements? I was just asking because, if that's the case, instead of using <xs:sequence> you can try using <xs:all>. It allows the elements to be in any order, but each can appear only once. Here is a link for more information:

http://www.w3schools.com/schema/el_all.asp

I'm not sure what effect that will have. Cheers.
 
LVL 3

Author Comment

by:BrentTemple
ID: 12212258
Thanks.  I looked up all, choice and sequence on the site J Mak gave me a link to.  

http://www.w3schools.com/schema/el_all.asp
The all element specifies that the child elements can appear in any order and that each child element can occur zero or one time.  

http://www.w3schools.com/schema/el_choice.asp
The choice element allows only one of the elements contained in the <choice> declaration to be present within the containing element

http://www.w3schools.com/schema/el_sequence.asp
The sequence element specifies that the child elements must appear in a sequence. Each child element can occur from 0 to any number of times.

If <xs:all> has a max limit of one, as it sounds like it does in the definition, it won't work for all our xsds. In EmployeeDoc I think it would.   But we also get this same error in other documents that have more complex structures which include some 'zero to many' or 'one to many' elements/nodes.

<xs:choice> won't work if it limits to only one.
-------------------------------------
To answer J_Mak's question:

Most of the elements allow zero or one occurrence. A few don't have 'minOccurs=0', and that makes them required during schema validation. In the example I'm debugging with (Employee) we don't have any zero-to-many elements. But in some of the other documents that get the same 'random' failure, there are elements defined with maxOccurs="unbounded" to allow them to exist multiple times.

For example, if I submitted an xml document that was missing the EmployeeNumber (which doesn't have a minOccurs in the schema), I would get the same error that we see 'randomly'.

Here is the error if I submit a document with no EmployeeNumber element:
cvc-complex-type.2.4.a: Invalid content starting with element 'LastName'. The content must match '((EmployeeNumber),("":LastName){0-1},("":FirstName){0-1},("...

And in the text of the error message itself, it has a {0-1} following the elements with a minOccurs, and not after the EmployeeNumber...   If you go back to the original example in my Post above, it skips the {0-1} for an element that IS defined as minOccurs=0, and throws the exception for the next tag present following the missing tag.
...<xs:sequence>
<xs:element ref="EmployeeNumber"/>
<xs:element ref="LastName" minOccurs="0"/>...

My theory based on the weird (dis)appearance of the {0-1} in the error messages, is that occasionally the validator neglects to notice the minOccurs=0 when it is validating the xml.
On each of the failures, I can find an element that:
-- isn't in the xml
-- is prior to the one that the error failed on, (error message says ...starting with element 'LastName')
-- is defined in the error message without {0-1}
-- is defined in the schema as minOccurs=0.  
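One way to rule out the schema itself is to validate the same shapes of document in a single thread. The following is a minimal sketch, assuming a JDK that ships javax.xml.validation (Java 5 or later) rather than the Xerces classes used above. The class name SequenceCheck and the cut-down schema are hypothetical stand-ins for the real EmployeeDoc schema, which is not shown in full in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SequenceCheck {
    // Hypothetical, cut-down stand-in for the EmployeeDoc schema:
    // one required element followed by optional ones, all inside xs:sequence.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='EmployeeNumber' type='xs:string'/>"
      + "<xs:element name='LastName' type='xs:string'/>"
      + "<xs:element name='EmployeeRate' type='xs:string'/>"
      + "<xs:element name='TerminationDate' type='xs:string'/>"
      + "<xs:element name='EmployeeDoc'><xs:complexType><xs:sequence>"
      + "<xs:element ref='EmployeeNumber'/>"
      + "<xs:element ref='LastName' minOccurs='0'/>"
      + "<xs:element ref='EmployeeRate' minOccurs='0'/>"
      + "<xs:element ref='TerminationDate' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true when the document validates against XSD, false on a validation error.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // The default error handler of a Validator throws on errors and fatal errors.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Optional EmployeeRate omitted, remaining elements in declared order.
        System.out.println(isValid("<EmployeeDoc><EmployeeNumber>203777</EmployeeNumber>"
            + "<LastName>SMITH</LastName><TerminationDate>        </TerminationDate></EmployeeDoc>"));
        // Required EmployeeNumber omitted.
        System.out.println(isValid("<EmployeeDoc><LastName>SMITH</LastName></EmployeeDoc>"));
    }
}
```

If documents like the first one validate every time in a single thread, the intermittent failures are more likely to come from how the parser is being used than from the schema.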

Thanks


 
LVL 3

Author Comment

by:BrentTemple
ID: 12457888
J_Mak;

Thanks for trying to help. I'm still getting the error, but after quite a bit of searching through the Apache user forum, I've found a few users who claim that the SAXParser class isn't 100% thread safe. I'm going to try to synchronize the code that calls it and see if that solves the problem. I'm guessing that something in the SAXParser class crosses wires when more than one thread is using it at the same time.
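A hypothesis like this can be tested outside production by running the same valid document through the parser from many threads and counting failures. The following is a rough sketch of such a harness, not the code used on the server. It assumes the JDK's built-in JAXP parser (SAXParserFactory with setSchema) instead of org.apache.xerces.parsers.SAXParser, and the class name ParseStress, the tiny schema and the sample document are all hypothetical. To exercise the original setup, the body of parseOnce would be swapped for the parsing code from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ParseStress {
    // Hypothetical schema and document, shaped like the ones in the question.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='EmployeeDoc'><xs:complexType><xs:sequence>"
      + "<xs:element name='EmployeeNumber' type='xs:string'/>"
      + "<xs:element name='TimeReportGroup' type='xs:string' minOccurs='0'/>"
      + "<xs:element name='PayCycleType' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String XML =
        "<EmployeeDoc><EmployeeNumber>203777</EmployeeNumber>"
      + "<PayCycleType>WKLY</PayCycleType></EmployeeDoc>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile", e);
        }
    }

    // One validating parse with a fresh parser; any exception counts as a failure.
    static boolean parseOnce(Schema schema, String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // DefaultHandler ignores validation errors unless overridden
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Runs the task `runs` times on `threads` threads and counts false results and exceptions.
    static int countFailures(int threads, int runs, Callable<Boolean> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                results.add(pool.submit(task));
            }
            int failures = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (!result.get()) failures++;
                } catch (ExecutionException e) {
                    failures++;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(XSD);
        System.out.println("failures: " + countFailures(8, 5000, () -> parseOnce(schema, XML)));
    }
}
```

A nonzero count for a document that always validates in a single thread would support the thread-safety theory. A zero count does not disprove it, since races of this kind can be rare.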

-Brent
 

Accepted Solution

by:
modulo earned 0 total points
ID: 12679312
PAQed with points refunded (500)

modulo
Community Support Moderator
 
LVL 3

Author Comment

by:BrentTemple
ID: 12793169
I've had the following mod in Production for 3 weeks and have not seen the error:

Added a new method:
   private static synchronized void parseIt(XMLReader parser, InputSource is) throws IOException, SAXException {
       parser.parse(is);
   }

Changed
parser.parse(is);
in the existing method, (see original question) to
parseIt(parser, is);
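The synchronized wrapper serializes every parse, which may become a bottleneck at higher volumes. A possible alternative, sketched below under the assumption that the schema can be loaded up front rather than located through the document, is the javax.xml.validation API (Java 5 and later). Its documentation states that a compiled Schema is thread safe and may be shared, while a Validator is not and should be created per use. The class name SharedSchemaValidator and the demo schema are hypothetical, and this has not been tried against the production setup described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SharedSchemaValidator {
    // Hypothetical demo schema; the real one would be loaded from the .xsd files.
    static final String DEMO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='EmployeeDoc'><xs:complexType><xs:sequence>"
      + "<xs:element name='EmployeeNumber' type='xs:string'/>"
      + "<xs:element name='LastName' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Compiled once; Schema instances are documented as safe to share between threads.
    private final Schema schema;

    public SharedSchemaValidator(String xsdText) {
        try {
            this.schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile", e);
        }
    }

    // Returns null when the document is valid, otherwise the first error message.
    public String validate(String xml) {
        Validator validator = schema.newValidator(); // not thread safe: one per call
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SharedSchemaValidator shared = new SharedSchemaValidator(DEMO_XSD);
        System.out.println(shared.validate(
            "<EmployeeDoc><EmployeeNumber>203777</EmployeeNumber></EmployeeDoc>"));
        System.out.println(shared.validate("<EmployeeDoc/>"));
    }
}
```

One instance of this class can be held in a static field and called from any request thread without a lock, because the only shared state is the compiled schema.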

-Brent