LibXml2 doesn't recognize invalid XML

Why does LibXml2 accept as valid a standalone XML document that contains attributes which are declared in an external DTD and whose normalized value differs from the value that would be produced in the absence of their declaration?

Here is the example XML:
<?xml version='1.0' standalone='yes'?>
<!DOCTYPE attributes SYSTEM "../valid/sa.dtd" [
]>

<attributes
token = "b"
nmtoken = " this-gets-normalized "
/>

And here is the DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT attributes EMPTY>
<!ATTLIST attributes
token (a|b|c) "a"
nmtoken NMTOKEN #IMPLIED
>

This XML and DTD originally come from the W3C Conformance Test Suite. The XML declares itself standalone. According to the XML specification (2.9 Standalone Document Declaration):

Validity constraint: Standalone Document Declaration

The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

attributes with tokenized types, where the attribute appears in the document with a value such that normalization will produce a different value from that which would be produced in the absence of the declaration

According to the statement above, the XML I posted must have standalone='no', because there is an external declaration of the nmtoken attribute, which is of type NMTOKEN. Because of the NMTOKEN type, normalization strips leading and trailing space characters, so the normalized value is "this-gets-normalized". Without the declaration in the external DTD, however, the normalized value would be " this-gets-normalized ", because an undeclared attribute is treated as CDATA and its leading and trailing spaces are kept. The two normalized values differ, so the XML declaration must say standalone='no'. Since it says standalone='yes', the document should be considered invalid.
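
To make the difference between the two normalizations concrete, here is a minimal C++ sketch of the comparison. It is not LibXml2 code: the function names (normalizeAsCdata, normalizeAsTokenized, violatesStandalone) are hypothetical, and the sketch assumes the attribute value contains no character or entity references, which real attribute-value normalization would also have to expand.

```cpp
#include <string>

// Simplified first step of attribute-value normalization:
// each literal tab, line feed and carriage return becomes one space.
// Assumption: no character or entity references occur in the value.
std::string normalizeAsCdata(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        out += (c == '\t' || c == '\n' || c == '\r') ? ' ' : c;
    }
    return out;
}

// Extra step for non-CDATA types such as NMTOKEN: drop leading and
// trailing spaces and collapse each run of spaces into a single space.
std::string normalizeAsTokenized(const std::string& raw) {
    const std::string cdata = normalizeAsCdata(raw);
    std::string out;
    bool pendingSpace = false;
    for (char c : cdata) {
        if (c == ' ') {
            // Remember a separator only if a token was already written.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace) out += ' ';
            pendingSpace = false;
            out += c;
        }
    }
    return out;  // a pending trailing space is never written
}

// Hypothetical helper: true when an externally declared tokenized type
// changes the value, i.e. the case in which standalone='yes' is not allowed.
bool violatesStandalone(const std::string& raw) {
    return normalizeAsTokenized(raw) != normalizeAsCdata(raw);
}
```

With the value from the example, normalizeAsCdata(" this-gets-normalized ") leaves the string unchanged, normalizeAsTokenized returns "this-gets-normalized", and violatesStandalone returns true. For token = "b" both normalizations agree, so that attribute's value alone would not trigger the constraint.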

The problem is that LibXml2 validates the XML without any error. I'm using LibXml2 version 2.9.1 on Windows with C++.

Have I misunderstood something, or is this a bug in LibXml2?
Asked by: PeterJanousek

Geert Bormans (Information Architect) commented:
I just tested with libxml, and you are right: this is a bug in libxml.
By the way, libxml is not the only parser that gets this wrong.

In short, standalone='yes' tells the parser
"check whether leaving out the DTD would change the document in any way".
Some parsers understand it as
"don't bother looking at the DTD at all".

You have entered a grey area of DTD validation.
Note that MSXML does this right,
so if you are working in C++ and this feature is important, you could use MSXML instead.