• Status: Solved

JAXB XML Parsing Question

Is there any way to configure JAXB so that, while parsing an XML document, it ignores any text following the root element's closing tag? It seems to ignore any amount of whitespace after the closing tag, but even one non-whitespace character causes it to throw a SAXParseException. We have a few XML documents with characters following the closing tag, but otherwise they are fine. Until we can clean them up, it would be great to have a switch somewhere that says: when you reach the closing tag, ignore anything beyond it. The exception is thrown during unmarshalling.
Asked by: whandley
1 Solution
 
mrcoffee365 commented:
We have not found a way to do this. What we do is scrub the XML before sending it to the parser. In some cases we send it to the parser first, catch the exception, scrub the document, and then send it to the parser again.
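The scrubbing approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's actual code: the class `TrailingTextScrubber` and its methods are hypothetical names, the document is assumed to fit in memory as a string, the first start tag is assumed to be the root element, and the stray trailing text is assumed not to contain another copy of the root's closing tag. A self-closing root such as `<order/>` is returned unchanged.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; it is not part of JAXB.
class TrailingTextScrubber {

    // First start tag that is not a declaration (<?...) or a comment/DOCTYPE (<!...).
    // Assumption: that tag is the root element.
    private static final Pattern ROOT_START = Pattern.compile("<([A-Za-z_][^\\s/>]*)");

    // Cuts everything after the last closing tag that matches the root element name.
    static String scrub(String xml) {
        Matcher start = ROOT_START.matcher(xml);
        if (!start.find()) {
            return xml; // nothing that looks like an element; leave the input alone
        }
        String root = start.group(1);
        Matcher end = Pattern.compile("</" + Pattern.quote(root) + "\\s*>").matcher(xml);
        int cut = -1;
        while (end.find()) {
            cut = end.end(); // remember the end of the last match
        }
        return cut < 0 ? xml : xml.substring(0, cut);
    }

    // Returns true if the JDK's built-in parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, no stderr output
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    // "Parse, catch, scrub, parse again": only scrub when the first attempt fails.
    static String ensureParseable(String xml) {
        return isWellFormed(xml) ? xml : scrub(xml);
    }

    public static void main(String[] args) {
        String dirty = "<?xml version=\"1.0\"?><order><item>1</item></order>\nstray text";
        String clean = ensureParseable(dirty);
        System.out.println(isWellFormed(dirty) + " -> " + isWellFormed(clean));
    }
}
```

The cleaned string could then be handed to JAXB, for instance through `unmarshaller.unmarshal(new StringReader(clean))`. A string-based cut like this is crude, so it is probably best treated as a stopgap until the source documents are fixed.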
 
Hegemon commented:
Please correct me if I am wrong, but it looks like illegitimate (non-whitespace) characters after the closing tag make the document not well-formed, so, strictly speaking, it is no longer an XML document and cannot be processed by an XML parser.

Either the document needs to be made well-formed by scrubbing it, or a non-XML parser needs to be used.

Problems of this sort can be expected when working with SGML documents that may look like XML but are not well formed.

 
mrcoffee365 commented:
Yes -- I already gave that answer.
 
Hegemon commented:
My point was not about scrubbing per se, but rather that the document is not an XML document, hence XML parsing is not applicable.
 
mrcoffee365 commented:
XML docs come in many forms. It's still an XML doc even if it has some characters in the file after the closing tag. It is not a well-formed XML doc, which is what the asker was asking about.

As you get more experience with XML docs, you'll find that many are not well-formed, and the developers have to have strategies to deal with that.
 
Hegemon commented:
"Definition: A data object is an XML document if it is well-formed, as defined in this specification." (from http://www.w3.org/TR/REC-xml/#sec-well-formed)

Hence, not well-formed means not XML.
 
Plk_In_EE commented:
Hi there,
Even if there is whitespace before the <?xml declaration in the document, the SAX parser will fail.
It is better to send well-formed XML to the parser. Open the XML in a browser to check whether it is well-formed or not.
Good luck
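The point about leading whitespace can be checked with the JDK's built-in parser. The sketch below is a minimal illustration; the class name `LeadingWhitespaceCheck` is made up for this example. It shows that a space before the `<?xml` declaration is rejected, while the trimmed text parses. Note that `trim()` only helps with leading or trailing whitespace, not with stray text after the closing tag.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration only.
class LeadingWhitespaceCheck {

    // Returns true if the JDK's built-in parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors quietly
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String padded = " <?xml version=\"1.0\"?><note>hello</note>";
        System.out.println("padded:  " + isWellFormed(padded));
        System.out.println("trimmed: " + isWellFormed(padded.trim()));
    }
}
```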