XML parsers: StAX, JAXB, SAX, DOM

Hi,

When should we use each of the XML parsers StAX, JAXB, SAX and DOM?

In which practical scenarios or applications would we use the different parsers, and what are the advantages and disadvantages of each of them?

Please advise.
gudii9 Asked:

mccarl (IT Business Systems Analyst / Software Developer) Commented:
SAX/StAX - These are similar to each other and both provide a streaming view of the XML document as it is parsed. This works well when you have a large document that is essentially a list of relatively unrelated items, e.g. an XML document that contains a list of millions of financial transactions. In this case each transaction has little relation to any other in the document, so you can use SAX to get the first transaction, have your code process/store it, then forget about it and move on to the next transaction. This is good because, no matter how big the incoming file is, you only use memory for the one transaction that you are currently processing.
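As a rough illustration of the streaming approach described above, here is a minimal SAX sketch. The element name "transaction", its "amount" attribute and the class name TransactionTotals are assumptions made up for this example, not something from the question. Only the running total is kept in memory, however long the list of transactions is.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TransactionTotals {

    /** Streams the document and sums the (hypothetical) "amount" attribute of every <transaction>. */
    static BigDecimal sumAmounts(InputStream xml) throws Exception {
        // One-element array so the anonymous handler can update the total.
        final BigDecimal[] total = { BigDecimal.ZERO };

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The parser calls this for every start tag; we only care about transactions.
                if ("transaction".equals(qName)) {
                    total[0] = total[0].add(new BigDecimal(attrs.getValue("amount")));
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, handler); // runs to the end of the document, calling the handler as it goes
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transactions>"
                + "<transaction id=\"1\" amount=\"10.50\"/>"
                + "<transaction id=\"2\" amount=\"4.25\"/>"
                + "</transactions>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(sumAmounts(in));
    }
}
```

In a real application the input would more likely be a FileInputStream over the large file, and each transaction would be stored or processed rather than just summed.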

DOM - parses and loads the entire document into memory first, and then allows you to query/access that in-memory representation. This is good when you need to randomly and/or repeatedly access different elements within the document. If you need to run XPath queries over the document (particularly if you need to run several), having the entire document parsed and in memory makes this faster.
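A small sketch of the DOM approach, assuming a made-up "order" document and a hypothetical class name OrderQueries: the document is parsed once into memory, and then several XPath queries are run against the same tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrderQueries {

    /** Parses the whole document into an in-memory DOM tree. */
    static Document load(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** First query: how many <item> elements are directly under <order>? */
    static int countItems(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate("/order/item", doc, XPathConstants.NODESET);
        return items.getLength();
    }

    /** Second query against the same tree: the customer's name attribute. */
    static String customerName(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/order/customer/@name", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<order><customer name=\"Todd\"/>"
                + "<item sku=\"A1\"/><item sku=\"B2\"/></order>");
        System.out.println(countItems(doc) + " items for " + customerName(doc));
    }
}
```

The document is parsed only once in load(); every further query reuses the tree, which is where DOM pays off compared with re-reading a stream.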

JAXB - you can think of this as going one step beyond DOM. It still loads the entire document into memory, but the representation is different. DOM stores the document as a tree of elements, child elements, attributes, etc., whereas JAXB takes that one step further and represents the data as a graph of Java objects.


You can probably see that the advantage of one is the disadvantage of the other, i.e. DOM/JAXB is not good for handling large documents because it takes up memory (something to consider especially if the size of the incoming documents is unknown or out of your control), while SAX/StAX is not good for randomly/repeatedly accessing different elements in the document, because they are sequential parsers.
gudii9 (Author) Commented:
"If you need to run xpath queries (particularly if you need to run multiple) over the document, having the entire document parsed and in memory, makes this faster."

If we use XPath in the project, we cannot use SAX or StAX, right? Is only DOM or JAXB a good choice in that case?
Please advise.
gudii9 (Author) Commented:
Can StAX do subprocessing of XML (extracting the element name, then the attributes, and then the content), whereas SAX cannot? Please advise.
Are there any good sample examples of each of these? Please advise.
mccarl (IT Business Systems Analyst / Software Developer) Commented:
"if we use xpath in the project we cannot use SAX or STAX right only DOM or JAXB is good choice in that case?"
You can still use a combination of different parsers within a project, but yes, for a particular document that you want to run XPath queries on, you need a DOM representation.

"can Stax do subprocessing of XML (extracting element name, then the attributes, and then  content)where as the SAX cannot do it?"
I don't know exactly what you mean here, but for a good explanation of SAX vs. StAX have a look here: http://tutorials.jenkov.com/java-xml/sax-vs-stax.html
gudii9 (Author) Commented:
Java SAX vs. StAX

 
By Jakob Jenkov

Both SAX and StAX are stream / event oriented XML parsers, but there is a subtle difference in how they work. SAX uses a "push" model, and StAX uses a "pull" model. To the unknowing this can be confusing. Therefore I will try to address the differences in these models in a little more detail in this text.

Should you know of any advantages or disadvantages that I have forgotten here, please feel free to send me an email. You can find a working email address on my About page.

The SAX Push Model

The SAX push model means that it is the SAX parser that calls your handler, not your handler that calls the SAX parser. The SAX parser thus "pushes" events into your handler. Here it is, summarized:

SAX Parser --> Handler    
With a push model you have no control over how and when the parser iterates over the file. Once you start the parser, it iterates all the way to the end, calling your handler for each and every XML event in the input XML document.

The StAX Pull Model

The StAX pull model means that it is your "handler" class that calls the parser, not the other way around. Thus your handler class controls when the parser is to move on to the next event in the input. In other words, your handler "pulls" the XML events out of the parser. Additionally, you can stop the parsing at any point. The pull model is summarized like this:

Handler --> StAX Parser    
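
To make the pull model a bit more concrete, here is a minimal sketch in which the calling code asks for one event at a time and simply stops once it has what it needs. The class name FirstDriverFinder and the assumption that a "driver" element contains plain text only are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FirstDriverFinder {

    /** Returns the text of the first <driver> element, or null if there is none. */
    static String firstDriverName(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // Our code drives the loop: the parser only advances when next() is called.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "driver".equals(reader.getLocalName())) {
                    // getElementText() works for text-only elements (an assumption here).
                    // Returning now means the rest of the document is never read.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<transportInfo><driver>Anna</driver><driver>Ben</driver>"
                + "<vehicle>Truck</vehicle></transportInfo>";
        System.out.println(firstDriverName(xml));
    }
}
```

With SAX the usual way to stop early is to throw an exception out of the handler, so this kind of early exit is noticeably more natural with StAX.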
Summary of Advantages and Disadvantages

The StAX pull model has a few advantages over the SAX push model (one of the few cases where "inversion of control" is not an advantage). I have summarized the positives and negatives of SAX and StAX in the table below:

SAX +                SAX -    StAX +                              StAX -
Schema validation             Subparsing / delegation possible    No schema validation
                              Support for XML writing
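
The "support for XML writing" entry refers to the StAX writer API. A minimal sketch, assuming a hypothetical class name TransportInfoWriter and reusing the transportInfo element names from the article's example:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class TransportInfoWriter {

    /** Writes a tiny transportInfo document with one driver and one vehicle. */
    static String write(String driver, String vehicle) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();
        writer.writeStartElement("transportInfo");

        writer.writeStartElement("driver");
        writer.writeCharacters(driver);   // escapes &, < and > in the text
        writer.writeEndElement();

        writer.writeStartElement("vehicle");
        writer.writeCharacters(vehicle);
        writer.writeEndElement();

        writer.writeEndElement();         // closes transportInfo
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write("Anna", "Truck"));
    }
}
```

Like the reader, the writer is a forward-only stream, so a large document can be produced without first building it in memory.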
StAX Allows Subparsing / Delegation

One big advantage of StAX over SAX is that the pull model allows subparsing of the XML input by methods and components. What do I mean by that?

First, here is an XML example:

<transportInfo>
    <driver>...</driver>
    <driver>...</driver>

    <vehicle>...</vehicle>
    <vehicle>...</vehicle>

</transportInfo>
Second, look at this StAX StreamReader example:


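import java.io.FileNotFoundException;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// The methods below belong inside a class; the factory is a field of that class.
private final XMLInputFactory factory = XMLInputFactory.newInstance();

// Top-level loop: looks at each start tag and delegates to a sub-parser method.
public void parse() throws XMLStreamException, FileNotFoundException {
    XMLStreamReader streamReader = factory.createXMLStreamReader(
        new FileReader("data\\test.xml"));

    while(streamReader.hasNext()){
        streamReader.next();

        if(streamReader.getEventType() == XMLStreamReader.START_ELEMENT){
            String elementName = streamReader.getLocalName();
            if("driver".equals(elementName)){
                parseDriverAndAllChildren(streamReader);
            } else if("vehicle".equals(elementName)) {
                parseVehicleAndAllChildren(streamReader);
            }
        }
    }
}

// Consumes events until the closing </driver> tag, then returns to the caller's loop.
public void parseDriverAndAllChildren(XMLStreamReader streamReader)
        throws XMLStreamException {
    while(streamReader.hasNext()){
        streamReader.next();

        if(streamReader.getEventType() == XMLStreamReader.END_ELEMENT){
            String elementName = streamReader.getLocalName();
            if("driver".equals(elementName)){
              return;
            }
        } else if(streamReader.getEventType() == XMLStreamReader.START_ELEMENT){
           //do something to child elements...
        }

    }
}

// Consumes events until the closing </vehicle> tag, then returns to the caller's loop.
public void parseVehicleAndAllChildren(XMLStreamReader streamReader)
        throws XMLStreamException {
    while(streamReader.hasNext()){
        streamReader.next();

        if(streamReader.getEventType() == XMLStreamReader.END_ELEMENT){
            String elementName = streamReader.getLocalName();
            if("vehicle".equals(elementName)){
              return;
            }
        } else if(streamReader.getEventType() == XMLStreamReader.START_ELEMENT){
           //do something to child elements...
        }
    }
}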
Notice how each of the methods parseDriverAndAllChildren() and parseVehicleAndAllChildren() is capable of continuing the parsing loop (while(streamReader.hasNext()) {... }) and processing all elements related to the "driver" / "vehicle" element of its respective interest.

If you were to do this using a SAX handler, things would become ugly. You would have to set a flag inside the handler object to record which element you are currently inside. Delegating the parsing and handling of sub-parts of the XML document to a method or component would not be easily possible, at least not as easily as shown above.
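
For comparison, here is a rough sketch of what the SAX version might look like. The class name TransportSaxHandler and the assumption that driver and vehicle elements contain plain text are made up for this illustration. The point is the "current" field: because the parser calls the handler, the handler has to remember between callbacks which element it is inside.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TransportSaxHandler extends DefaultHandler {
    final List<String> drivers = new ArrayList<>();
    final List<String> vehicles = new ArrayList<>();

    // State carried between callbacks: which element we are inside ("driver", "vehicle" or null).
    private String current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("driver".equals(qName) || "vehicle".equals(qName)) {
            current = qName;       // set the flag
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so accumulate.
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(current)) {
            List<String> target = "driver".equals(qName) ? drivers : vehicles;
            target.add(text.toString().trim());
            current = null;        // clear the flag
        }
    }

    /** Convenience method: parses the given XML string with a fresh handler. */
    static TransportSaxHandler parse(String xml) throws Exception {
        TransportSaxHandler handler = new TransportSaxHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

With only two element types this is still manageable, but every additional nested structure adds more state to the one handler, which is the problem the StAX sub-parsing methods avoid.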

The above explanation answers my question.

Question has a verified solution.
