HairyDogDigital asked:

Extracting subset of XML data using XPATH

Hi all,

I've been giving myself a crash course on XPath, but I'm still having some difficulty constructing some XPath expressions to best fit my needs -- assuming that it can be done.

I'm working on a project that is a database of different document types for various parts/products. The structure of the XML is...

<family ID="famID" label="">
      <group ID="grpID" label="">
            <part ID="ABCDE" type="" partNo="" description="">
                  <document type="A" label="" file="ABCDE-A.pdf"/>
                  <document type="B" label="" file=""/>
                  <document type="C" label="" file=""/>
                  <document type="D" label="" file="ABCDE-docD.pdf"/>
                  <document type="E" label="" file="installation.pdf"/>
                  <document type="F" label="" file=""/>
                  <document type="G" label="" file=""/>
            </part>
            <part ID="XYZ" type="" partNo="" description="">
                  <document type="A" label="" file="XYzee.pdf"/>
                  <document type="B" label="" file=""/>
                  <document type="C" label="" file=""/>
                  <document type="D" label="" file="ex2why.doc"/>
                  <document type="E" label="" file="installation.pdf"/>
                  <document type="F" label="" file=""/>
                  <document type="G" label="" file=""/>
            </part>
      </group>
</family>      

There is no text in the elements. All data is presented as values of attributes.

What I am trying to achieve is generating a new XML object based on specific ID criteria, keeping only the document elements whose file attribute is not empty.

For example, if I need all existing documents (file != "") of parts that have "CD" in their ID, the output XML object should be:

<family ID="" label="">
      <group ID="" label="">
            <part ID="ABCDE" type="" partNo="" description="">
                  <document type="A" label="" file="ABCDE-A.pdf"/>
                  <document type="D" label="" file="ABCDE-docD.pdf"/>
                  <document type="E" label="" file="installation.pdf"/>
            </part>
      </group>
</family>      

I can select the part nodes based on criteria by using -- //part[contains(@ID,'CD')] -- which returns an array of the matching <part> elements along with their child nodes. And I can select documents where the file attribute is not empty by using -- //document[@file != ''] -- which returns the <document> elements. But is it possible to get back the full hierarchy based on selection criteria without using XSLT?

...Rob
Gertone (Geert Bormans)

> But, is it possible to get back the full hierarchy based on selection criteria without using XSLT?

No, you can't with XPath alone (well, you can use other programming techniques, such as the DOM).
XPath is meant for selection (addressing) of nodes.
If you need to recreate a slimmed-down version of your XML,
you need to embed your XPath in XSLT or XQuery.

cheers

Geert
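As a rough illustration of the DOM-style alternative mentioned above, here is a minimal sketch that rebuilds a slimmed-down tree by hand. It is written in Python with the standard xml.etree.ElementTree module rather than in Flash/ActionScript, purely to show the shape of the approach. The function name filter_family and the shortened sample data are assumptions made for this example; they are not from the thread.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample data from the question.
SOURCE = """\
<family ID="famID" label="">
  <group ID="grpID" label="">
    <part ID="ABCDE" type="" partNo="" description="">
      <document type="A" label="" file="ABCDE-A.pdf"/>
      <document type="B" label="" file=""/>
      <document type="D" label="" file="ABCDE-docD.pdf"/>
    </part>
    <part ID="XYZ" type="" partNo="" description="">
      <document type="A" label="" file="XYzee.pdf"/>
      <document type="B" label="" file=""/>
    </part>
  </group>
</family>
"""


def filter_family(family, id_fragment):
    """Return a new <family> tree with only matching parts and non-empty documents.

    Hypothetical helper: the name and signature are made up for this sketch.
    """
    new_family = ET.Element(family.tag, dict(family.attrib))
    for group in family.findall("group"):
        new_group = ET.Element(group.tag, dict(group.attrib))
        for part in group.findall("part"):
            # Same test as the XPath predicate contains(@ID, 'CD').
            if id_fragment not in part.get("ID", ""):
                continue
            # Same test as the XPath predicate @file != ''.
            docs = [d for d in part.findall("document") if d.get("file")]
            if not docs:
                continue
            new_part = ET.SubElement(new_group, part.tag, dict(part.attrib))
            for doc in docs:
                ET.SubElement(new_part, doc.tag, dict(doc.attrib))
        # Leave out groups that ended up with no matching parts.
        if len(new_group):
            new_family.append(new_group)
    return new_family


result = filter_family(ET.fromstring(SOURCE), "CD")
print(ET.tostring(result, encoding="unicode"))
```

Note that this sketch copies the family and group attributes unchanged, whereas the desired output in the question shows them blanked; which of the two is wanted is a choice for the application.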
HairyDogDigital (ASKER)

Okay, so it won't work with just XPath. I'm not sure whether XSLT or XQuery is an option, because the XPath implementation is in Flash. That does leave me the possible option of working through it via the DOM.

However, is it possible to select just an ELEMENT? Getting back to my example, if I pull the PART elements that I need, can I get JUST the GROUP element for a part, without having all of the child/descendant nodes in tow with it?

...Rob
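One general point that may help here: an XPath expression such as parent::group selects the group node itself, and the descendants appear in the result only because they are children of that node in the tree. Getting "just" the element therefore usually means making a shallow copy in code. The following is a minimal sketch in Python's xml.etree.ElementTree, not in the Flash XPath library the thread refers to; the helper shallow_copy and the parent_of map are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample data from the question.
SOURCE = """\
<family ID="famID" label="">
  <group ID="grpID" label="">
    <part ID="ABCDE" type="" partNo="" description="">
      <document type="A" label="" file="ABCDE-A.pdf"/>
    </part>
    <part ID="XYZ" type="" partNo="" description="">
      <document type="A" label="" file="XYzee.pdf"/>
    </part>
  </group>
</family>
"""

root = ET.fromstring(SOURCE)

# ElementTree elements carry no parent pointer, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def shallow_copy(element):
    """Copy an element's tag and attributes only, leaving its children behind.

    Hypothetical helper written for this sketch.
    """
    return ET.Element(element.tag, dict(element.attrib))


# Pick the first part whose ID contains "CD", then walk up to its ancestors.
part = next(p for p in root.iter("part") if "CD" in p.get("ID", ""))
group_only = shallow_copy(parent_of[part])
family_only = shallow_copy(parent_of[parent_of[part]])

print(ET.tostring(group_only, encoding="unicode"))
print(ET.tostring(family_only, encoding="unicode"))
```

The original tree is left untouched; the copies hold only the attributes of the group and family elements, which is typically enough to rebuild the enclosing hierarchy around the selected parts.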
ASKER CERTIFIED SOLUTION
Gertone (Geert Bormans)

This solution is only available to members of Experts Exchange.

Bummer!

I don't know of any XSLT extensions for Flash, which is unfortunate because the "end" result is actually an XHTML document (as a string) that is loaded into an embedded text field. So XSLT would be ideal.

Since the XML has a very tight structure and does not have any text nodes (only element and attribute nodes), it's not that difficult to read parent/child/sibling node names and attribute names/values to create the XHTML.

Yes, it's more of a hassle than XSLT. And before I go the DOM route, I'm going to search further for an XSLT implementation for Flash. Since there is a fairly robust XPath and XPath 2 implementation, I'd expect that an XSLT one should also exist.

Thanks for the quick response. At least I know that I'm not going to get what I need purely from XPath.

...Rob
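To make the "read the node names and attributes, then emit XHTML" idea concrete, here is a small sketch in Python (again standing in for ActionScript). The markup it produces (a heading per part and a list of links) is purely an assumption for illustration, since the thread does not show the actual XHTML, and the function name to_xhtml is likewise made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Shortened version of the sample data from the question.
SOURCE = """\
<family ID="famID" label="">
  <group ID="grpID" label="">
    <part ID="ABCDE" type="" partNo="" description="">
      <document type="A" label="" file="ABCDE-A.pdf"/>
      <document type="B" label="" file=""/>
      <document type="E" label="" file="installation.pdf"/>
    </part>
  </group>
</family>
"""


def to_xhtml(family):
    """Build an XHTML fragment (as a string) from attribute values only.

    Hypothetical layout: one <h3> per part, one <li> link per existing document.
    """
    lines = []
    for part in family.iter("part"):
        # Keep only documents that actually have a file.
        docs = [d for d in part.findall("document") if d.get("file")]
        if not docs:
            continue
        lines.append("<h3>%s</h3>" % escape(part.get("ID", "")))
        lines.append("<ul>")
        for doc in docs:
            href = quoteattr(doc.get("file"))  # adds the surrounding quotes
            # Fall back to the file name when the label attribute is empty.
            text = escape(doc.get("label") or doc.get("file"))
            lines.append("  <li><a href=%s>%s</a></li>" % (href, text))
        lines.append("</ul>")
    return "\n".join(lines)


xhtml = to_xhtml(ET.fromstring(SOURCE))
print(xhtml)
```

Escaping the attribute values before concatenating them keeps the resulting string well-formed even if a label or file name contains characters such as & or <.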
I wasn't aware that there is an XPath 2 implementation.
I would be very surprised if there isn't an XSLT 1 implementation, then.

cheers

This is a "good news" / "bad news" thing.

First, the XPath implementation for Flash is a third-party set of classes. The good news is that it exists, though I might have read a specification incorrectly as to whether it is XPath or XPath 2.
 
The bad news, no XSLT for Flash... as of yet.

Fortunately, the amount of data I am dealing with does not cause a remarkable performance hit when traversing parent, grandparent, and child nodes to generate the desired output.

Again, thanks for the quick response. You saved me hours of plunking around on Google!

...Rob
welcome
sorry for the bad news

Geert