
Need help to slightly modify an existing XML document using either XSLT or Java

Dear fellow Java/XML developers:

I have an XML file that I need to modify slightly, using either XSLT or Java, but I am not sure how to do it.  The current document structure is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<collection name="Name of Collection">
    <book number="1" title="Book Title">
        <chapter number="1:Chapter Title">
            <quote number="1.0001" reference="Book 1, Number 1">
                <narrator></narrator>
                <quotation>
                    <quotation-text></quotation-text>
                    <quotation-footnote></quotation-footnote>
                </quotation>
            </quote>
            ...
        </chapter>
        <chapter number="2:Chapter Title">
            <quote number="1.0005" reference="Book 1, Number 5">
                <narrator></narrator>
                <quotation>
                    <quotation-text></quotation-text>
                    <quotation-footnote></quotation-footnote>
                </quotation>
            </quote>
            ...
        </chapter>
        ...
    </book>
    <book number="2" title="Book Title">
        <chapter number="1:Chapter Title">
            <quote number="2.0025" reference="Book 2, Number 25">
                <narrator></narrator>
                <quotation>
                    <quotation-text></quotation-text>
                    <quotation-footnote></quotation-footnote>
                </quotation>
            </quote>
            ...
        </chapter>
        ...
    </book>
    ...
</collection>

The changes I need to make are:

1.  add a "title" attribute to the <chapter> element by breaking up the current "number" attribute, such that:

 <chapter number="1:Chapter Title">

changes to :

<chapter number="1" title="Chapter Title"> (and have the colon removed in the process).

2.  add the value of the "number" attribute from the <chapter> element, to the "reference" attribute of the  <quote> element, and modify the "number" attribute of the <quote> element, such that:

<chapter number="1:Chapter Title">
<quote number="1.0001" reference="Book 1, Number 1">

changes to:

<quote number="1" reference="Book 1, Chapter 1, Number 1">

As of right now, the "number" attribute of the <quote> element is made up of the book number followed by the quote number, separated by a period.  I want to eliminate the book number (the initial number), the period, and all of the leading zeroes in front of the quote number, so that ONLY the quote number remains.  I hope this makes sense.

The root element of the XML document is <collection>.  <collection> contains several <book> elements, each <book> contains several <chapter> elements, and each <chapter> element contains several <quote> elements.

I hope this question is not too complicated, and if it is, please let me know which part is confusing, and I will do my best to further clarify.

My sincerest thanks to all who reply.
Asked by: fsyed
 
Geert BormansCommented:
This does your first task.
It is an identity copy, slightly changed.
So it copies your XML completely, except that it does something special for the chapter element.
I will now add a new template for the quote element.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="chapter">
        <xsl:copy>
            <xsl:copy-of select="@*[not(name() = 'number')]"/>
            <xsl:attribute name="number">
                <xsl:value-of select="substring-before(@number, ':')"/>
            </xsl:attribute>
            <xsl:attribute name="title">
                <xsl:value-of select="substring-after(@number, ':')"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

 
Geert BormansCommented:
Here is the full stylesheet, doing it all.
I am happy to explain what is going on if anything is not clear.
cheers

Geert
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="chapter">
        <xsl:copy>
            <xsl:copy-of select="@*[not(name() = 'number')]"/>
            <xsl:attribute name="number">
                <xsl:value-of select="substring-before(@number, ':')"/>
            </xsl:attribute>
            <xsl:attribute name="title">
                <xsl:value-of select="substring-after(@number, ':')"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="quote">
        <xsl:copy>
            <xsl:copy-of select="@*[not(name() = 'number')][not(name() = 'reference')]"/>
            <xsl:attribute name="number">
                <xsl:value-of select="substring-before(@number, '.')"/>
            </xsl:attribute>
            <xsl:attribute name="reference">
                <xsl:value-of select="substring-before(@reference, ',')"/>
                <xsl:text>, Chapter </xsl:text>
                <xsl:value-of select="substring-before(ancestor::chapter/@number, ':')"/>
                <xsl:text>,</xsl:text>
                <xsl:value-of select="substring-after(@reference, ',')"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

 
Geert BormansCommented:
Oh, I strongly believe that a task like this is best done in XSLT.
Since you are copying most of the document anyway, XSLT gives you the tree walker for free.
I expect this code is much simpler than any Java alternative would be.
You know, of course, that you can run the XSLT from Java?
 
fsyedAuthor Commented:
Dear Gertone:

Thanks (yet again!) for an amazing solution!  Unfortunately, there is one slight error in the output, and that is the value of the attribute "number" in the <quote> element is the book value.  I need this number, the period, and the leading zeroes removed.

The fault is actually mine when I look at the example I provided above.  Here is a clearer example:

<chapter number="3:Chapter Title">
<quote number="1.0002" reference="Book 1, Number 2">

changes to:

<quote number="2" reference="Book 1, Chapter 3, Number 2">

in the number attribute above, 1 refers to the book number, and the 0002 refers to the quote number (which is what I need to keep, minus the zeroes).  

I hope this helps, and thanks again for such a quick response.

Also, can you show me how to run the XSLT from java?

Thanks again.
Sincerely;
Fayyaz
 
Geert BormansCommented:
That is only a small change:
take the part after the '.' instead of the part before it,
and convert it to a number, so the leading zeroes drop.

I am not a Java programmer, but there are loads of examples on the web. I will now search for a good one.

cheers

Geert
...
    <xsl:template match="quote">
        <xsl:copy>
            <xsl:copy-of select="@*[not(name() = 'number')][not(name() = 'reference')]"/>
            <xsl:attribute name="number">
                <xsl:value-of select="number(substring-after(@number, '.'))"/>
            </xsl:attribute>
...
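
To see what the changed expression produces without running a whole stylesheet, the same XPath 1.0 expression can be evaluated directly from Java. This is only a sketch: the class and method names are made up for illustration, and the attribute value is pasted into the expression as a string literal, so it assumes the value contains no apostrophe.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class QuoteNumberXPath {
    // Hypothetical helper: evaluates number(substring-after(value, '.'))
    // and returns the result converted to a string, as xsl:value-of would.
    static String quoteNumber(String numberAttr) {
        try {
            // An empty document serves as a dummy context; the expression never reads it.
            Document context = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumption: numberAttr contains no apostrophe.
            String expr = "number(substring-after('" + numberAttr + "', '.'))";
            return xpath.evaluate(expr, context);
        } catch (ParserConfigurationException | XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(quoteNumber("1.0002")); // prints 2
        System.out.println(quoteNumber("2.0025")); // prints 25
        // No period: substring-after yields "", and number("") is NaN.
        System.out.println(quoteNumber("7"));      // prints NaN
    }
}
```

The last call shows the one case to watch for: a "number" attribute without a period would come out as NaN rather than being left alone.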

 
Geert BormansCommented:
This is old, but at first glance it still seems relevant enough.
The default parser in Java has changed, but that should not be a problem here.
http://www.ling.helsinki.fi/kit/2004k/ctl257/JavaXSLT/Ch05.html
 
Geert BormansCommented:
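Maybe this is all you need:
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// wrap the input, the stylesheet and the output file
Source xmlSource = new StreamSource(new File("file.xml"));
Source xsltSource = new StreamSource(new File("file.xsl"));
Result result = new StreamResult(new File("file2.xml"));

// create an instance of TransformerFactory, compile the stylesheet and run it
TransformerFactory transFact = TransformerFactory.newInstance();
Transformer trans = transFact.newTransformer(xsltSource);
trans.transform(xmlSource, result);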
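
For completeness, here is how those few lines might be wrapped into a class that compiles on its own. It is a sketch rather than a definitive implementation: the class name RunXslt, the method name transformFile and the command-line usage are assumptions made for illustration, and only the standard javax.xml.transform API is used.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RunXslt {
    // Applies the stylesheet at xslPath to the document at xmlPath
    // and writes the result to outPath.
    static void transformFile(String xmlPath, String xslPath, String outPath) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Compiling the stylesheet reports stylesheet errors before any input is read.
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new File(xslPath)));
            transformer.transform(new StreamSource(new File(xmlPath)),
                                  new StreamResult(new File(outPath)));
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java RunXslt input.xml stylesheet.xsl output.xml");
            return;
        }
        transformFile(args[0], args[1], args[2]);
    }
}
```

With the file names used above, it would be run as: java RunXslt file.xml file.xsl file2.xml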

 
fsyedAuthor Commented:
As usual Gertone, your answers are truly outstanding, and are always delivered immediately.  I truly appreciate all the work you have done.  I was wondering if you could provide me a breakdown of the revised, complete XSLT sheet to explain to me what is happening?  This problem was tricky for me which is why I posted my problem, and as I suspected, your solution is quite involved.  You truly are a genius!

Thanks again for everything.
Sincerely;
Fayyaz
 
Geert BormansCommented:
Here is an explanation. I will break it down into multiple posts, so that each explanation sits next to its code snippet.

If you only make small changes to an XML source document, you usually start with a so-called identity copy stylesheet.
That is a stylesheet with one template, as below (variants do exist), that makes the output an identical copy of the input.

Each node receives the following treatment:
- xsl:copy copies the current node to the output, without its attributes or children. For a text() node this copies the text; for an element it copies the start and end tags; for a comment it copies the comment. Note that the template operates on all types of nodes.
- inside the xsl:copy you need to do something with the attributes and the children:
  + all attributes are copied as is
  + all child nodes are pushed to the templates as well. Since there is only one template, the same copying occurs on the nested levels.

I hope this makes the identity copy clear.
   <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
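
A quick way to convince yourself that this template really reproduces its input is to run it on a small in-memory document from Java. The sketch below makes a few assumptions: the class name and the sample input are invented for illustration, and an xsl:output instruction (not part of the stylesheet above) is added to suppress the XML declaration so the result is easier to compare by eye.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityCopyDemo {
    // The identity template from the post, plus an xsl:output line added for this demo.
    static final String IDENTITY_XSLT = String.join("\n",
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">",
        "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
        "  <xsl:template match=\"node()\">",
        "    <xsl:copy>",
        "      <xsl:copy-of select=\"@*\"/>",
        "      <xsl:apply-templates select=\"node()\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Runs a stylesheet held in a string against a document held in a string.
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up input with an element, attributes, a comment and a text node.
        String input = "<book number=\"1\" title=\"Book Title\">"
                + "<!-- note --><narrator>someone</narrator></book>";
        System.out.println(transform(IDENTITY_XSLT, input));
    }
}
```

The element, both attributes, the comment and the text should all reappear in the output, since every node type passes through the same template.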

 
Geert BormansCommented:
When matching nodes to templates, the most specific match pattern wins.
Although the identity template above also matches the chapter element node,
the
<xsl:template match="chapter">... is more specific for the chapter element.

So adding an extra template for a specific element to the identity transformation stylesheet
will still make the output identical to the input, except for this one specific element.

Here is what we do with chapter elements:
- we copy all their attributes, except the @number
- we create an attribute number, with the value being the part of the original @number before the colon
- we create an attribute title, with the value being the part of the original @number after the colon
- and then we process the child nodes in exactly the same fashion as before
    <xsl:template match="chapter">
        <xsl:copy>
            <xsl:copy-of select="@*[not(name() = 'number')]"/>
            <xsl:attribute name="number">
                <xsl:value-of select="substring-before(@number, ':')"/>
            </xsl:attribute>
            <xsl:attribute name="title">
                <xsl:value-of select="substring-after(@number, ':')"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"></xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
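
The effect of adding one more specific template can be checked from Java with a small in-memory run. In XSLT 1.0 a node() pattern has a default priority of -0.5 and an element-name pattern such as chapter has 0, which is why the chapter template is chosen for chapter elements. The class name, the sample input and the xsl:output line below are assumptions added for the demonstration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChapterTemplateDemo {
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">",
        "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
        "  <!-- generic identity template: default priority -0.5 -->",
        "  <xsl:template match=\"node()\">",
        "    <xsl:copy>",
        "      <xsl:copy-of select=\"@*\"/>",
        "      <xsl:apply-templates select=\"node()\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "  <!-- element-name pattern: default priority 0, so it wins for chapter -->",
        "  <xsl:template match=\"chapter\">",
        "    <xsl:copy>",
        "      <xsl:copy-of select=\"@*[not(name() = 'number')]\"/>",
        "      <xsl:attribute name=\"number\">",
        "        <xsl:value-of select=\"substring-before(@number, ':')\"/>",
        "      </xsl:attribute>",
        "      <xsl:attribute name=\"title\">",
        "        <xsl:value-of select=\"substring-after(@number, ':')\"/>",
        "      </xsl:attribute>",
        "      <xsl:apply-templates select=\"node()\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Runs a stylesheet held in a string against a document held in a string.
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up input: only the chapter element should come out changed.
        String input = "<book number=\"1\" title=\"Book Title\">"
                + "<chapter number=\"3:Chapter Title\">"
                + "<quote number=\"1.0002\" reference=\"Book 1, Number 2\">"
                + "<narrator>someone</narrator></quote>"
                + "</chapter></book>";
        System.out.println(transform(XSLT, input));
    }
}
```

Note that substring-before and substring-after split at the first colon, so a title that itself contains a colon would keep everything after the first one.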

 
Geert BormansCommented:
Basically, the next template, for quote, is a variation on the chapter template. There is nothing really new,
except that we need to look up the number attribute of an ancestor (the enclosing chapter).

Let me know if anything is still unclear at this point.

cheers

Geert
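
Putting the pieces of the thread together, the revised stylesheet (identity template, chapter template, and the quote template with the number(substring-after(...)) correction) can be tried from Java on the clearer example given earlier. This is a sketch under stated assumptions: the class name, the in-memory input and the xsl:output line are additions for the demonstration and do not come from the posts.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FullStylesheetDemo {
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">",
        "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
        "  <xsl:template match=\"node()\">",
        "    <xsl:copy>",
        "      <xsl:copy-of select=\"@*\"/>",
        "      <xsl:apply-templates select=\"node()\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match=\"chapter\">",
        "    <xsl:copy>",
        "      <xsl:copy-of select=\"@*[not(name() = 'number')]\"/>",
        "      <xsl:attribute name=\"number\">",
        "        <xsl:value-of select=\"substring-before(@number, ':')\"/>",
        "      </xsl:attribute>",
        "      <xsl:attribute name=\"title\">",
        "        <xsl:value-of select=\"substring-after(@number, ':')\"/>",
        "      </xsl:attribute>",
        "      <xsl:apply-templates select=\"node()\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match=\"quote\">",
        "    <xsl:copy>",
        "      <xsl:copy-of select=\"@*[not(name() = 'number')][not(name() = 'reference')]\"/>",
        "      <!-- keep only the part after the period; number() drops leading zeroes -->",
        "      <xsl:attribute name=\"number\">",
        "        <xsl:value-of select=\"number(substring-after(@number, '.'))\"/>",
        "      </xsl:attribute>",
        "      <!-- ancestor::chapter is read from the input tree, so @number still holds '3:Chapter Title' -->",
        "      <xsl:attribute name=\"reference\">",
        "        <xsl:value-of select=\"substring-before(@reference, ',')\"/>",
        "        <xsl:text>, Chapter </xsl:text>",
        "        <xsl:value-of select=\"substring-before(ancestor::chapter/@number, ':')\"/>",
        "        <xsl:text>,</xsl:text>",
        "        <xsl:value-of select=\"substring-after(@reference, ',')\"/>",
        "      </xsl:attribute>",
        "      <xsl:apply-templates select=\"node()\"/>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Runs a stylesheet held in a string against a document held in a string.
    static String transform(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<collection name=\"Name of Collection\">"
                + "<book number=\"1\" title=\"Book Title\">"
                + "<chapter number=\"3:Chapter Title\">"
                + "<quote number=\"1.0002\" reference=\"Book 1, Number 2\">"
                + "<narrator>someone</narrator></quote>"
                + "</chapter></book></collection>";
        System.out.println(transform(XSLT, input));
    }
}
```

For this input the quote element should come out with number="2" and reference="Book 1, Chapter 3, Number 2". The chapter number is still taken with substring-before because templates always read the original input tree, not the already rewritten chapter element.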