SiJP

Remove Empty Elements

In theory, I have this XML file (and be gentle, I'm not experienced with XML or XSLT).

<root>
   <data>
      <date>
         <start/>
         <end></end>
      </date>
      <time period="start">12:00</time>
      <time period="end">13:00</time>
   </data>
</root>

As you can see, there are a few elements that have no data.  What I'd like to do is clean up this XML with an XSLT, by getting rid of any elements that have no attributes or values.

The output XML would therefore be:
<root>
   <data>
      <time period="start">12:00</time>
      <time period="end">13:00</time>
   </data>
</root>

Ideally, the <start/> element and <end></end> element would be ripped out.  And because the <date> element would subsequently have no child nodes or data, this would go too.

I've had one example given to me (below) but this not only removes all the blank elements, but also the attributes of the elements that do have values.

<xsl:template match="*[not(node())]"/>
<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>

If this makes sense, how do I remove all empty elements ('simple' and 'complex', I guess) that have no values whatsoever?

Thanks


p.s. - sorry about the lack of points - it's all I have available!
BobSiemens

"As you can see, there are a few elements that have no data.  What I'd like to do is clean up this XML with an XSLT, by getting rid of any elements that have no attributes or values."

OR SUBNODES!

Start with the identity transformation:

<xsl:template match="node()|@*">
   <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
   </xsl:copy>
</xsl:template>

and add an xsl:if around the copy so that the copy is done only when the element has at least one of the three (attributes, values or subnodes); an element that has none of them is skipped.

xsl:if example (syntax only):  <xsl:if test="(@author = 'bd') or (@year='1667')">
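
A minimal sketch of that suggestion, under one assumption that goes beyond the post: "subnodes" is read as "anything useful further down", so the test looks at descendants as well. Otherwise <date> in the question would survive because it still has child elements. The test expression is an illustration, not something given in the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()|@*">
        <!-- Assumption: non-element nodes always pass. An element passes only
             if it or a descendant carries an attribute, or if its text content
             is more than white-space. -->
        <xsl:if test="not(self::*) or .//@* or normalize-space()">
            <xsl:copy>
                <xsl:apply-templates select="@*"/>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

Because the apply-templates calls sit inside the xsl:if, a skipped element takes its whole subtree with it. The white-space text nodes around it are still copied, so the output keeps some blank lines.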




ASKER CERTIFIED SOLUTION
Gertone (Geert Bormans)
Words on the script
----------------------
The heart of this piece of script is the XPath in the match of the first template.
That XPath matches any node that matches one of the following rules
- somewhere inside (descendant-or-self) there is an element with an attribute
- somewhere inside (descendant-or-self) there is an element with content that has a length of 1 or more, after white-space normalisation
I ignore all the other elements

The other two templates tell me that I can safely copy all text-nodes and all attributes

It also works in this border case:
<?xml version="1.0" encoding="UTF-8"?>
<root><smooth><empty><series></series></empty></smooth></root>

What do you do yourself here?
----------------------------------
- I have not tested the behaviour with comments and PIs, you might need some twiddling
- I have not bothered about avoiding the creation of spurious white-space nodes

Well, you should have some fun yourself, no?
Good luck

Gertone
Well,

I could not resist testing with PIs and comments, and it ignores them as expected:
<empty><!-- my comment --></empty> is still an empty element...
so if that is what you want, you can leave it like that
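
If that is not what you want, a sketch along the following lines might keep comments and PIs that sit inside surviving elements. It reuses the match expression from this thread; the two extra templates are assumptions added for illustration and are not part of the accepted solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Keep elements that carry an attribute or real text somewhere inside. -->
    <xsl:template
        match="*[descendant-or-self::*[@*] or descendant-or-self::*[string-length(normalize-space(.)) &gt; 0]]">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Assumption: every other element is dropped together with its content.
         This plain match has a lower default priority than the one above. -->
    <xsl:template match="*"/>
    <!-- Assumption: comments and PIs inside surviving elements are copied. -->
    <xsl:template match="comment()|processing-instruction()">
        <xsl:copy/>
    </xsl:template>
    <!-- Text nodes fall through to the built-in rule, which copies their value. -->
</xsl:stylesheet>
```

The explicit empty template matters here: without it, the built-in rule would walk into a rejected element such as <empty><!-- my comment --></empty> and the comment would be written out without its parent.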

If you add the strip-space element like this at the beginning
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:strip-space elements="*"/>
    <xsl:template
        match="node()[ descendant-or-self::*[@*] or  descendant-or-self::*[string-length(normalize-space(.)) &gt; 0]]">
        <xsl:copy>
all "pretty-print" white-space nodes are removed. You will get your XML on one line, which you can "pretty-print" again with XML-Spy or Oxygen.

One WARNING though, if you do that:
"<test>some<m>mixed </m>content and some mixed white-space<m> </m> </test>"
The second <m> element contains only white-space (in mixed content). It will be removed, because its white-space-only text node is stripped,
so you will lose that.
One can argue that the white-space node after the second (mixed-content) <m> element should not be thrown away, but it is removed as well.
Likewise, in "<test>word <m>bold</m> <m>italic</m></test>" you will lose the space between the two (mixed-content) <m> elements.
Bottom line: if you have a lot of mixed content, be careful with strip-space,
or list the mixed-content elements in an <xsl:preserve-space elements="test firstMixed m"/> if you know which ones they are.
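
A compact sketch of that combination, assuming the mixed-content elements are named test and m as in the examples above. The body of the template is a shortened stand-in (copy-of for the attributes, built-in rule for text), not necessarily the exact script from the solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Strip white-space-only text nodes everywhere, -->
    <xsl:strip-space elements="*"/>
    <!-- except directly inside these mixed-content elements. A name wins
         over the wildcard above when both apply to the same element. -->
    <xsl:preserve-space elements="test m"/>
    <xsl:template
        match="node()[ descendant-or-self::*[@*] or  descendant-or-self::*[string-length(normalize-space(.)) &gt; 0]]">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Text nodes fall through to the built-in rule, which copies their value. -->
</xsl:stylesheet>
```

With this, the space between the two <m> elements in "" should come through intact. Note that preserve-space only protects text nodes: an element like <m> </m> still fails the match test and loses its tags, although its single space is passed on by the built-in rules.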

And now I should shut up :-)
SiJP

ASKER

Ah Gertone, no need to shut up by any means - this is excellent information! When I get back into my office, I shall be having fun with your XSLT!
SiJP

ASKER

Gertone,

What script changes would be needed to also exclude elements like:

<SomeElement myAtt=""></SomeElement>

(e.g. elements that have a blank attribute as well as no data)?

Thanks!
SiJP

ASKER

Oh, and also...

<SomeElement myAtt=""/>


:)
You are a real challenger hé!? :-)

Just change the XPath in
        match="node()[ descendant-or-self::*[@* != ''] or  descendant-or-self::*[string-length(normalize-space(.)) &gt; 0]]"
and it should work.
It might not be clear from the copy,
but I changed the [@*] into [@* != ''],
and that is two single quotes, not one double quote.
So I check that at least one attribute is not the empty string, and at the same time the test is false when there are no attributes at all.
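
The test works because XPath 1.0 compares a node-set with a string node by node: @* != '' is true as soon as one attribute has a non-empty value, and false for an empty node-set. A minimal sketch of the same idea written the other way round, as a removal rule on top of the identity transformation, could look like this. It is an alternative formulation and not the stylesheet from this thread, and the normalize-space() test on attributes (so that myAtt=" " also counts as blank) is an added assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Identity transformation: copy everything by default. -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <!-- Removal rule (higher default priority than the identity template):
         drop an element when neither it nor any descendant has a non-blank
         attribute, and its text content is empty or white-space only. -->
    <xsl:template match="*[not(.//@*[normalize-space()]) and not(normalize-space())]"/>
</xsl:stylesheet>
```

With this variant, both <SomeElement myAtt=""></SomeElement> and <SomeElement myAtt=""/> are dropped. Unlike the match above, the identity template also copies comments and PIs inside the elements that survive.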
SiJP

ASKER

Clearly not a challenge that is too much for you!

Thank you Gertone..  I will try this again and let you know of my results

Si
SiJP

ASKER

My friend, this has helped me enormously, thank you!

Final XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:template
        match="node()[ descendant-or-self::*[@* != ''] or  descendant-or-self::*[string-length(normalize-space(.)) &gt; 0]]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>    
    </xsl:template>
    <xsl:template match="text()">
        <xsl:value-of select="."/>
    </xsl:template>
    <xsl:template match="@*">
        <xsl:attribute name="{name()}"><xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>
You are welcome