mikeysmailbox1

XML file: need to remove duplicate child nodes

Hi

I have an XML file from which I need to remove duplicate child nodes; there are thousands of duplicates in the file.

If the duplicates can't be deleted directly, how can I sort by <OUTCOND and then NAME=, and likewise by <INCOND and then NAME=?
Once they are sorted, I can use the Unix utility "uniq" to delete duplicate lines that appear one after another.


Example:

<JOB  xxx="xxx" ccc="ccc">
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>

needs to be

<JOB  xxx="xxx" ccc="ccc">
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>


I'm not sure what the code would be using the Perl module XML::Twig or XML::LibXML.

Thanks

Mike
zc2

You could try to use a grouping XSLT transformation, like:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:key match="OUTCOND|INCOND" name="COND" use="concat(name(),@NAME)"/>

  <xsl:template match="/">
    <xsl:apply-templates select="JOB"/>
  </xsl:template>
 
<xsl:template match="JOB">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="*[generate-id(.)=generate-id(key('COND', concat(name(),@NAME) ))]"/>
  </xsl:copy>
</xsl:template>  

<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
  </xsl:copy>
</xsl:template>  

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>  

</xsl:stylesheet>
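
The stylesheet above uses what is usually called Muenchian grouping: xsl:key builds an index from a key string to every matching node, and the generate-id() comparison keeps a node only if it is the first one the index returns for its own key. A rough Python analogy of that test is sketched below, with (element name, NAME) tuples standing in for the child elements; the variable names are made up for illustration and nothing here is part of the stylesheet itself.

```python
from collections import defaultdict

# Stand-ins for the children of one JOB, in document order: (element name, NAME).
children = [
    ("OUTCOND", "F3440CST_ZSSFR910--F3050CST_ZMKSADS1"),
    ("OUTCOND", "F2187CST_ZD4S8X--F3050CST_ZMKSADS1"),
    ("OUTCOND", "F3440CST_ZSSFR910--F3050CST_ZMKSADS1"),
    ("INCOND", "F3440CST_ZSSFR910--F3050CST_ZMKSADS1"),
    ("INCOND", "F3440CST_ZSSFR910--F3050CST_ZMKSADS1"),
]

# Like <xsl:key use="concat(name(),@NAME)">: key string -> positions of matching nodes.
index = defaultdict(list)
for pos, (tag, name) in enumerate(children):
    index[tag + name].append(pos)

# Like generate-id(.) = generate-id(key(...)): keep a node only if it is
# the first node stored under its own key.
kept = [
    children[pos]
    for pos, (tag, name) in enumerate(children)
    if index[tag + name][0] == pos
]

for tag, name in kept:
    print(tag, name)
```

Note that an xsl:key index covers the whole input document, not a single JOB, so "first node for this key" means first in the entire file.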

mikeysmailbox1

ASKER

Hi zc2,

I'm not sure how to use the XSLT transformation. Is there a command to do that?

Thanks

Mike

zc2

I am not a Perl programmer, but a quick search turns up this module: https://metacpan.org/pod/XML::XSLT
See also this topic: https://stackoverflow.com/questions/156683/what-is-the-best-xslt-engine-for-perl
If that doesn't work, you could run an external command-line processor: xsltproc on Linux or msxsl on Windows.
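
As a starting point for the external-processor route: xsltproc takes the stylesheet first and the input document second, and writes the result to standard output unless an output file is given with --output. The sketch below wraps that call in Python so it can be scripted. The file names dedupe.xsl, input.xml and output.xml are placeholders, the function names are made up, and it assumes xsltproc (part of libxslt) is installed and on PATH.

```python
import shutil
import subprocess


def xsltproc_command(stylesheet, source, output):
    """Build the argument list: xsltproc --output OUTPUT STYLESHEET SOURCE."""
    return ["xsltproc", "--output", output, stylesheet, source]


def run_transform(stylesheet, source, output):
    """Run xsltproc, raising if it is missing or exits with an error."""
    if shutil.which("xsltproc") is None:
        raise RuntimeError("xsltproc not found on PATH")
    subprocess.run(xsltproc_command(stylesheet, source, output), check=True)


if __name__ == "__main__":
    # Placeholder file names; this prints the equivalent shell command
    # rather than running it.
    print(" ".join(xsltproc_command("dedupe.xsl", "input.xml", "output.xml")))
```

Calling run_transform("dedupe.xsl", "input.xml", "output.xml") would then perform the transformation, provided those files exist.
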
Hi zc2,

That almost worked. However, my actual file has the following structure:

<?xml version="1.0"?>
<DEFTABLE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Folder.xsd">
<SMART_FOLDER JOBISN="0">
<JOB  xxx="xxx" ccc="ccc">
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
</SMART_FOLDER>
</DEFTABLE>


What would I add to the XSLT file to include the DEFTABLE and SMART_FOLDER nodes?

Thanks

Mike

zc2

Please try this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:key match="OUTCOND|INCOND" name="COND" use="concat(name(),@NAME)"/>

<xsl:template match="JOB">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="*[generate-id(.)=generate-id(key('COND', concat(name(),@NAME) ))]"/>
  </xsl:copy>
</xsl:template>  

<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates select="@* | *"/>
  </xsl:copy>
</xsl:template>  

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>  

</xsl:stylesheet>

Hi zc2

That worked for the first set, but in the second set it removed all of the INCOND and OUTCOND nodes.

Each <SMART_FOLDER ...> ... </SMART_FOLDER> block is a separate set.

Mike

zc2

I am sorry, what do you mean by a "set"? Different input files? Can you post them here?

mikeysmailbox1

The second set is a duplicate of the first set of nodes.

Example:

<?xml version="1.0"?>
<DEFTABLE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Folder.xsd">
<SMART_FOLDER JOBISN="0">
<JOB  xxx="xxx" ccc="ccc">
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
</SMART_FOLDER>
<SMART_FOLDER JOBISN="0">
<JOB  xxx="xxx" ddd="ddd">
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
      <OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
</SMART_FOLDER>
</DEFTABLE>
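
zc2

The uniqueness was checked in the global scope, so the nodes in the second JOB were never the first occurrence of their key and were all dropped. Now the parent element is taken into account. I also added all the other attributes to the key, just in case.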

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:key match="OUTCOND|INCOND" name="COND" use="concat(generate-id(..),name(),@NAME,@ODATE,@SIGN,@AND_OR)"/>

<xsl:template match="JOB">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="*[generate-id(.)=generate-id(key('COND', concat(generate-id(..),name(),@NAME,@ODATE,@SIGN,@AND_OR) ))]"/>
  </xsl:copy>
</xsl:template>  

<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates select="@* | *"/>
  </xsl:copy>
</xsl:template>  

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>  

</xsl:stylesheet>
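
For anyone who would rather script this than run XSLT, the same parent-scoped rule can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustration under stated assumptions, not the method used in the stylesheet: the function name dedupe_conditions is made up, the sample document is a shortened version of the one posted above, and two INCOND/OUTCOND elements are treated as duplicates only when they share an element name and every attribute within the same JOB.

```python
import xml.etree.ElementTree as ET


def dedupe_conditions(root):
    """Remove repeated INCOND/OUTCOND children from each JOB, keeping the first."""
    for job in root.iter("JOB"):
        seen = set()  # reset per JOB, so uniqueness is scoped to the parent
        for child in list(job):  # iterate over a copy because we remove in place
            if child.tag not in ("INCOND", "OUTCOND"):
                continue
            key = (child.tag, tuple(sorted(child.attrib.items())))
            if key in seen:
                job.remove(child)
            else:
                seen.add(key)
    return root


# Shortened sample: two SMART_FOLDER blocks with the same conditions.
sample = """<DEFTABLE>
<SMART_FOLDER JOBISN="0">
<JOB xxx="xxx" ccc="ccc">
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
</JOB>
</SMART_FOLDER>
<SMART_FOLDER JOBISN="0">
<JOB xxx="xxx" ddd="ddd">
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
</JOB>
</SMART_FOLDER>
</DEFTABLE>"""

root = dedupe_conditions(ET.fromstring(sample))
counts = [len(job) for job in root.iter("JOB")]  # children left in each JOB
print(counts)
print(ET.tostring(root, encoding="unicode"))
```

Because the seen set is recreated for every JOB, the second JOB keeps its own first copy of each condition. For a real file, ET.parse(path) and tree.write(path) would replace the inline sample.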
