mikeysmailbox1
asked on
XML file need to remove duplicate child nodes
Hi
I have an XML file from which I need to remove duplicate child nodes; there are thousands of duplicates in the file.
If I can't delete the duplicates, how can I sort by <OUTCOND and then NAME=, and also by <INCOND and then NAME=?
Once they are sorted I can use the Unix utility "uniq" to delete duplicate lines that follow one another.
Example:
<JOB xxx=xxx ccc=ccc>
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
needs to be
<JOB xxx=xxx ccc=ccc>
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
I'm not sure what the code would be using the Perl module XML::Twig or XML::LibXML.
Thanks
Mike
ASKER
Hi zc2,
Not sure how to use the XSLT transformation. Is there a command to do that?
Thanks
Mike
I am not a Perl programmer, but a simple search finds this module: https://metacpan.org/pod/XML::XSLT
See also this topic: https://stackoverflow.com/questions/156683/what-is-the-best-xslt-engine-for-perl
If that won't work, you could run an external command line processor - xsltproc on Linux or msxsl on Windows.
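For the external-processor route, a minimal Python sketch of calling xsltproc might look like the following. The file names dedupe.xsl, jobs.xml and jobs_clean.xml are placeholders, not names from this thread, and the helper names are made up for illustration. The sketch assumes xsltproc is installed and on PATH; if it is not, run_xslt simply returns False.

```python
import shutil
import subprocess


def build_xsltproc_command(stylesheet, source, output):
    """Return the argv list for: xsltproc -o OUTPUT STYLESHEET SOURCE."""
    return ["xsltproc", "-o", output, stylesheet, source]


def run_xslt(stylesheet, source, output):
    """Run xsltproc if it is on PATH; return True when the transform ran."""
    if shutil.which("xsltproc") is None:
        return False  # processor not installed, nothing was run
    # check=True raises CalledProcessError if xsltproc reports a failure.
    subprocess.run(build_xsltproc_command(stylesheet, source, output), check=True)
    return True


if __name__ == "__main__":
    # Placeholder file names; substitute your own stylesheet and XML files.
    print(" ".join(build_xsltproc_command("dedupe.xsl", "jobs.xml", "jobs_clean.xml")))
```

On the command line the equivalent call would be xsltproc -o jobs_clean.xml dedupe.xsl jobs.xml.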
ASKER
Hi zc2,
That almost worked. However, my actual file has the following structure:
<?xml version="1.0"?>
<DEFTABLE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Folder.xsd">
<SMART_FOLDER JOBISN="0">
<JOB xxx="xxx" ccc="ccc">
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
</SMART_FOLDER>
</DEFTABLE>
What would I add to the XSLT file to add in the DEFTABLE and SMART_FOLDER nodes?
Thanks
Mike
Please try this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:key match="OUTCOND|INCOND" name="COND" use="concat(name(),@NAME)"/>
  <xsl:template match="JOB">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*[generate-id(.)=generate-id(key('COND', concat(name(),@NAME) ))]"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@* | *"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
ASKER
Hi zc2
That worked for the first set, but in the second set it removed all of the INCOND and OUTCOND nodes.
Each <SMART_FOLDER ... > ... </SMART_FOLDER> is a different set.
Mike
I am sorry, what do you mean by the "set"? Different input files? Can you post them here?
ASKER
The second set is a duplicate of the first set of nodes.
Example:
<?xml version="1.0"?>
<DEFTABLE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Folder.xsd">
<SMART_FOLDER JOBISN="0">
<JOB xxx="xxx" ccc="ccc">
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
</SMART_FOLDER>
<SMART_FOLDER JOBISN="0">
<JOB xxx="xxx" ddd="ddd">
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<OUTCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" SIGN="-"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S51--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F2187CST_ZD4S8X--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<INCOND NAME="F3440CST_ZSSFR910--F3050CST_ZMKSADS1" ODATE="ODAT" AND_OR="A"/>
<OUTCOND NAME="F3050CST_ZMKSADS1--F3050CST_ZMKSADS2" ODATE="ODAT" SIGN="+"/>
</JOB>
</SMART_FOLDER>
</DEFTABLE>
Uniqueness was being checked in the global scope, so the conditions in the second JOB were treated as duplicates of those in the first. Now the parent element is taken into account. I also added all the other attributes to the key, just in case.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>
  <xsl:key match="OUTCOND|INCOND" name="COND" use="concat(generate-id(..),name(),@NAME,@ODATE,@SIGN,@AND_OR)"/>
  <xsl:template match="JOB">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*[generate-id(.)=generate-id(key('COND', concat(generate-id(..),name(),@NAME,@ODATE,@SIGN,@AND_OR) ))]"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@* | *"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
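If running an XSLT processor is not an option, the same per-parent idea can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustration under assumptions, not a translation of the stylesheet: the function name dedupe_jobs and the shortened sample data (NAME="A--B") are made up here. The key is the element name plus the NAME, ODATE, SIGN and AND_OR attributes, and a fresh set of seen keys is kept for each JOB so that one JOB never hides another's conditions.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up sample with the same shape as the data in this thread.
SAMPLE = """<DEFTABLE>
  <SMART_FOLDER JOBISN="0">
    <JOB xxx="xxx" ccc="ccc">
      <OUTCOND NAME="A--B" ODATE="ODAT" SIGN="-"/>
      <OUTCOND NAME="A--B" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="A--B" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="A--B" ODATE="ODAT" AND_OR="A"/>
    </JOB>
  </SMART_FOLDER>
  <SMART_FOLDER JOBISN="0">
    <JOB xxx="xxx" ddd="ddd">
      <OUTCOND NAME="A--B" ODATE="ODAT" SIGN="-"/>
      <INCOND NAME="A--B" ODATE="ODAT" AND_OR="A"/>
      <INCOND NAME="A--B" ODATE="ODAT" AND_OR="A"/>
    </JOB>
  </SMART_FOLDER>
</DEFTABLE>"""


def dedupe_jobs(root):
    """Drop repeated OUTCOND/INCOND children, keeping the first of each per JOB."""
    for job in list(root.iter("JOB")):
        seen = set()  # reset for every JOB: uniqueness is per parent
        for child in list(job):  # iterate over a copy while removing
            if child.tag not in ("OUTCOND", "INCOND"):
                continue
            key = (child.tag, child.get("NAME"), child.get("ODATE"),
                   child.get("SIGN"), child.get("AND_OR"))
            if key in seen:
                job.remove(child)
            else:
                seen.add(key)
    return root


root = dedupe_jobs(ET.fromstring(SAMPLE))
counts = [len(job) for job in root.iter("JOB")]
print(counts)  # prints [2, 2]
```

Moving the seen set outside the JOB loop would reproduce the earlier symptom, where the second JOB lost every condition that had already appeared in the first.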