Remove duplicates with XSLT 1.0

Peter_W_S asked
Hi,

I have duplicates in my XML file that I need removed, and I'm limited to XSLT 1.0.

Consider the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData 
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>def220</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>ght342</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>


As you can see, there are duplicate enterprise:IntegrationSupplierItem elements within enterprise:SupplierItems. I need these duplicates removed. The ExternalSystemId can be treated as the key of each enterprise:IntegrationSupplierItem, so the result should be:

<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData 
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>def220</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>ght342</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>


I'm pretty sure Muenchian grouping is the way to go; I just can't figure out the exact implementation that gives me the result I want. Any input would be much appreciated.

Gertone (Geert Bormans), Information Architect
Top Expert 2006

Commented:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<!-- Index every supplier item by its ExternalSystemId -->
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="ExternalSystemId"/>
	<!-- Identity template: copy each node with its attributes, then recurse -->
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<!-- Keep only the first supplier item of each key group -->
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', ExternalSystemId)[1])]"/>
		</xsl:copy>
	</xsl:template>
</xsl:stylesheet>

Gertone (Geert Bormans), Information Architect
Top Expert 2006

Commented:
That is indeed Muenchian grouping
(without the classic for-each, reusing the identity transform template instead).
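
For comparison, the more familiar for-each form of the same grouping might look like the sketch below. It assumes the same non-namespaced sample input from the question and reuses the eis key; only the body of the enterprise:SupplierItems template differs. It is an illustration of the pattern, not something that has been run against the real data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<!-- Same key as before: index every supplier item by its ExternalSystemId -->
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="ExternalSystemId"/>
	<!-- Identity template for everything that needs no special handling -->
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<!-- Classic form: loop over the first item of each key group and copy it whole -->
			<xsl:for-each select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', ExternalSystemId)[1])]">
				<xsl:copy-of select="."/>
			</xsl:for-each>
		</xsl:copy>
	</xsl:template>
</xsl:stylesheet>
```

The apply-templates version is usually the more flexible of the two, since the kept items still pass through the template rules and can be changed further there.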

There is one catch: for this solution to work,
a specific ExternalSystemId may only be repeated inside its own IntegrationImportData.
There must be no repetition of the ID across IntegrationImportData elements;
if that could happen, we need a compound key.
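
To make the catch concrete, here is a small hypothetical input (trimmed and made up for illustration, not taken from the real file) in which the same ID appears under two different IntegrationImportData elements:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData>
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData>
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:SupplierItems>
      <!-- Same ID as under value1: key('eis', 'abc110')[1] is the item above,
           so a key on ExternalSystemId alone would drop this one as well -->
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>
```

With the key on ExternalSystemId alone, the key lookup spans the whole document, so the second abc110 would be treated as a duplicate of the first and the SupplierItems under value2 would come out empty.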

Gertone (Geert Bormans), Information Architect
Top Expert 2006

Commented:
Solution with a compound key:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="concat(generate-id(parent::*), '--', ExternalSystemId)"/>
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', concat(generate-id(parent::*), '--', ExternalSystemId))[1])]"/>
		</xsl:copy>
	</xsl:template>
	
</xsl:stylesheet>

Author

Commented:
Thank you for your input, Geert Bormans, especially on the compound key, even if it's not needed in this case. Your first response is similar to what I tried previously, and it does indeed work on my mock XML file. It turns out my problem is that the real XML has a namespace that I had overlooked. So the real file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData xmlns="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations"
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>def220</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>ght342</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>


The namespace makes the output become:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfIntegrationImportData xmlns="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems />
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems />
  </IntegrationImportData>
</ArrayOfIntegrationImportData>


which lacks all the enterprise:SupplierItems/enterprise:IntegrationSupplierItem elements.

Gertone (Geert Bormans), Information Architect
Top Expert 2006
Commented:
Aha, you can't develop XSLTs based on poor input :-)

Just declare the source's default namespace in the stylesheet and bind a prefix to it:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations"
xmlns:eci="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="concat(generate-id(parent::*), '--', eci:ExternalSystemId)"/>
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', concat(generate-id(parent::*), '--', eci:ExternalSystemId))[1])]"/>
		</xsl:copy>
	</xsl:template>
	
</xsl:stylesheet>

Author

Commented:
Thanks, Geert Bormans, it works like a charm.

And you are right: poor input => poor output, and that goes for most things in life :-)

Gertone (Geert Bormans), Information Architect
Top Expert 2006

Commented:
Welcome.

If you had not encountered binding a prefix in the XSLT to the default namespace of the source before,
that is one to remember.
Usually one would use the default namespace in the XSLT for the output tree.
That implies you need a different solution for the source's default namespace in the XSLT.
You will need this approach many, many times when you deal with XSLT and namespaces.
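
As a minimal sketch of that split, the stylesheet below uses its default namespace for the output and the eci prefix for the source. The output namespace URI and the Ids/Id element names are made up for illustration; only the eci namespace URI comes from the real file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns="http://example.com/hypothetical/report"
	xmlns:eci="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations"
	exclude-result-prefixes="eci">
	<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
	<xsl:template match="/">
		<!-- Unprefixed literal result element: it takes the stylesheet's default
		     namespace (hypothetical URI above) -->
		<Ids>
			<!-- XPath 1.0 ignores the default namespace, so an unprefixed name here
			     would only match no-namespace elements; the source needs the eci prefix -->
			<xsl:for-each select="eci:ArrayOfIntegrationImportData/eci:IntegrationImportData">
				<Id><xsl:value-of select="eci:ExternalSystemId"/></Id>
			</xsl:for-each>
		</Ids>
	</xsl:template>
</xsl:stylesheet>
```

Run against the real file, this should list one Id per IntegrationImportData. The prefix itself is arbitrary; what has to match the source is the namespace URI it is bound to.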

Author

Commented:
Thanks to Geert Bormans for coming back and following up on the information provided later, even after giving a perfect solution to the incorrect data given in the first place.