Remove duplicates with XSLT 1.0

Hi,

I have a problem with duplicates in my XML file that I need to have removed. I'm limited to using XSLT 1.0.

Consider the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData 
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>def220</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>ght342</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>

As you can see, there are duplicate enterprise:IntegrationSupplierItem elements within each enterprise:SupplierItems element. I need these duplicates removed. The ExternalSystemId can be treated as the key of each enterprise:IntegrationSupplierItem, so the result should be:

<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData 
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>def220</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>ght342</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>

I'm pretty sure Muenchian grouping is the way to go, but I can't figure out the exact implementation that gives me the desired result. Any input would be much appreciated.
Asked by: Peter_W_S

Geert Bormans (Information Architect) commented:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="ExternalSystemId"/>
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', ExternalSystemId)[1])]"/>
		</xsl:copy>
	</xsl:template>
	
</xsl:stylesheet>

Geert Bormans (Information Architect) commented:
That is indeed Muenchian grouping
(though without the classic for-each; it reuses the identity transform template instead).
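
For comparison, the same selection written with the classic for-each might look like the sketch below. It assumes the same input without a default namespace as the stylesheet above and reuses its key and identity template; only the enterprise:SupplierItems template differs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<!-- Same key as above: index each item by its ExternalSystemId -->
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="ExternalSystemId"/>
	<!-- Identity transform for everything that is not handled below -->
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<!-- Classic form: loop over the first item of each key group and copy it whole -->
			<xsl:for-each select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', ExternalSystemId)[1])]">
				<xsl:copy-of select="."/>
			</xsl:for-each>
		</xsl:copy>
	</xsl:template>
</xsl:stylesheet>
```

The apply-templates form leaves room for further templates to touch the surviving items later on, while the for-each form copies them as they are.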

There is one catch: for this solution to work,
a specific ExternalSystemId may only be repeated inside its own IntegrationImportData.
There must be no repetition of the ID across IntegrationImportData elements;
if that could happen, we need a compound key.
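
To make the catch concrete, here is a hypothetical variation of the sample input (not taken from the original file) in which abc110 appears under two different IntegrationImportData elements. With a key on ExternalSystemId alone, key('eis', 'abc110')[1] is always the item in the first enterprise:SupplierItems, so the item in the second one would be treated as a duplicate and dropped, leaving that element empty.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: the same ExternalSystemId under two different parents -->
<ArrayOfIntegrationImportData
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData>
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:SupplierItems>
      <!-- First abc110 in document order: kept by the simple key -->
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData>
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:SupplierItems>
      <!-- Not a duplicate within its own parent, yet dropped by the simple key -->
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>
```

A key that also identifies the parent element keeps the two groups apart.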
Geert Bormans (Information Architect) commented:
Solution with a compound key:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="concat(generate-id(parent::*), '--', ExternalSystemId)"/>
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', concat(generate-id(parent::*), '--', ExternalSystemId))[1])]"/>
		</xsl:copy>
	</xsl:template>
	
</xsl:stylesheet>

Peter_W_S (author) commented:
Thank you for your input, Geert Bormans, especially on the compound key, even if it's not needed in this case. Your first response is similar to what I tried previously, and it does indeed work on my mock XML file. It turns out my problem is that the real XML has a default namespace that I had overlooked. So the real file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfIntegrationImportData xmlns="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations"
  xmlns:fo="http://www.w3.org/1999/XSL/Format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:i="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>def220</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>abc110</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>opq550</ExternalSystemId>
        <enterprise:PartNumber>159753</enterprise:PartNumber>
        <enterprise:Supplier>25002</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
      <enterprise:IntegrationSupplierItem>
        <ExternalSystemId>ght342</ExternalSystemId>
        <enterprise:PartNumber>11303633</enterprise:PartNumber>
        <enterprise:Supplier>3800</enterprise:Supplier>
      </enterprise:IntegrationSupplierItem>
    </enterprise:SupplierItems>
  </IntegrationImportData>
</ArrayOfIntegrationImportData>

The namespace makes the output become:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfIntegrationImportData xmlns="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations">
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value1</ExternalSystemId>
    <enterprise:Description>abcdefg</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems />
  </IntegrationImportData>
  <IntegrationImportData i:type="enterprise:IntegrationItem">
    <ExternalSystemId>value2</ExternalSystemId>
    <enterprise:Description>gfdsa</enterprise:Description>
    <enterprise:IsActive>1</enterprise:IsActive>
    <enterprise:SupplierItems />
  </IntegrationImportData>
</ArrayOfIntegrationImportData>

lacking all the enterprise:SupplierItems/enterprise:IntegrationSupplierItem elements.
Geert Bormans (Information Architect) commented:
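Aha, you can't develop XSLTs based on poor input :-)

Just declare the source document's default namespace in the stylesheet and bind a prefix to it (eci below), then use that prefix for the source elements that carry no prefix: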

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:enterprise="http://schemas.datacontract.org/2004/07/Elite.Enterprise.Integrations"
xmlns:eci="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:strip-space elements="*"/>
	<!-- Compound key: id of the parent plus ExternalSystemId (source default namespace, bound to eci) -->
	<xsl:key name="eis" match="enterprise:IntegrationSupplierItem" use="concat(generate-id(parent::*), '--', eci:ExternalSystemId)"/>
	<!-- Identity transform: copy every node together with its attributes -->
	<xsl:template match="node()">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="node()"/>
		</xsl:copy>
	</xsl:template>
	<!-- Keep only the first item of each key group within this SupplierItems -->
	<xsl:template match="enterprise:SupplierItems">
		<xsl:copy>
			<xsl:copy-of select="@*"/>
			<xsl:apply-templates select="enterprise:IntegrationSupplierItem[generate-id() = generate-id(key('eis', concat(generate-id(parent::*), '--', eci:ExternalSystemId))[1])]"/>
		</xsl:copy>
	</xsl:template>

</xsl:stylesheet>

Peter_W_S (author) commented:
Thanks Geert Bormans, it works like a charm.

And you are right, poor input => poor output - and that goes for most things in life :-)
Geert Bormans (Information Architect) commented:
Welcome.

If you had not encountered binding a prefix in the XSLT to the default namespace of the source before,
that is one to remember.
Usually one would use the default namespace in the XSLT for the output tree.
That implies you need a different solution for the source's default namespace in the XSLT.
You will need this approach many, many times when you deal with XSLT and namespaces.
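
A minimal sketch of that split, assuming the real input shown earlier in this thread: the stylesheet's default namespace is used for the output elements, while the source's default namespace is reached through a prefix. The prefix src, the output namespace urn:example:report, and the ids/id element names are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: src, urn:example:report, ids and id are hypothetical names -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:src="http://schemas.datacontract.org/2004/07/Elite.Core.Integrations"
    xmlns="urn:example:report"
    exclude-result-prefixes="src">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <!-- Literal result elements pick up the stylesheet's default namespace -->
    <ids>
      <!-- In XSLT 1.0 an unprefixed name in XPath means "no namespace",
           so source elements have to be addressed through the src prefix -->
      <xsl:for-each select="src:ArrayOfIntegrationImportData/src:IntegrationImportData">
        <id><xsl:value-of select="src:ExternalSystemId"/></id>
      </xsl:for-each>
    </ids>
  </xsl:template>
</xsl:stylesheet>
```

Run against the real input above, this should list value1 and value2; with the src: prefixes removed from the XPath expressions it would select nothing, which is the same effect that emptied enterprise:SupplierItems earlier in the thread.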
Peter_W_S (author) commented:
Thanks to Geert Bormans for coming back and following up on the information I provided later, even after giving a perfect solution to the incorrect data I supplied in the first place.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.