
Muenchian method - Unique Values

Posted on 2006-05-30
I am aware that there is a technique called the Muenchian method for selecting unique values from an unsorted source document, e.g. selecting the unique InvoiceID values from:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="G:\My Documents\Xml Projects\People\UnSortedInvoiceLines.xslt"?>
<InvoiceLineItems>
  <LineItem InvoiceID="0000002">
    <ItemID>DDE111</ItemID>
    <Quantity>2</Quantity>
  </LineItem>
  <LineItem InvoiceID="0000001">
    <ItemID>BCC002</ItemID>
    <Quantity>12</Quantity>
  </LineItem>
  <LineItem InvoiceID="0000003">
    <ItemID>BBB002</ItemID>
    <Quantity>5</Quantity>
  </LineItem>
  <LineItem InvoiceID="0000002">
    <ItemID>CCD344</ItemID>
    <Quantity>1</Quantity>
  </LineItem>
  <LineItem InvoiceID="0000001">
    <ItemID>AAA001</ItemID>
    <Quantity>23</Quantity>
  </LineItem>
  <LineItem InvoiceID="0000002">
    <ItemID>AAA003</ItemID>
    <Quantity>4</Quantity>
  </LineItem>
</InvoiceLineItems>

Can someone show me how to do this and explain how it actually works?

Thanks

Question by:daveamour


Expert Comment

Hi daveamour,

The Muenchian method is mainly a grouping technique.
Here is a good explanation:
http://www.jenitennison.com/xslt/grouping/muenchian.xml

Muench uses keys and generate-id() to get the unique values,
but there is a more lightweight approach as well.

I will show you one that is easy to understand with your example,
and show the Muenchian method in the next post.

This example checks, for every LineItem in the for-each, whether an earlier sibling LineItem has the same @InvoiceID:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="InvoiceLineItems">
    <!-- keep a LineItem only if no earlier sibling LineItem has the same @InvoiceID -->
    <xsl:for-each select="LineItem[not(@InvoiceID = preceding-sibling::LineItem/@InvoiceID)]">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
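To make the predicate concrete, here is a rough Python model (not XSLT) of what the for-each selects, using the standard library's `xml.etree.ElementTree`. The helper name `first_per_id_by_siblings` is hypothetical, and the sample is a trimmed copy of the question's document with the Quantity elements dropped.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample (Quantity elements dropped).
SAMPLE = """<InvoiceLineItems>
  <LineItem InvoiceID="0000002"><ItemID>DDE111</ItemID></LineItem>
  <LineItem InvoiceID="0000001"><ItemID>BCC002</ItemID></LineItem>
  <LineItem InvoiceID="0000003"><ItemID>BBB002</ItemID></LineItem>
  <LineItem InvoiceID="0000002"><ItemID>CCD344</ItemID></LineItem>
  <LineItem InvoiceID="0000001"><ItemID>AAA001</ItemID></LineItem>
  <LineItem InvoiceID="0000002"><ItemID>AAA003</ItemID></LineItem>
</InvoiceLineItems>"""


def first_per_id_by_siblings(root):
    """Hypothetical helper modelling
    LineItem[not(@InvoiceID = preceding-sibling::LineItem/@InvoiceID)]."""
    items = root.findall("LineItem")  # child LineItem elements, in document order
    selected = []
    for pos, item in enumerate(items):
        # preceding-sibling::LineItem/@InvoiceID -> IDs of every earlier LineItem
        earlier_ids = [prev.get("InvoiceID") for prev in items[:pos]]
        # not(... = ...) -> keep the item only if no earlier ID equals its own
        if not any(item.get("InvoiceID") == other for other in earlier_ids):
            selected.append(item)
    return selected


for item in first_per_id_by_siblings(ET.fromstring(SAMPLE)):
    print(item.get("InvoiceID"), item.findtext("ItemID"))
# 0000002 DDE111
# 0000001 BCC002
# 0000003 BBB002
```

Each LineItem rescans all of its earlier siblings, so the work grows roughly with the square of the number of LineItems; that is the usual argument for the key-based version on large inputs.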

Cheers!

Expert Comment

daveamour,

And here is the key version, as used by Muench:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- index every LineItem by its @InvoiceID -->
  <xsl:key name="LineItem-by-ID" match="LineItem" use="@InvoiceID" />
  <xsl:template match="InvoiceLineItems">
    <!-- keep a LineItem only if it is the first node the key returns for its @InvoiceID -->
    <xsl:for-each select="LineItem[generate-id() = generate-id(key('LineItem-by-ID', @InvoiceID)[1])]">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>


Accepted Solution

daveamour,

First you create a key:
<xsl:key name="LineItem-by-ID" match="LineItem" use="@InvoiceID" />

This key lets you select LineItem elements (through the "match" attribute) by their @InvoiceID value (through the "use" attribute).
It is a bit like an index in a database.
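The "index" comparison can be sketched in Python as a dictionary from each @InvoiceID value to the list of matching LineItems in document order. This is only a model of what `xsl:key` provides, not how any particular XSLT processor implements it; the helper name `build_lineitem_by_id` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the question's sample (Quantity elements dropped).
SAMPLE = """<InvoiceLineItems>
  <LineItem InvoiceID="0000002"><ItemID>DDE111</ItemID></LineItem>
  <LineItem InvoiceID="0000001"><ItemID>BCC002</ItemID></LineItem>
  <LineItem InvoiceID="0000003"><ItemID>BBB002</ItemID></LineItem>
  <LineItem InvoiceID="0000002"><ItemID>CCD344</ItemID></LineItem>
  <LineItem InvoiceID="0000001"><ItemID>AAA001</ItemID></LineItem>
  <LineItem InvoiceID="0000002"><ItemID>AAA003</ItemID></LineItem>
</InvoiceLineItems>"""


def build_lineitem_by_id(root):
    """Hypothetical helper modelling
    <xsl:key name="LineItem-by-ID" match="LineItem" use="@InvoiceID" />:
    each @InvoiceID value maps to the matching LineItems in document order."""
    index = defaultdict(list)
    for item in root.iter("LineItem"):  # match="LineItem": any depth, document order
        value = item.get("InvoiceID")   # use="@InvoiceID"
        if value is not None:           # a LineItem without @InvoiceID is not indexed
            index[value].append(item)
    return dict(index)


key = build_lineitem_by_id(ET.fromstring(SAMPLE))
# key('LineItem-by-ID', '0000002') -> every LineItem with that ID
print([item.findtext("ItemID") for item in key["0000002"]])
# ['DDE111', 'CCD344', 'AAA003']
```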

Then you generate the id of the current LineItem node with generate-id(). The generated id of a node is unpredictable, but within one transformation it is the same every time you calculate it for that node.
You also generate the id of the first LineItem node that has the same @InvoiceID (found through the key).
Now you can compare the two generated ids: if they are the same, it is the same node, so the current LineItem is the first one with this particular @InvoiceID.

That is what happens here:
select="LineItem[generate-id() = generate-id(key('LineItem-by-ID', @InvoiceID)[1])]"
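A minimal Python model of that select expression, assuming a dictionary stands in for the key and Python's built-in `id()` stands in for `generate-id()` (both are stable per object within one run and differ between live objects). The `generate_id` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the question's sample (Quantity elements dropped).
SAMPLE = """<InvoiceLineItems>
  <LineItem InvoiceID="0000002"><ItemID>DDE111</ItemID></LineItem>
  <LineItem InvoiceID="0000001"><ItemID>BCC002</ItemID></LineItem>
  <LineItem InvoiceID="0000003"><ItemID>BBB002</ItemID></LineItem>
  <LineItem InvoiceID="0000002"><ItemID>CCD344</ItemID></LineItem>
  <LineItem InvoiceID="0000001"><ItemID>AAA001</ItemID></LineItem>
  <LineItem InvoiceID="0000002"><ItemID>AAA003</ItemID></LineItem>
</InvoiceLineItems>"""

root = ET.fromstring(SAMPLE)

# Model of the xsl:key: InvoiceID -> LineItems in document order.
key = defaultdict(list)
for item in root.iter("LineItem"):
    key[item.get("InvoiceID")].append(item)


def generate_id(node):
    # Hypothetical stand-in for generate-id(): id() is constant for an object
    # while it is alive, like a generated id within one transformation.
    return id(node)


# LineItem[generate-id() = generate-id(key('LineItem-by-ID', @InvoiceID)[1])]
# XPath's [1] is the first node, which is index 0 in Python.
unique = [
    item
    for item in root.findall("LineItem")
    if generate_id(item) == generate_id(key[item.get("InvoiceID")][0])
]
print([item.get("InvoiceID") for item in unique])
# ['0000002', '0000001', '0000003']
```

In plain Python, `item is key[...][0]` would be the idiomatic identity test; `id()` is used here only to echo the generate-id() comparison.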

hope this helps

cheers

Author Comment

OK, thanks Gertone. I have a few questions, if that's OK? With your first example: I had tried something similar, but with sorting and checking against preceding-sibling, and that didn't work.

Anyway, I got the impression that preceding-sibling::LineItem/@InvoiceID in this case returns all preceding siblings. If so, the = operator is not behaving as one would normally expect: rather than a one-to-one equality test, it seems to be working as a one-to-many equality check. Does this make sense, and is this correct?

Thanks

Dave

Author Comment

Another thing:

You use:

<xsl:for-each select="LineItem[not(@InvoiceID = preceding-sibling::LineItem/@InvoiceID)]">

Earlier I tried:

<xsl:for-each select="LineItem[@InvoiceID != preceding-sibling::LineItem/@InvoiceID]">

which doesn't seem to mean the same thing.  How does this work?

Cheers

Dave

Expert Comment

> preceding-sibling::LineItem/@InvoiceID in this case returns all preceding siblings
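Correct: it returns a node-set containing the @InvoiceID attributes of all the preceding sibling elements named LineItem.

> it seems to be working as a 1 to many equality check
Yes, it is a one-to-many check. When one side of = is a node-set, the comparison is true if at least one node in that set has a string value equal to the other side (when both sides are node-sets, it is true if some pair of nodes has equal string values).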
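A small Python sketch of that existential rule, assuming the node-set is reduced to a list of string values. Since @InvoiceID is a one-node set, comparing it with preceding-sibling::LineItem/@InvoiceID boils down to comparing one string against each value in the list. The helper `xpath_eq` is hypothetical, and "0000004" is a made-up ID used only to show the false case.

```python
def xpath_eq(value, node_set):
    """Hypothetical helper modelling XPath 1.0 `value = node-set`:
    true if at least one node's string value equals `value`."""
    return any(value == node for node in node_set)


# String values of preceding-sibling::LineItem/@InvoiceID for the 4th LineItem
# (the one with InvoiceID 0000002) in the sample document.
preceding_ids = ["0000002", "0000001", "0000003"]

print(xpath_eq("0000002", preceding_ids))  # True: one match is enough
print(xpath_eq("0000004", preceding_ids))  # False: no node matches
print(xpath_eq("0000002", []))             # False: an empty node-set never compares equal
```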

> select="LineItem[not(@InvoiceID = preceding-sibling::LineItem/@InvoiceID)]">
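selects the LineItem elements whose @InvoiceID is not equal to any of the preceding siblings' @InvoiceID attributes

> select="LineItem[@InvoiceID != preceding-sibling::LineItem/@InvoiceID]">
selects the LineItem elements whose @InvoiceID is different from at least one of the preceding siblings' @InvoiceID attributes. The first LineItem has no preceding siblings, so the comparison is false and it is never selected; every later LineItem is selected as soon as one earlier sibling has a different ID, even if another earlier sibling has the same ID.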
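The difference between the two predicates shows up clearly if both are modelled in Python over the InvoiceID sequence from the question. This is a sketch with hypothetical helpers `xpath_eq` and `xpath_ne`, each treating a node-set as a list of string values; positions in the output are 0-based Python indices, not XPath positions.

```python
def xpath_eq(value, node_set):
    # XPath 1.0 `value = node-set`: true if SOME node equals value
    return any(value == node for node in node_set)


def xpath_ne(value, node_set):
    # XPath 1.0 `value != node-set`: true if SOME node differs from value
    return any(value != node for node in node_set)


# @InvoiceID of each LineItem in the question's sample, in document order.
ids = ["0000002", "0000001", "0000003", "0000002", "0000001", "0000002"]

# LineItem[not(@InvoiceID = preceding-sibling::LineItem/@InvoiceID)]
with_not_eq = [i for i, v in enumerate(ids) if not xpath_eq(v, ids[:i])]
# LineItem[@InvoiceID != preceding-sibling::LineItem/@InvoiceID]
with_ne = [i for i, v in enumerate(ids) if xpath_ne(v, ids[:i])]

print(with_not_eq)  # [0, 1, 2]: the first LineItem for each InvoiceID
print(with_ne)      # [1, 2, 3, 4, 5]: everything except the first LineItem
```

So `not(a = b)` means "equal to none of them", while `a != b` means "different from at least one of them"; with node-sets these are not the same test.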

Author Comment

Ok thanks, this is a lot to take in!

Thanks very much - I shall go away and study this.

Cheers

Dave


Expert Comment

welcome