Solved

how to remove duplicate entries in an xml node using xslt

Posted on 2011-09-06
10
543 Views
Last Modified: 2012-05-12
I m a beginner in xslt and need to write xslt to convert xml to xml.
My requirement is

Content of <dc:subject>
Remove full stops at the end of entries
Remove duplicate entries where there is a full match across field
Separate entries with semicolon and a space

Example:
<dc:subject>Leg</dc:subject>
<dc:subject>Leg</dc:subject>
<dc:subject>Wound healing.</dc:subject>
<dc:subject>Lower Extremity</dc:subject>
<dc:subject>Problem-Based Learning.</dc:subject>
<dc:subject>Skin Diseases</dc:subject>
<dc:subject>Wound Healing</dc:subject>
<dc:subject>Wounds and Injuries</dc:subject>
<dc:subject>MEDICAL</dc:subject>
" Leg;Wound healing;Lower Extremity;Problem-Based Learning;Skin Diseases;Wounds and Injuries;MEDICAL"
0
Comment
Question by:mmalik15
  • 6
  • 4
10 Comments
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491503
Not sure if you need XSLT2 or XSLT1

XSLT2 is of course more flexible
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:for-each-group select="//dc:subject" group-by="replace(., '\.$', '')">
            <xsl:if test="position() ne 1">
                <xsl:text>;</xsl:text>
            </xsl:if>
            <xsl:value-of select="replace(., '\.$', '')"/>
        </xsl:for-each-group>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

Open in new window

0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491525
A bit more obscure, but this is what I would use
<xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="string-join(distinct-values(//dc:subject/replace(., '\.$', '')), ';')"/>
        <xsl:text>"</xsl:text>
    </xsl:template>

Open in new window

0
 

Author Comment

by:mmalik15
ID: 36491538
i need to use in xslt 1.0 and will try that now to see if it works
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491566
And a naieve XSLT1 solution
If you have many thousands of terms, I would use muenchian grouping for getting the unique terms,
instead of walking the preceding axis all the time

please check the namespace URI, just guessing the dublin core version
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
         <xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]" >
            <xsl:if test="not(position() = 1)">
                <xsl:text>;</xsl:text>
            </xsl:if>
             <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
             <xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
         </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

Open in new window

0
 

Author Comment

by:mmalik15
ID: 36491694
thanks Gerton for the comments
but im using <?xml version="1.0" encoding="UTF-8"?>
and I guess distinct-values works only with 2.0.
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491789
well, everybody is using <?xml version="1.0"
but the stylesheet version is what is important.
My last post contains a XSLT1 solution

and yes, distinct-values is XSLT2 only
0
 

Author Comment

by:mmalik15
ID: 36491986
I have attached my complete xslt but something is not write with dc:subject
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:variable name="lower">abcdefghijklmnopqrstuvwxyz</xsl:variable>
	<xsl:variable name="upper">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
	<xsl:variable name="msubjectgenval"/>
	<xsl:template match="/">
		<records>
			<xsl:for-each select="/rdf:RDF/rdf:Description">
				<record>
					<body>
						<xsl:for-each select="dc:description">
							<xsl:value-of select="."/>
							<xsl:text> </xsl:text>
						</xsl:for-each>
					</body>
					<!-- END of BODY -->
					<languages>
						<xsl:choose>
							<xsl:when test="dc:language = 'eng'">English</xsl:when>
							<xsl:otherwise>
								<xsl:value-of select="dc:language"/>
							</xsl:otherwise>
						</xsl:choose>
					</languages>
					<!-- END of LANGUAGES -->
					<mauthentication>Athens log-in to access full citation and abstract</mauthentication>
					<!-- END of  mauthentication -->
					<mauthorpersons>
						<!-- Make sure the condition is an exact match! 
			     If it's not, invent more smart way to compare -->
						<xsl:for-each select="dc:creator[. != 'Wiley InterScience (Online service)']">
							<!-- I assume, it's ok to remove all the dot characters from the string
				 If it's wrong approach, need something more complex -->
							<xsl:value-of select="translate(., '.', '')"/>
							<xsl:if test="position() != last()">
								<xsl:text>; </xsl:text>
							</xsl:if>
						</xsl:for-each>
					</mauthorpersons>
					<mavailability>All e-Library Athens password holders</mavailability>
					<!-- END of mavailability -->
					<mdatepublished>
						<xsl:value-of select="dc:date"/>
					</mdatepublished>
					<!-- END of mdatepublished -->
					<mdbid>
						<xsl:value-of select="dc:identifier_dbid"/>
					</mdbid>
					<!-- END of mdbid -->
					<xsl:for-each select="dc:identifier_isbn">
						<misbn>
							<xsl:value-of select="."/>
						</misbn>
					</xsl:for-each>
					<mprovider>EBSCO NetLibrary</mprovider>
					<xsl:for-each select="dc:publicationplace">
						<mpublicationplace>
							<xsl:value-of select="."/>
						</mpublicationplace>
					</xsl:for-each>
					<xsl:for-each select="dc:publisher">
						<mpublisher>
							<xsl:value-of select="."/>
						</mpublisher>
					</xsl:for-each>
					<mrtype>Subscription e-books</mrtype>
					<!-- END of mrtype -->
					<msourcename>EBSCO NetLibrary</msourcename>
					<!-- END of msourcename -->
					<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>
					<title>
						<xsl:value-of select="dc:title"/>
					</title>
					<mtitlealt>
						<xsl:value-of select="dc:relation"/>
					</mtitlealt>
					<xsl:for-each select="dc:identifier_dbid">
						<url>
							<xsl:choose>
								<xsl:when test='contains(.,"ebscohost")'>
									<xsl:value-of select="."/>
								</xsl:when>
							</xsl:choose>
						</url>
					</xsl:for-each>
				</record>
				<!-- END of RECORD-->
			</xsl:for-each>
		</records>
		<!-- END of RECORDS-->
	</xsl:template>
	<xsl:template name="tempmsubjectgen">
		<xsl:param name="currvalue"/>
		<msubjectgen>
			<xsl:value-of select="$currvalue"/>
		</msubjectgen>
	</xsl:template>
</xsl:stylesheet>

Open in new window

0
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 36492046
you are looping inside an allready existing loop
<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>

needs to become

				<xsl:text>"</xsl:text>
					<xsl:for-each select="dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
					</xsl:for-each>
				<xsl:text>"</xsl:text>

at least if you have the context right
(I can't know, I don't see the source XML)

Have you seen how I deal with removing the ending "."
Beats translating all "."

Open in new window

0
 

Author Closing Comment

by:mmalik15
ID: 36492069
Awesome buddy!
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36492074
welcome
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Angular 2 - Issue in page Layout 2 48
.php tree directory? 5 57
Wrapper for APPs 9 40
Python - desktop use 1 10
This article discusses four methods for overlaying images in a container on a web page
This article demonstrates how to create a simple responsive confirmation dialog with Ok and Cancel buttons using HTML, CSS, jQuery and Promises
In this tutorial viewers will learn how to code links for mobile sites that, once clicked, send a call or text to a specified number. For a telephone link (once clicked, calls a number), begin with a normal "<a href=" link tag. For the href, specify…
Learn how to create flexible layouts using relative units in CSS.  New relative units added in CSS3 include vw(viewports width), vh(viewports height), vmin(minimum of viewports height and width), and vmax (maximum of viewports height and width).

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now