Solved

How to remove duplicate entries in an XML node using XSLT

Posted on 2011-09-06
Last Modified: 2012-05-12
I'm a beginner in XSLT and need to write a stylesheet that converts XML to XML.
My requirements are:

For the content of <dc:subject>:
Remove full stops at the end of entries
Remove duplicate entries where there is a full match across the field
Separate entries with a semicolon and a space

Example:
<dc:subject>Leg</dc:subject>
<dc:subject>Leg</dc:subject>
<dc:subject>Wound healing.</dc:subject>
<dc:subject>Lower Extremity</dc:subject>
<dc:subject>Problem-Based Learning.</dc:subject>
<dc:subject>Skin Diseases</dc:subject>
<dc:subject>Wound Healing</dc:subject>
<dc:subject>Wounds and Injuries</dc:subject>
<dc:subject>MEDICAL</dc:subject>
Expected output:
" Leg;Wound healing;Lower Extremity;Problem-Based Learning;Skin Diseases;Wounds and Injuries;MEDICAL"
Question by:mmalik15
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491503
Not sure whether you need XSLT 2.0 or XSLT 1.0.

XSLT 2.0 is of course more flexible:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:for-each-group select="//dc:subject" group-by="replace(., '\.$', '')">
            <xsl:if test="position() ne 1">
                <xsl:text>;</xsl:text>
            </xsl:if>
            <xsl:value-of select="replace(., '\.$', '')"/>
        </xsl:for-each-group>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491525
A bit more obscure, but this is what I would use (still XSLT 2.0):
<xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="string-join(distinct-values(//dc:subject/replace(., '\.$', '')), ';')"/>
        <xsl:text>"</xsl:text>
    </xsl:template>
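
The expected output in the question lists "Wound healing" only once, although the input contains both "Wound healing." and "Wound Healing". Neither XSLT 2.0 answer above merges those two, because both compare case-sensitively. A minimal sketch of a case-insensitive variant follows, assuming that merging on case is really what is wanted and that the separator should be "; " as the requirements state (the sample output shows ";" without the space). The use of lower-case() and normalize-space() is an addition here, not part of the original answers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <!-- Group on the lower-cased text with one trailing full stop removed,
             so "Wound healing." and "Wound Healing" fall into the same group. -->
        <xsl:for-each-group select="//dc:subject"
            group-by="lower-case(replace(normalize-space(.), '\.$', ''))">
            <xsl:if test="position() ne 1">
                <xsl:text>; </xsl:text>
            </xsl:if>
            <!-- The context item is the first member of the group,
                 so its original casing is the one that gets written out. -->
            <xsl:value-of select="replace(normalize-space(.), '\.$', '')"/>
        </xsl:for-each-group>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Groups are emitted in order of first appearance, so the first spelling seen for each term is the one kept.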

 

Author Comment

by:mmalik15
ID: 36491538
I need to use XSLT 1.0 and will try that now to see if it works.

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491566
And a naive XSLT 1.0 solution.
If you have many thousands of terms, I would use Muenchian grouping to get the unique terms,
instead of walking the preceding-sibling axis every time.

Please check the namespace URI; I am just guessing the Dublin Core version.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
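    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <!-- Keep a subject only if no earlier sibling has exactly the same text.
             The comparison is on the raw value, so "Leg." and "Leg" count as different. -->
        <xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
            <xsl:if test="not(position() = 1)">
                <xsl:text>;</xsl:text>
            </xsl:if>
            <!-- Everything except the last character... -->
            <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
            <!-- ...then the last character, dropped only if it is a full stop. -->
            <xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
        </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>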
</xsl:stylesheet>
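
The Muenchian grouping mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's own code: the key name subject-by-term is made up, the key is built on the text with one trailing full stop removed (so "Leg." and "Leg" are treated as the same term), and the "; " separator follows the written requirement.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <!-- Index every dc:subject by its text with one trailing "." removed.
         "subject-by-term" is a hypothetical name chosen for this sketch. -->
    <xsl:key name="subject-by-term" match="dc:subject"
        use="concat(substring(., 1, string-length(.) - 1),
                    translate(substring(., string-length(.)), '.', ''))"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <!-- A dc:subject is kept only if it is the first node in its key group. -->
        <xsl:for-each select="//dc:subject[generate-id() = generate-id(key('subject-by-term',
            concat(substring(., 1, string-length(.) - 1),
                   translate(substring(., string-length(.)), '.', '')))[1])]">
            <xsl:if test="position() != 1">
                <xsl:text>; </xsl:text>
            </xsl:if>
            <xsl:value-of select="concat(substring(., 1, string-length(.) - 1),
                translate(substring(., string-length(.)), '.', ''))"/>
        </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The key is document-wide, so this version removes duplicates across the whole input. If the terms must be unique per parent element instead, the usual approach is to include generate-id(..) in the key value as well.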

 

Author Comment

by:mmalik15
ID: 36491694
Thanks, Gerton, for the comments,
but I'm using <?xml version="1.0" encoding="UTF-8"?>
and I guess distinct-values works only with 2.0.
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491789
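Well, everybody is using <?xml version="1.0"
but that only declares the XML version. The version attribute on xsl:stylesheet is what matters.
My last post contains an XSLT 1.0 solution.

And yes, distinct-values is an XPath 2.0 function, so it needs an XSLT 2.0 processor.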
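
To make the distinction concrete, here is a small sketch that can be run against any XML input to see which XSLT version the processor itself supports. It relies only on the standard system-property() function; nothing in it is specific to the stylesheets in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The line above is the XML declaration: it describes the file syntax only. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <!-- The version attribute above is what declares the XSLT language level. -->
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:text>XSLT version supported by this processor: </xsl:text>
        <xsl:value-of select="system-property('xsl:version')"/>
        <xsl:text>&#10;Vendor: </xsl:text>
        <xsl:value-of select="system-property('xsl:vendor')"/>
    </xsl:template>
</xsl:stylesheet>
```

If the reported version is 1.0, functions such as distinct-values(), replace() and string-join() are not available, whatever the stylesheet declares.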
 

Author Comment

by:mmalik15
ID: 36491986
I have attached my complete XSLT, but something is not right with dc:subject:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:variable name="lower">abcdefghijklmnopqrstuvwxyz</xsl:variable>
	<xsl:variable name="upper">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
	<xsl:variable name="msubjectgenval"/>
	<xsl:template match="/">
		<records>
			<xsl:for-each select="/rdf:RDF/rdf:Description">
				<record>
					<body>
						<xsl:for-each select="dc:description">
							<xsl:value-of select="."/>
							<xsl:text> </xsl:text>
						</xsl:for-each>
					</body>
					<!-- END of BODY -->
					<languages>
						<xsl:choose>
							<xsl:when test="dc:language = 'eng'">English</xsl:when>
							<xsl:otherwise>
								<xsl:value-of select="dc:language"/>
							</xsl:otherwise>
						</xsl:choose>
					</languages>
					<!-- END of LANGUAGES -->
					<mauthentication>Athens log-in to access full citation and abstract</mauthentication>
					<!-- END of  mauthentication -->
					<mauthorpersons>
						<!-- Make sure the condition is an exact match! 
			     If it's not, invent more smart way to compare -->
						<xsl:for-each select="dc:creator[. != 'Wiley InterScience (Online service)']">
							<!-- I assume, it's ok to remove all the dot characters from the string
				 If it's wrong approach, need something more complex -->
							<xsl:value-of select="translate(., '.', '')"/>
							<xsl:if test="position() != last()">
								<xsl:text>; </xsl:text>
							</xsl:if>
						</xsl:for-each>
					</mauthorpersons>
					<mavailability>All e-Library Athens password holders</mavailability>
					<!-- END of mavailability -->
					<mdatepublished>
						<xsl:value-of select="dc:date"/>
					</mdatepublished>
					<!-- END of mdatepublished -->
					<mdbid>
						<xsl:value-of select="dc:identifier_dbid"/>
					</mdbid>
					<!-- END of mdbid -->
					<xsl:for-each select="dc:identifier_isbn">
						<misbn>
							<xsl:value-of select="."/>
						</misbn>
					</xsl:for-each>
					<mprovider>EBSCO NetLibrary</mprovider>
					<xsl:for-each select="dc:publicationplace">
						<mpublicationplace>
							<xsl:value-of select="."/>
						</mpublicationplace>
					</xsl:for-each>
					<xsl:for-each select="dc:publisher">
						<mpublisher>
							<xsl:value-of select="."/>
						</mpublisher>
					</xsl:for-each>
					<mrtype>Subscription e-books</mrtype>
					<!-- END of mrtype -->
					<msourcename>EBSCO NetLibrary</msourcename>
					<!-- END of msourcename -->
					<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>
					<title>
						<xsl:value-of select="dc:title"/>
					</title>
					<mtitlealt>
						<xsl:value-of select="dc:relation"/>
					</mtitlealt>
					<xsl:for-each select="dc:identifier_dbid">
						<url>
							<xsl:choose>
								<xsl:when test='contains(.,"ebscohost")'>
									<xsl:value-of select="."/>
								</xsl:when>
							</xsl:choose>
						</url>
					</xsl:for-each>
				</record>
				<!-- END of RECORD-->
			</xsl:for-each>
		</records>
		<!-- END of RECORDS-->
	</xsl:template>
	<xsl:template name="tempmsubjectgen">
		<xsl:param name="currvalue"/>
		<msubjectgen>
			<xsl:value-of select="$currvalue"/>
		</msubjectgen>
	</xsl:template>
</xsl:stylesheet>

 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 36492046
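You are looping inside an already existing loop. The outer for-each repeats the whole list once per dc:subject, and the inner path starts with //, so it selects every dc:subject in the document instead of only those in the current record.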
<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>

needs to become

					<xsl:text>"</xsl:text>
					<xsl:for-each select="dc:subject[not(. = preceding-sibling::dc:subject)]">
						<xsl:if test="not(position() = 1)">
							<xsl:text>;</xsl:text>
						</xsl:if>
						<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
						<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
					</xsl:for-each>
					<xsl:text>"</xsl:text>

at least if you have the context right
(I can't know, I don't see the source XML)

Have you seen how I deal with removing the ending "."?
It beats translating all "." characters, because full stops inside the value are kept.
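
The difference between the two approaches can be seen with a small self-contained sketch. The $sample value is invented for illustration and does not come from the asker's data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="text"/>
    <!-- Hypothetical value containing full stops both inside and at the end. -->
    <xsl:variable name="sample" select="'Smith, J. A.'"/>
    <xsl:template match="/">
        <!-- translate() removes every full stop in the string. -->
        <xsl:text>translate: </xsl:text>
        <xsl:value-of select="translate($sample, '.', '')"/>
        <!-- The substring approach only touches the last character. -->
        <xsl:text>&#10;trailing only: </xsl:text>
        <xsl:value-of select="substring($sample, 1, string-length($sample) - 1)"/>
        <xsl:value-of select="translate(substring($sample, string-length($sample)), '.', '')"/>
    </xsl:template>
</xsl:stylesheet>
```

With this sample, the translate() line gives "Smith, J A" while the trailing-only line gives "Smith, J. A", which is why the second form is the safer choice for names and abbreviations.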

 

Author Closing Comment

by:mmalik15
ID: 36492069
Awesome buddy!
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36492074
welcome