Solved

How to remove duplicate entries in an XML node using XSLT

Posted on 2011-09-06
Last Modified: 2012-05-12
I'm a beginner in XSLT and need to write a stylesheet that converts XML to XML.
My requirements are:

Content of <dc:subject>:
Remove full stops at the end of entries
Remove duplicate entries where there is a full match across the field
Separate entries with a semicolon and a space

Example:
<dc:subject>Leg</dc:subject>
<dc:subject>Leg</dc:subject>
<dc:subject>Wound healing.</dc:subject>
<dc:subject>Lower Extremity</dc:subject>
<dc:subject>Problem-Based Learning.</dc:subject>
<dc:subject>Skin Diseases</dc:subject>
<dc:subject>Wound Healing</dc:subject>
<dc:subject>Wounds and Injuries</dc:subject>
<dc:subject>MEDICAL</dc:subject>
Expected output:
" Leg;Wound healing;Lower Extremity;Problem-Based Learning;Skin Diseases;Wounds and Injuries;MEDICAL"
Question by:mmalik15
10 Comments
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491503
Not sure whether you need XSLT 2.0 or XSLT 1.0.

XSLT 2.0 is of course more flexible:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:for-each-group select="//dc:subject" group-by="replace(., '\.$', '')">
            <xsl:if test="position() ne 1">
                <xsl:text>;</xsl:text>
            </xsl:if>
            <xsl:value-of select="replace(., '\.$', '')"/>
        </xsl:for-each-group>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491525
A bit more obscure, but this is what I would use
<xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="string-join(distinct-values(//dc:subject/replace(., '\.$', '')), ';')"/>
        <xsl:text>"</xsl:text>
    </xsl:template>
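
The requirement asks for a semicolon followed by a space between entries. A minimal XSLT 2.0 sketch of that, assuming the same guessed `dc` namespace URI as the stylesheet above, could use the `separator` attribute of `xsl:value-of` instead of `string-join`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <!-- Strip one trailing full stop from each subject, keep the
             distinct values, and join them with "; " -->
        <xsl:value-of
            select="distinct-values(//dc:subject/replace(., '\.$', ''))"
            separator="; "/>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Note that `distinct-values` compares strings case-sensitively under the default collation, so "Wound healing" and "Wound Healing" would both be kept, and the order of the returned values is implementation-dependent.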

 

Author Comment

by:mmalik15
ID: 36491538
I need to use XSLT 1.0 and will try that now to see if it works.

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491566
And a naive XSLT 1.0 solution.
If you have many thousands of terms, I would use Muenchian grouping to get the unique terms,
instead of walking the preceding-sibling axis every time.

Please check the namespace URI; I am just guessing the Dublin Core version.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
         <xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]" >
            <xsl:if test="not(position() = 1)">
                <xsl:text>;</xsl:text>
            </xsl:if>
             <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
             <xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
         </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
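
For reference, the Muenchian grouping mentioned above could look roughly like the following. This is only a sketch under assumptions: the key name `subject-by-term` is made up for illustration, the namespace URI is the same guess as above, and the key spans the whole document, so it suits a single list of subjects rather than one list per record. The key value is the subject with one trailing full stop removed, so "Leg." and "Leg" fall into the same group.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <!-- Hypothetical key: index every dc:subject by its text with a
         single trailing "." stripped -->
    <xsl:key name="subject-by-term" match="dc:subject"
        use="concat(substring(., 1, string-length(.) - 1),
                    translate(substring(., string-length(.)), '.', ''))"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <!-- Keep only the first dc:subject of each key group -->
        <xsl:for-each select="//dc:subject[generate-id() =
            generate-id(key('subject-by-term',
                concat(substring(., 1, string-length(.) - 1),
                       translate(substring(., string-length(.)), '.', '')))[1])]">
            <xsl:if test="position() != 1">
                <xsl:text>; </xsl:text>
            </xsl:if>
            <!-- Output the normalized term (same expression as the key) -->
            <xsl:value-of select="concat(substring(., 1, string-length(.) - 1),
                translate(substring(., string-length(.)), '.', ''))"/>
        </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

To group per record instead, the key value would also need something that identifies the parent element, for example `generate-id(..)` concatenated in front of the term.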

 

Author Comment

by:mmalik15
ID: 36491694
Thanks Gerton for the comments,
but I'm using <?xml version="1.0" encoding="UTF-8"?>
and I guess distinct-values works only with 2.0.
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491789
Well, everybody is using <?xml version="1.0",
but the stylesheet version is what is important.
My last post contains an XSLT 1.0 solution.

And yes, distinct-values is XSLT 2.0 only.
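
If it is unclear which XSLT version a processor actually implements, a small stylesheet along these lines can report it. It is a sketch that uses the standard `system-property` function; the exact strings printed depend on the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- XSLT version implemented by the processor running this stylesheet -->
        <xsl:text>XSLT version: </xsl:text>
        <xsl:value-of select="system-property('xsl:version')"/>
        <!-- Name of the processor vendor -->
        <xsl:text>&#10;Vendor: </xsl:text>
        <xsl:value-of select="system-property('xsl:vendor')"/>
    </xsl:template>
</xsl:stylesheet>
```

The `<?xml version="1.0"?>` declaration on the first line only states the XML version of the file itself; it is the `version` attribute on `xsl:stylesheet`, together with what the processor supports, that decides whether functions such as `distinct-values` are available.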
 

Author Comment

by:mmalik15
ID: 36491986
I have attached my complete XSLT, but something is not right with dc:subject.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:variable name="lower">abcdefghijklmnopqrstuvwxyz</xsl:variable>
	<xsl:variable name="upper">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
	<xsl:variable name="msubjectgenval"/>
	<xsl:template match="/">
		<records>
			<xsl:for-each select="/rdf:RDF/rdf:Description">
				<record>
					<body>
						<xsl:for-each select="dc:description">
							<xsl:value-of select="."/>
							<xsl:text> </xsl:text>
						</xsl:for-each>
					</body>
					<!-- END of BODY -->
					<languages>
						<xsl:choose>
							<xsl:when test="dc:language = 'eng'">English</xsl:when>
							<xsl:otherwise>
								<xsl:value-of select="dc:language"/>
							</xsl:otherwise>
						</xsl:choose>
					</languages>
					<!-- END of LANGUAGES -->
					<mauthentication>Athens log-in to access full citation and abstract</mauthentication>
					<!-- END of  mauthentication -->
					<mauthorpersons>
						<!-- Make sure the condition is an exact match! 
			     If it's not, invent more smart way to compare -->
						<xsl:for-each select="dc:creator[. != 'Wiley InterScience (Online service)']">
							<!-- I assume, it's ok to remove all the dot characters from the string
				 If it's wrong approach, need something more complex -->
							<xsl:value-of select="translate(., '.', '')"/>
							<xsl:if test="position() != last()">
								<xsl:text>; </xsl:text>
							</xsl:if>
						</xsl:for-each>
					</mauthorpersons>
					<mavailability>All e-Library Athens password holders</mavailability>
					<!-- END of mavailability -->
					<mdatepublished>
						<xsl:value-of select="dc:date"/>
					</mdatepublished>
					<!-- END of mdatepublished -->
					<mdbid>
						<xsl:value-of select="dc:identifier_dbid"/>
					</mdbid>
					<!-- END of mdbid -->
					<xsl:for-each select="dc:identifier_isbn">
						<misbn>
							<xsl:value-of select="."/>
						</misbn>
					</xsl:for-each>
					<mprovider>EBSCO NetLibrary</mprovider>
					<xsl:for-each select="dc:publicationplace">
						<mpublicationplace>
							<xsl:value-of select="."/>
						</mpublicationplace>
					</xsl:for-each>
					<xsl:for-each select="dc:publisher">
						<mpublisher>
							<xsl:value-of select="."/>
						</mpublisher>
					</xsl:for-each>
					<mrtype>Subscription e-books</mrtype>
					<!-- END of mrtype -->
					<msourcename>EBSCO NetLibrary</msourcename>
					<!-- END of msourcename -->
					<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>
					<title>
						<xsl:value-of select="dc:title"/>
					</title>
					<mtitlealt>
						<xsl:value-of select="dc:relation"/>
					</mtitlealt>
					<xsl:for-each select="dc:identifier_dbid">
						<url>
							<xsl:choose>
								<xsl:when test='contains(.,"ebscohost")'>
									<xsl:value-of select="."/>
								</xsl:when>
							</xsl:choose>
						</url>
					</xsl:for-each>
				</record>
				<!-- END of RECORD-->
			</xsl:for-each>
		</records>
		<!-- END of RECORDS-->
	</xsl:template>
	<xsl:template name="tempmsubjectgen">
		<xsl:param name="currvalue"/>
		<msubjectgen>
			<xsl:value-of select="$currvalue"/>
		</msubjectgen>
	</xsl:template>
</xsl:stylesheet>

 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 36492046
You are looping inside an already existing loop:
<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>

needs to become

				<xsl:text>"</xsl:text>
					<xsl:for-each select="dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
					</xsl:for-each>
				<xsl:text>"</xsl:text>

at least if you have the context right
(I can't know; I haven't seen the source XML).

Have you seen how I remove only the trailing "."?
That beats translating every ".".
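
Put in context, the corrected loop could sit inside the per-record loop roughly as follows. This is a trimmed sketch, not the full stylesheet: it assumes the subjects are direct children of `rdf:Description`, that the joined list belongs in an `msubjectgen` element (the name is taken from the unused named template in the posted stylesheet), and it uses "; " as the separator as the question requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="rdf dc">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <records>
            <xsl:for-each select="/rdf:RDF/rdf:Description">
                <record>
                    <!-- Assumed wrapper element for the joined subject list -->
                    <msubjectgen>
                        <!-- Relative path: only the subjects of the current record,
                             skipping any that equal an earlier sibling -->
                        <xsl:for-each select="dc:subject[not(. = preceding-sibling::dc:subject)]">
                            <xsl:if test="not(position() = 1)">
                                <xsl:text>; </xsl:text>
                            </xsl:if>
                            <!-- Everything except the last character... -->
                            <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
                            <!-- ...then the last character, unless it is a "." -->
                            <xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
                        </xsl:for-each>
                    </msubjectgen>
                </record>
            </xsl:for-each>
        </records>
    </xsl:template>
</xsl:stylesheet>
```

One caveat: the duplicate test compares the raw element text, before the trailing full stop is removed, so "Leg." and "Leg" in the same record would both be output as "Leg". The comparison is also case-sensitive.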

 

Author Closing Comment

by:mmalik15
ID: 36492069
Awesome buddy!
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36492074
welcome