Solved

How to remove duplicate entries in an XML node using XSLT

Posted on 2011-09-06
Last Modified: 2012-05-12
I'm a beginner in XSLT and need to write a stylesheet that converts XML to XML.
My requirements are:

Content of <dc:subject>
Remove full stops at the end of entries
Remove duplicate entries where there is a full match across the field
Separate entries with a semicolon and a space

Example:
<dc:subject>Leg</dc:subject>
<dc:subject>Leg</dc:subject>
<dc:subject>Wound healing.</dc:subject>
<dc:subject>Lower Extremity</dc:subject>
<dc:subject>Problem-Based Learning.</dc:subject>
<dc:subject>Skin Diseases</dc:subject>
<dc:subject>Wound Healing</dc:subject>
<dc:subject>Wounds and Injuries</dc:subject>
<dc:subject>MEDICAL</dc:subject>
Expected output:
" Leg;Wound healing;Lower Extremity;Problem-Based Learning;Skin Diseases;Wounds and Injuries;MEDICAL"
Question by:mmalik15
10 Comments
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491503
Not sure if you need XSLT2 or XSLT1

XSLT2 is of course more flexible
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:for-each-group select="//dc:subject" group-by="replace(., '\.$', '')">
            <xsl:if test="position() ne 1">
                <xsl:text>;</xsl:text>
            </xsl:if>
            <xsl:value-of select="replace(., '\.$', '')"/>
        </xsl:for-each-group>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491525
A bit more obscure, but this is what I would use
<xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="string-join(distinct-values(//dc:subject/replace(., '\.$', '')), ';')"/>
        <xsl:text>"</xsl:text>
    </xsl:template>

 

Author Comment

by:mmalik15
ID: 36491538
I need to use XSLT 1.0 and will try that now to see if it works.
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491566
And a naive XSLT1 solution.
If you have many thousands of terms, I would use Muenchian grouping to get the unique terms,
instead of walking the preceding-sibling axis every time.

Please check the namespace URI; I am just guessing the Dublin Core version.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
         <xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]" >
            <xsl:if test="not(position() = 1)">
                <xsl:text>;</xsl:text>
            </xsl:if>
             <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
             <xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
         </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
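
For reference, the Muenchian grouping mentioned above could look roughly like the sketch below. It is not from the thread: the key name subject-by-term is made up, and it assumes all dc:subject elements in the document form one group (for several records you would probably add something like generate-id(..) to the key value). Unlike the naive version, it compares terms after the trailing full stop is removed, so "Leg" and "Leg." collapse into one entry. Case differences such as "Wound healing" and "Wound Healing" still count as distinct terms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <!-- 'subject-by-term' is a made-up key name. It indexes each dc:subject
         by its text with a single trailing full stop removed. -->
    <xsl:key name="subject-by-term" match="dc:subject"
        use="concat(substring(., 1, string-length(.) - 1),
                    translate(substring(., string-length(.)), '.', ''))"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <!-- Keep only the first dc:subject of each key group -->
        <xsl:for-each select="//dc:subject[generate-id() = generate-id(key('subject-by-term',
                concat(substring(., 1, string-length(.) - 1),
                       translate(substring(., string-length(.)), '.', '')))[1])]">
            <xsl:if test="position() != 1">
                <xsl:text>; </xsl:text>
            </xsl:if>
            <!-- Output the term without its trailing full stop -->
            <xsl:value-of select="concat(substring(., 1, string-length(.) - 1),
                                         translate(substring(., string-length(.)), '.', ''))"/>
        </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The key is built once per document, so each lookup avoids rescanning the earlier siblings. That is where the gain over the preceding-sibling test comes from on long lists.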

 

Author Comment

by:mmalik15
ID: 36491694
Thanks Gerton for the comments,
but I'm using <?xml version="1.0" encoding="UTF-8"?>
and I guess distinct-values works only with 2.0.
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36491789
Well, everybody is using <?xml version="1.0"
but the stylesheet version (the version attribute on xsl:stylesheet) is what is important.
My last post contains an XSLT1 solution.

And yes, distinct-values is XSLT2 only.
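
If it is unclear which XSLT version your processor actually supports, a small probe stylesheet along these lines may help. This is only a sketch, not something posted in the thread. It relies on the standard system-property() function and can be run against any XML input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Reports the XSLT version implemented by the processor,
             which is independent of the XML declaration of the input -->
        <xsl:text>XSLT version: </xsl:text>
        <xsl:value-of select="system-property('xsl:version')"/>
        <xsl:text>, vendor: </xsl:text>
        <xsl:value-of select="system-property('xsl:vendor')"/>
    </xsl:template>
</xsl:stylesheet>
```

A processor that reports 1.0 will not offer distinct-values or xsl:for-each-group, whatever the XML declaration says.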
 

Author Comment

by:mmalik15
ID: 36491986
I have attached my complete XSLT, but something is not right with dc:subject.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:variable name="lower">abcdefghijklmnopqrstuvwxyz</xsl:variable>
	<xsl:variable name="upper">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
	<xsl:variable name="msubjectgenval"/>
	<xsl:template match="/">
		<records>
			<xsl:for-each select="/rdf:RDF/rdf:Description">
				<record>
					<body>
						<xsl:for-each select="dc:description">
							<xsl:value-of select="."/>
							<xsl:text> </xsl:text>
						</xsl:for-each>
					</body>
					<!-- END of BODY -->
					<languages>
						<xsl:choose>
							<xsl:when test="dc:language = 'eng'">English</xsl:when>
							<xsl:otherwise>
								<xsl:value-of select="dc:language"/>
							</xsl:otherwise>
						</xsl:choose>
					</languages>
					<!-- END of LANGUAGES -->
					<mauthentication>Athens log-in to access full citation and abstract</mauthentication>
					<!-- END of  mauthentication -->
					<mauthorpersons>
						<!-- Make sure the condition is an exact match! 
			     If it's not, invent more smart way to compare -->
						<xsl:for-each select="dc:creator[. != 'Wiley InterScience (Online service)']">
							<!-- I assume, it's ok to remove all the dot characters from the string
				 If it's wrong approach, need something more complex -->
							<xsl:value-of select="translate(., '.', '')"/>
							<xsl:if test="position() != last()">
								<xsl:text>; </xsl:text>
							</xsl:if>
						</xsl:for-each>
					</mauthorpersons>
					<mavailability>All e-Library Athens password holders</mavailability>
					<!-- END of mavailability -->
					<mdatepublished>
						<xsl:value-of select="dc:date"/>
					</mdatepublished>
					<!-- END of mdatepublished -->
					<mdbid>
						<xsl:value-of select="dc:identifier_dbid"/>
					</mdbid>
					<!-- END of mdbid -->
					<xsl:for-each select="dc:identifier_isbn">
						<misbn>
							<xsl:value-of select="."/>
						</misbn>
					</xsl:for-each>
					<mprovider>EBSCO NetLibrary</mprovider>
					<xsl:for-each select="dc:publicationplace">
						<mpublicationplace>
							<xsl:value-of select="."/>
						</mpublicationplace>
					</xsl:for-each>
					<xsl:for-each select="dc:publisher">
						<mpublisher>
							<xsl:value-of select="."/>
						</mpublisher>
					</xsl:for-each>
					<mrtype>Subscription e-books</mrtype>
					<!-- END of mrtype -->
					<msourcename>EBSCO NetLibrary</msourcename>
					<!-- END of msourcename -->
					<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>
					<title>
						<xsl:value-of select="dc:title"/>
					</title>
					<mtitlealt>
						<xsl:value-of select="dc:relation"/>
					</mtitlealt>
					<xsl:for-each select="dc:identifier_dbid">
						<url>
							<xsl:choose>
								<xsl:when test='contains(.,"ebscohost")'>
									<xsl:value-of select="."/>
								</xsl:when>
							</xsl:choose>
						</url>
					</xsl:for-each>
				</record>
				<!-- END of RECORD-->
			</xsl:for-each>
		</records>
		<!-- END of RECORDS-->
	</xsl:template>
	<xsl:template name="tempmsubjectgen">
		<xsl:param name="currvalue"/>
		<msubjectgen>
			<xsl:value-of select="$currvalue"/>
		</msubjectgen>
	</xsl:template>
</xsl:stylesheet>

 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 36492046
You are looping inside an already existing loop:
<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>

needs to become

					<xsl:text>"</xsl:text>
					<xsl:for-each select="dc:subject[not(. = preceding-sibling::dc:subject)]">
						<xsl:if test="not(position() = 1)">
							<xsl:text>;</xsl:text>
						</xsl:if>
						<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
						<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
					</xsl:for-each>
					<xsl:text>"</xsl:text>

at least if you have the context right
(I can't know, I don't see the source XML)

Have you seen how I deal with removing the ending "."?
It beats translating every "." in the string.
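
One further variation, offered only as a sketch under assumptions: the version below runs the duplicate check on the terms after the trailing "." is stripped, and uses "; " as the separator, as the question asks. The input structure (/rdf:RDF/rdf:Description) is taken from the asker's stylesheet. The msubjectgen wrapper element is a guess based on the unused tempmsubjectgen template there, and exclude-result-prefixes is an addition to keep the rdf and dc namespace declarations out of the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="rdf dc">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <records>
            <xsl:for-each select="/rdf:RDF/rdf:Description">
                <record>
                    <!-- msubjectgen is an assumed element name -->
                    <msubjectgen>
                        <xsl:for-each select="dc:subject">
                            <!-- Current term without one trailing full stop -->
                            <xsl:variable name="term"
                                select="concat(substring(., 1, string-length(.) - 1),
                                               translate(substring(., string-length(.)), '.', ''))"/>
                            <!-- Skip the term if an earlier sibling has the same stripped text -->
                            <xsl:if test="not(preceding-sibling::dc:subject[
                                    concat(substring(., 1, string-length(.) - 1),
                                           translate(substring(., string-length(.)), '.', '')) = $term])">
                                <!-- The first dc:subject is never a duplicate, so every
                                     later unique term gets the separator in front of it -->
                                <xsl:if test="position() != 1">
                                    <xsl:text>; </xsl:text>
                                </xsl:if>
                                <xsl:value-of select="$term"/>
                            </xsl:if>
                        </xsl:for-each>
                    </msubjectgen>
                </record>
            </xsl:for-each>
        </records>
    </xsl:template>
</xsl:stylesheet>
```

Because the inner loop starts from the current rdf:Description, each record only sees its own subjects. The comparison is still case-sensitive.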

 

Author Closing Comment

by:mmalik15
ID: 36492069
Awesome buddy!
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 36492074
welcome