• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 559
  • Last Modified:

how to remove duplicate entries in an xml node using xslt

I m a beginner in xslt and need to write xslt to convert xml to xml.
My requirement is

Content of <dc:subject>
Remove full stops at the end of entries
Remove duplicate entries where there is a full match across field
Separate entries with semicolon and a space

Example:
<dc:subject>Leg</dc:subject>
<dc:subject>Leg</dc:subject>
<dc:subject>Wound healing.</dc:subject>
<dc:subject>Lower Extremity</dc:subject>
<dc:subject>Problem-Based Learning.</dc:subject>
<dc:subject>Skin Diseases</dc:subject>
<dc:subject>Wound Healing</dc:subject>
<dc:subject>Wounds and Injuries</dc:subject>
<dc:subject>MEDICAL</dc:subject>
" Leg;Wound healing;Lower Extremity;Problem-Based Learning;Skin Diseases;Wounds and Injuries;MEDICAL"
0
mmalik15
Asked:
mmalik15
  • 6
  • 4
1 Solution
 
Geert BormansInformation ArchitectCommented:
Not sure if you need XSLT2 or XSLT1

XSLT2 is of course more flexible
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:for-each-group select="//dc:subject" group-by="replace(., '\.$', '')">
            <xsl:if test="position() ne 1">
                <xsl:text>;</xsl:text>
            </xsl:if>
            <xsl:value-of select="replace(., '\.$', '')"/>
        </xsl:for-each-group>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

Open in new window

0
 
Geert BormansInformation ArchitectCommented:
A bit more obscure, but this is what I would use
<xsl:template match="/">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="string-join(distinct-values(//dc:subject/replace(., '\.$', '')), ';')"/>
        <xsl:text>"</xsl:text>
    </xsl:template>

Open in new window

0
 
mmalik15Author Commented:
i need to use in xslt 1.0 and will try that now to see if it works
0
Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

 
Geert BormansInformation ArchitectCommented:
And a naieve XSLT1 solution
If you have many thousands of terms, I would use muenchian grouping for getting the unique terms,
instead of walking the preceding axis all the time

please check the namespace URI, just guessing the dublin core version
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="1.0">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <xsl:text>"</xsl:text>
         <xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]" >
            <xsl:if test="not(position() = 1)">
                <xsl:text>;</xsl:text>
            </xsl:if>
             <xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
             <xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
         </xsl:for-each>
        <xsl:text>"</xsl:text>
    </xsl:template>
</xsl:stylesheet>

Open in new window

0
 
mmalik15Author Commented:
thanks Gerton for the comments
but im using <?xml version="1.0" encoding="UTF-8"?>
and I guess distinct-values works only with 2.0.
0
 
Geert BormansInformation ArchitectCommented:
well, everybody is using <?xml version="1.0"
but the stylesheet version is what is important.
My last post contains a XSLT1 solution

and yes, distinct-values is XSLT2 only
0
 
mmalik15Author Commented:
I have attached my complete xslt but something is not write with dc:subject
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:variable name="lower">abcdefghijklmnopqrstuvwxyz</xsl:variable>
	<xsl:variable name="upper">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
	<xsl:variable name="msubjectgenval"/>
	<xsl:template match="/">
		<records>
			<xsl:for-each select="/rdf:RDF/rdf:Description">
				<record>
					<body>
						<xsl:for-each select="dc:description">
							<xsl:value-of select="."/>
							<xsl:text> </xsl:text>
						</xsl:for-each>
					</body>
					<!-- END of BODY -->
					<languages>
						<xsl:choose>
							<xsl:when test="dc:language = 'eng'">English</xsl:when>
							<xsl:otherwise>
								<xsl:value-of select="dc:language"/>
							</xsl:otherwise>
						</xsl:choose>
					</languages>
					<!-- END of LANGUAGES -->
					<mauthentication>Athens log-in to access full citation and abstract</mauthentication>
					<!-- END of  mauthentication -->
					<mauthorpersons>
						<!-- Make sure the condition is an exact match! 
			     If it's not, invent more smart way to compare -->
						<xsl:for-each select="dc:creator[. != 'Wiley InterScience (Online service)']">
							<!-- I assume, it's ok to remove all the dot characters from the string
				 If it's wrong approach, need something more complex -->
							<xsl:value-of select="translate(., '.', '')"/>
							<xsl:if test="position() != last()">
								<xsl:text>; </xsl:text>
							</xsl:if>
						</xsl:for-each>
					</mauthorpersons>
					<mavailability>All e-Library Athens password holders</mavailability>
					<!-- END of mavailability -->
					<mdatepublished>
						<xsl:value-of select="dc:date"/>
					</mdatepublished>
					<!-- END of mdatepublished -->
					<mdbid>
						<xsl:value-of select="dc:identifier_dbid"/>
					</mdbid>
					<!-- END of mdbid -->
					<xsl:for-each select="dc:identifier_isbn">
						<misbn>
							<xsl:value-of select="."/>
						</misbn>
					</xsl:for-each>
					<mprovider>EBSCO NetLibrary</mprovider>
					<xsl:for-each select="dc:publicationplace">
						<mpublicationplace>
							<xsl:value-of select="."/>
						</mpublicationplace>
					</xsl:for-each>
					<xsl:for-each select="dc:publisher">
						<mpublisher>
							<xsl:value-of select="."/>
						</mpublisher>
					</xsl:for-each>
					<mrtype>Subscription e-books</mrtype>
					<!-- END of mrtype -->
					<msourcename>EBSCO NetLibrary</msourcename>
					<!-- END of msourcename -->
					<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>
					<title>
						<xsl:value-of select="dc:title"/>
					</title>
					<mtitlealt>
						<xsl:value-of select="dc:relation"/>
					</mtitlealt>
					<xsl:for-each select="dc:identifier_dbid">
						<url>
							<xsl:choose>
								<xsl:when test='contains(.,"ebscohost")'>
									<xsl:value-of select="."/>
								</xsl:when>
							</xsl:choose>
						</url>
					</xsl:for-each>
				</record>
				<!-- END of RECORD-->
			</xsl:for-each>
		</records>
		<!-- END of RECORDS-->
	</xsl:template>
	<xsl:template name="tempmsubjectgen">
		<xsl:param name="currvalue"/>
		<msubjectgen>
			<xsl:value-of select="$currvalue"/>
		</msubjectgen>
	</xsl:template>
</xsl:stylesheet>

Open in new window

0
 
Geert BormansInformation ArchitectCommented:
you are looping inside an allready existing loop
<xsl:for-each select="dc:subject">
						<xsl:text>"</xsl:text>
						<xsl:for-each select="//dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
						</xsl:for-each>
						<xsl:text>"</xsl:text>
					</xsl:for-each>

needs to become

				<xsl:text>"</xsl:text>
					<xsl:for-each select="dc:subject[not(. = preceding-sibling::dc:subject)]">
							<xsl:if test="not(position() = 1)">
								<xsl:text>;</xsl:text>
							</xsl:if>
							<xsl:value-of select="substring(., 1, string-length(.) - 1)"/>
							<xsl:value-of select="translate(substring(., string-length(.)), '.', '')"/>
					</xsl:for-each>
				<xsl:text>"</xsl:text>

at least if you have the context right
(I can't know, I don't see the source XML)

Have you seen how I deal with removing the ending "."
Beats translating all "."

Open in new window

0
 
mmalik15Author Commented:
Awesome buddy!
0
 
Geert BormansInformation ArchitectCommented:
welcome
0

Featured Post

Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

  • 6
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now