How to parse string using xpath only?

Posted on 2009-02-12
Medium Priority
Last Modified: 2013-11-11
I have a file name that I need to parse to get the extension out. The XPath I had was working. Suppose I have a file name abcd.txt; I was using this:
(substring-after(normalize-space(/Data/parseResults/fileName/text()), '.'))
It worked great until now. The filenames are like

So the above XPath gives me the extension as efg.hij.txt, which is of course not correct. Is there any way to get the right extension using XPath only?
Question by:aliasim99
LVL 60

Expert Comment

by:Geert Bormans
ID: 23630979
Well, since you put it in the XQuery zone as well, I assume you can use XPath 2.0

tokenize(/Data/parseResults/fileName, '\.')[last()]
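
To make the difference from the original expression concrete, here is a minimal sketch in Python (not XPath) that models the two behaviours. The helper names `substring_after` and `last_token` are made up for this illustration; they only approximate the XPath functions for plain, non-empty strings.

```python
import re

def substring_after(s: str, sep: str) -> str:
    # Models XPath substring-after(): text after the FIRST occurrence of sep,
    # or the empty string when sep does not occur.
    _, found, tail = s.partition(sep)
    return tail if found else ""

def last_token(s: str) -> str:
    # Models tokenize($s, '\.')[last()]: split on literal dots, keep the last piece.
    return re.split(r"\.", s)[-1]

name = "abcd.efg.hij.txt"
print(substring_after(name, "."))  # efg.hij.txt
print(last_token(name))            # txt
```

Note that for a name without any dot, the tokenize approach returns the whole name, while substring-after returns an empty string.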

If you are not using XSLT2, you have some options
- in XSLT you could use recursion
- or you could hope that the extensions are predictable

So, I need to know some things.
What is the context? Is this in an XQuery, in an XSLT1, or in some DOM processing?
How predictable are the extensions, are they all three character extensions, is there a limited set of possibilities?
- If XQuery or XSLT2, use the above suggested XPath
- If extensions are all three characters, you could use this
         substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 2)
- If not predictable extensions and XSLT1, use recursion in XSLT1
- If none of the above, you could try something like this
         substring-after(substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 5), '.')
but then you need to hope that no extension is longer than 5 characters and that no combination like '.hij.txt' is short enough to fit within the last 6 characters of the name (otherwise an earlier dot is picked up)
The last one is a bit unsafe in my opinion
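
To see where the two fixed-width expressions hold up and where they break, here is a rough Python model of them (a sketch, not XPath itself). The helpers `xpath_substring` and `xpath_substring_after` are hypothetical stand-ins that assume integer positions and mimic the 1-based XPath 1.0 semantics; the sample file names other than abcd.efg.hij.txt are invented for the test.

```python
def xpath_substring(s: str, start: int) -> str:
    # XPath 1.0 substring() is 1-based; a start below 1 begins at the first character.
    return s[max(start, 1) - 1:]

def xpath_substring_after(s: str, sep: str) -> str:
    # Empty string when sep is absent, like XPath 1.0 substring-after().
    _, found, tail = s.partition(sep)
    return tail if found else ""

def last_three(name: str) -> str:
    # substring(name, string-length(name) - 2)
    return xpath_substring(name, len(name) - 2)

def last_six_after_dot(name: str) -> str:
    # substring-after(substring(name, string-length(name) - 5), '.')
    return xpath_substring_after(xpath_substring(name, len(name) - 5), ".")

for name in ["abcd.efg.hij.txt", "report.html", "a.b.txt", "archive.properties"]:
    print(name, "->", repr(last_three(name)), repr(last_six_after_dot(name)))
```

The first expression is only right for three-character extensions ("report.html" yields "tml"). The second copes with "html", but returns "b.txt" for "a.b.txt" and an empty string for "archive.properties", which is the unsafe part.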

Author Comment

ID: 23633724
Below is my XML, and yes, I thought of recursion; so far that's the best way to do it, and I agree it's unsafe. The maximum number of dots I've seen in a file name here in my client's environment is 6, and below is what I'm planning to use:


I will make it handle up to 10 dots to be on the safe side. It works perfectly.
It's not XPath 2.0. Let me know if you can think of a better way.

<?xml version="1.0" encoding="UTF-8"?>



Expert Comment

by:Geert Bormans
ID: 23637563
Does that work?
I don't think it will.

Do you have an idea on what extensions are possible?

Accepted Solution

JeffHand earned 2000 total points
ID: 23875861
Named template recursion should do the trick and not run into the same types. You create a template that takes a string parameter, use the substring-after to get the string after the first period, then call the template again (passing the new value) until you get a string that has no period. I believe I actually saw such a solution elsewhere on EE, but found the attached snippet elsewhere.

Make the initial call to the template with something like this:
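    <xsl:call-template name="getExtension">
        <xsl:with-param name="filename" select="'abcd.efg.hij.txt'"/>
    </xsl:call-template>

<xsl:template name="getExtension">
  <xsl:param name="filename"/>
  <xsl:choose>
    <!-- A dot remains: recurse on everything after the first dot -->
    <xsl:when test="contains($filename, '.')">
      <xsl:call-template name="getExtension">
        <xsl:with-param name="filename" select="substring-after($filename, '.')"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No dot left: what remains is the extension -->
    <xsl:otherwise>
      <xsl:value-of select="$filename"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>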
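
For readers who want to trace the recursion outside an XSLT processor, the same logic can be sketched in Python. This is only an illustration of the template's control flow; the function name `get_extension` is borrowed from the template name and is not part of any library.

```python
def get_extension(filename: str) -> str:
    # xsl:when branch: a dot remains, so recurse on the text after the first dot.
    if "." in filename:
        return get_extension(filename.split(".", 1)[1])
    # xsl:otherwise branch: no dot left, so this is the result.
    return filename

print(get_extension("abcd.efg.hij.txt"))  # txt
```

Each call strips one leading segment, so the depth of recursion equals the number of dots. A name without a dot comes back unchanged, and a name ending in a dot gives an empty string, so callers may want to handle those cases separately.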



Expert Comment

ID: 23875875
My first line should have read: "Named template recursion should do the trick and not run into the same types of issues." This was a reference to the fact that you don't have to choose an arbitrary number of periods to look for or worry about the size or predictability of the extensions.

Author Comment

ID: 23879377
I solved this problem a while ago, and I had to do the same thing: run a loop over the string and check whether there are any more dots. I was using BPML, so I was able to do it. There is no out-of-the-box solution for this, so what you suggested is pretty much what I did, just in a different language. Thanks for your response.
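
The BPML code itself is not shown in this thread, so the following is only a guess at the shape of such a loop, written in Python with a hypothetical function name `extension_by_loop`:

```python
def extension_by_loop(filename: str) -> str:
    remainder = filename
    # Keep discarding everything up to the first remaining dot until none are left.
    while "." in remainder:
        remainder = remainder[remainder.index(".") + 1:]
    return remainder

print(extension_by_loop("abcd.efg.hij.txt"))  # txt
```

Unlike a fixed chain of nested substring-after calls, the loop does not need to know the maximum number of dots in advance.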

Expert Comment

by:Geert Bormans
ID: 23879858
Please be fair when grading
The original question said
"How to parse string using xpath only?"
I never got a response to my last question,
and now you accept an answer that does this "outside XPath" only.
This does not reward the effort I made to help you get an XPath-only solution, as you required.

Expert Comment

ID: 23915695
I'm not entirely sure what the proper protocol is, but did want to concur with Gertone on fairness when grading. Experts Exchange relies on the reward system to encourage participation.

I admit, I didn't take the "xpath only" restriction literally. Given Gertone's track record on EE - and in particular in the XSLT zone - I'm sure that without such a restriction Gertone would have had this question answered before I saw it. I'm also sure that a day will come when I have a sticky problem and hope Gertone and other experts are around to help.

Author Comment

ID: 23919736
Yeah, I apologize for that. I should have been more careful. I had to go outside of XPath because I'm still using XPath 1.0 and could not find a solution. The best solution was from Gertone using this:
substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 2)
But I can't say that the extension will always be 3 characters long. I just used a sample file name here; the actual ones are 25 to 35 characters long with 6-7 dots. So using the loop was the safest way to go. Thanks for your help, guys.
