How to parse string using xpath only?

Posted on 2009-02-12
Last Modified: 2013-11-11
I have a file name that I need to parse to get the extension out. The XPath I had was working: for a file name like abcd.txt I was using this
(substring-after(normalize-space(/Data/parseResults/fileName/text()), '.'))
It worked great until now, but the file names now look like abcd.efg.hij.txt

So the above XPath gives me the extension as efg.hij.txt, which is of course not correct. Is there any way to get the right extension using XPath only?
Question by:aliasim99
    LVL 60

    Expert Comment

    by:Geert Bormans
    Well, since you put it in the XQuery zone as well, I assume you can use XPath 2.0

    tokenize(/Data/parseResults/fileName, '\.')[last()]
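A minimal sketch of how that expression might sit in an XSLT 2.0 stylesheet, assuming the input has a single /Data/parseResults/fileName element as in the question. The `name` variable and the `contains` guard are additions for illustration: without the guard, a name with no dot would come back whole as its own "extension".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumes exactly one fileName element; more than one raises a type error -->
    <xsl:variable name="name"
                  select="normalize-space(/Data/parseResults/fileName)"/>
    <!-- Split on literal dots and keep the last piece; empty if there is no dot -->
    <xsl:value-of select="if (contains($name, '.'))
                          then tokenize($name, '\.')[last()]
                          else ''"/>
  </xsl:template>
</xsl:stylesheet>
```

This needs an XSLT 2.0 processor; an XSLT 1.0 processor will not recognize tokenize() or the if/then/else expression.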

    If you are not using XSLT2, you have some options
    - in XSLT you could use recursion
    - or you could hope that the extensions are predictable

    So, I need to know some things.
    What is the context? Is this in an XQuery, in an XSLT1, or in some DOM processing?
    How predictable are the extensions, are they all three character extensions, is there a limited set of possibilities?
    - If XQuery or XSLT2, use the above suggested XPath
    - If extensions are all three characters, you could use this
             substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 2)
    - If not predictable extensions and XSLT1, use recursion in XSLT1
    - If none of the above, you could try something like this
             substring-after(substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 5), '.')
    but then you need to hope that the last 6 characters of the name contain exactly one dot: no extension longer than 5 characters, and no tail such as '.hij.txt' that is 6 characters or shorter
    The last one is a bit unsafe in my mind
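To make the two fixed-window options concrete, here is a small XSLT 1.0 sketch, assuming the /Data/parseResults/fileName input from the question. The `name` and `len` variables are only there to avoid repeating the path and are not part of the original expressions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="name"
                  select="normalize-space(/Data/parseResults/fileName)"/>
    <xsl:variable name="len" select="string-length($name)"/>

    <!-- Option 1: the last three characters; only right for 3-character extensions -->
    <xsl:value-of select="substring($name, $len - 2)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Option 2: whatever follows the first dot found in the last six characters -->
    <xsl:value-of select="substring-after(substring($name, $len - 5), '.')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For abcd.efg.hij.txt both lines come out as txt. For a short name such as a.b.c the six-character window covers the whole string, so option 2 returns b.c, which is the unsafe case described above.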

    Author Comment

    Below is my XML, and yes, I thought of recursion. So far that's the best way to do it, and I agree it's unsafe. The maximum number of dots I've seen in a file name in my client's environment is 6, and below is what I'm planning to use


    I will extend it to handle up to 10 dots to be on the safe side. It works perfectly.
    It's not XPath 2.0. Let me know if you can think of a better way.

    <?xml version="1.0" encoding="UTF-8"?>


    LVL 60

    Expert Comment

    by:Geert Bormans
    Does that work?
    I don't think it will.

    Do you have an idea on what extensions are possible?
    LVL 1

    Accepted Solution

    Named template recursion should do the trick and not run into the same types. You create a template that takes a string parameter, use the substring-after to get the string after the first period, then call the template again (passing the new value) until you get a string that has no period. I believe I actually saw such a solution elsewhere on EE, but found the attached snippet elsewhere.

    Make the initial call to the template with something like this:
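        <xsl:call-template name="getExtension">
            <xsl:with-param name="filename" select="'abcd.efg.hij.txt'"/>
        </xsl:call-template>

    <xsl:template name="getExtension">
        <xsl:param name="filename"/>
        <xsl:choose>
            <!-- A dot remains: recurse on the text after the first dot -->
            <xsl:when test="contains($filename, '.')">
                <xsl:call-template name="getExtension">
                    <xsl:with-param name="filename" select="substring-after($filename, '.')"/>
                </xsl:call-template>
            </xsl:when>
            <!-- No dot left: what remains is the extension -->
            <xsl:otherwise><xsl:value-of select="$filename"/></xsl:otherwise>
        </xsl:choose>
    </xsl:template>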
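For anyone who wants to run the recursion end to end, here is a sketch of a complete XSLT 1.0 stylesheet built around the same named template, assuming the /Data/parseResults/fileName input from the question. The `name` variable and the `xsl:if` guard are additions for illustration: on its own the template returns a dot-free name unchanged, which may not be what you want.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="name"
                  select="normalize-space(/Data/parseResults/fileName)"/>
    <!-- Guard (an addition): a name without any dot produces no output -->
    <xsl:if test="contains($name, '.')">
      <xsl:call-template name="getExtension">
        <xsl:with-param name="filename" select="$name"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Strips everything up to the first dot until no dot is left -->
  <xsl:template name="getExtension">
    <xsl:param name="filename"/>
    <xsl:choose>
      <xsl:when test="contains($filename, '.')">
        <xsl:call-template name="getExtension">
          <xsl:with-param name="filename"
                          select="substring-after($filename, '.')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$filename"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With abcd.efg.hij.txt as the fileName value, the output is txt, regardless of how many dots the name has or how long the extension is.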


    LVL 1

    Expert Comment

    My first line should have read: "Named template recursion should do the trick and not run into the same types of issues." This was a reference to the fact that you don't have to choose an arbitrary number of periods to look for or worry about the size or predictability of the extensions.

    Author Comment

    I solved this problem a while ago, and I had to do the same thing: run a loop over the string and check whether there are any more dots. I was using BPML, so I was able to do it. There is no out-of-the-box solution for this, so what you suggested is pretty much what I did, just in a different language. Thanks for your response.
    LVL 60

    Expert Comment

    by:Geert Bormans
    Please be fair when grading.
    The original question said
    "How to parse string using xpath only?"
    I never got a response to my last question,
    and now you accept an answer that does this "outside XPath" only.
    This does not reward the effort I made to help you get an XPath-only solution, as you required.
    LVL 1

    Expert Comment

    I'm not entirely sure what the proper protocol is, but did want to concur with Gertone on fairness when grading. Experts Exchange relies on the reward system to encourage participation.

    I admit, I didn't take the "xpath only" restriction literally. Given Gertone's track record on EE - and in particular in the XSLT zone - I'm sure that without such a restriction Gertone would have had this question answered before I saw it. I'm also sure that a day will come when I have a sticky problem and hope Gertone and other experts are around to help.

    Author Comment

    Yeah.. I apologize for that. I should have been more careful. I had to go outside of xpath because I'm still using XML 1.0 and could not find a solution. The best solution was from Gertone using this
    substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 2)
    But I can't say that the extension will always be 3 characters long. I just used a sample file name here; the actual ones are 25 to 35 characters long with 6-7 dots. So using the loop was the safest way to go. Thanks for your help, guys.
