
How to parse string using xpath only?

aliasim99 asked
Last Modified: 2013-11-11
I have a file name that I need to parse to get the extension out. The XPath I had was working: for a file name such as abcd.txt, I was using this:
(substring-after(normalize-space(/Data/parseResults/fileName/text()), '.'))
It worked great until now, but the file names now look like this:
abcd.efg.hij.txt

So the above XPath gives me the extension as efg.hij.txt, which of course is not correct. Is there any way to get the right extension using XPath only?
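The root cause is that XPath 1.0's substring-after() returns the text after the *first* occurrence of the separator (or an empty string if the separator is absent), while an extension is the text after the *last* dot. The Python sketch below only emulates that behavior to make the difference visible; `substring_after` and `last_segment` are hypothetical helper names, not XPath functions.

```python
def substring_after(s: str, sep: str) -> str:
    """Emulate XPath 1.0 substring-after(): text after the FIRST sep, or '' if absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def last_segment(s: str, sep: str) -> str:
    """What the question actually needs: the text after the LAST sep."""
    return s.rpartition(sep)[2]


for name in ["abcd.txt", "abcd.efg.hij.txt"]:
    print(name, "->", substring_after(name, "."), "vs", last_segment(name, "."))
```

For abcd.txt both agree; for abcd.efg.hij.txt the first gives efg.hij.txt and the second gives txt.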
Comment
Watch Question

Gertone (Geert Bormans), Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
Well, since you put it in the XQuery zone as well, I assume you can use XPath 2.0:

tokenize(/Data/parseResults/fileName, '\.')[last()]
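In XPath 2.0, tokenize() splits the string on a regular expression and the [last()] predicate keeps the final token. As a rough analogy only (this is Python, not XPath), re.split plays the role of tokenize and index -1 plays the role of [last()]; the function name `extension_via_tokenize` is hypothetical.

```python
import re


def extension_via_tokenize(filename: str) -> str:
    # tokenize(filename, '\.')  ~  re.split(r"\.", filename)
    tokens = re.split(r"\.", filename)
    # [last()]  ~  the final element of the resulting list
    return tokens[-1]


print(extension_via_tokenize("abcd.efg.hij.txt"))
print(extension_via_tokenize("LBD1.SCI.PHY_TEST.TXT"))
```

Like the XPath 2.0 expression, a name with no dot at all comes back unchanged.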

If you are not using XSLT2, you have some options:
- in XSLT you could use recursion
- or you could hope that the extensions are predictable

So, I need to know a few things.
What is the context? Is this in XQuery, in XSLT1, or in some DOM processing?
How predictable are the extensions? Are they all three-character extensions, or is there a limited set of possibilities?
- If XQuery or XSLT2, use the XPath suggested above
- If the extensions are all three characters, you could use this:
         substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 2)
- If the extensions are not predictable and you are in XSLT1, use recursion in XSLT1
- If none of the above, you could try something like this:
         substring-after(substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 5), '.')
but then you need to hope that no extension is longer than 5 characters, and that the last six characters never contain two dots (as they would for a name ending in '.b.txt')
The last one is a bit unsafe, in my mind.
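To see where the two fixed-length expressions break, here is a Python emulation of them. It assumes integer positions, so the two-argument XPath 1.0 substring(s, start) reduces to "from 1-based position start to the end"; the helper names and the test file names (report.html, a.b.txt, data.backup) are hypothetical and chosen only to trigger each failure.

```python
def substring_after(s: str, sep: str) -> str:
    """Emulate XPath 1.0 substring-after(): text after the first sep, or '' if absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_from(s: str, start: int) -> str:
    """Emulate two-argument XPath 1.0 substring(s, start) for integer start (1-based)."""
    return s[max(start, 1) - 1:]


def last_three(name: str) -> str:
    # substring(name, string-length(name) - 2)
    return substring_from(name, len(name) - 2)


def after_dot_in_last_six(name: str) -> str:
    # substring-after(substring(name, string-length(name) - 5), '.')
    return substring_after(substring_from(name, len(name) - 5), ".")


for name in ["abcd.efg.hij.txt", "report.html", "a.b.txt", "data.backup"]:
    print(f"{name!r}: {last_three(name)!r} {after_dot_in_last_six(name)!r}")
```

Both expressions return txt for abcd.efg.hij.txt. The three-character version returns tml for report.html, and the six-character window returns b.txt for a.b.txt and an empty string for data.backup.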

Author

Commented:
Below is my XML. Yes, I thought of recursion, and so far that's the best way to do it, and I agree it's unsafe. The maximum number of dots I've seen in a file name in my client's environment is 6, and below is what I'm planning to use:

substring-after(substring-after(substring-after(substring-after(substring-after(substring-after(ProcessData/DataInfo/DocumentName/text(),'.'),'.'),'.'),'.'),'.'),'.')

I will extend it to handle up to 10 dots to be on the safe side. It works perfectly.
It's not XPath 2.0. Let me know if you can think of a better way.


<?xml version="1.0" encoding="UTF-8"?>
<ProcessData>
  <FileName>E:\edi\inbound\LBD1.SCI.PHY_TEST.TXT</FileName>
  <ProcessInformation>
    <DateTimeFormat2>20090212171141</DateTimeFormat2>
  </ProcessInformation>
  <DataInfo>
    <DocumentCreateTime/>
    <DocumentName>LBD1.SCI.PHY_TEST.TXT</DocumentName>
  </DataInfo>
  <parseFileName>
    <delimiter>\</delimiter>
    <absoluteFileName>E:\edi\LBD1PHY_TEST.TXT</absoluteFileName>
  </parseFileName>
</ProcessData>

Open in new window

Gertone (Geert Bormans), Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
Does that work?
I don't think it will.

Do you have an idea on what extensions are possible?
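A likely reason for the doubt: substring-after() returns an empty string as soon as no dot is left, so a fixed nesting depth yields the extension only when the name has exactly that many dots. With fewer dots the result is empty, and with more it still contains dots. The Python emulation below (hypothetical helper names) applies the chain to the DocumentName from the sample XML, which has three dots.

```python
def substring_after(s: str, sep: str) -> str:
    """Emulate XPath 1.0 substring-after(): text after the first sep, or '' if absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def nested_substring_after(s: str, depth: int) -> str:
    """Equivalent of wrapping substring-after(..., '.') around s `depth` times."""
    for _ in range(depth):
        s = substring_after(s, ".")
    return s


name = "LBD1.SCI.PHY_TEST.TXT"  # DocumentName from the sample XML (3 dots)
print(repr(nested_substring_after(name, 3)))
print(repr(nested_substring_after(name, 6)))
```

Three levels give 'TXT', but the planned six levels give '' for this very file.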
Commented:
Named template recursion should do the trick and not run into the same types of issues. You create a template that takes a string parameter, use substring-after() to get the string after the first period, then call the template again (passing the new value) until you get a string that has no period. I believe I actually saw such a solution elsewhere on EE, but I found the attached snippet elsewhere.

Make the initial call to the template with something like this:
    <xsl:call-template name="getExtension">
        <xsl:with-param name="filename" select="'abcd.efg.hij.txt'"/>
    </xsl:call-template>

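<xsl:template name="getExtension">
  <xsl:param name="filename"/>

  <xsl:choose>
    <!-- A dot remains: drop everything up to and including the first dot, then recurse -->
    <xsl:when test="contains($filename, '.')">
      <xsl:call-template name="getExtension">
        <xsl:with-param name="filename" select="substring-after($filename, '.')"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No dot left: the remainder is the extension (or the whole name if it never had a dot) -->
    <xsl:otherwise>
      <xsl:value-of select="$filename"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>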
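The recursion can be traced outside XSLT as well. The Python function below is a hypothetical stand-in that mirrors the template step by step, which makes it easy to check what the template would return for a given name, including one with no dot at all.

```python
def get_extension(filename: str) -> str:
    """Mirror of the getExtension named template."""
    if "." in filename:  # xsl:when test="contains($filename, '.')"
        # xsl:call-template with substring-after($filename, '.')
        return get_extension(filename.split(".", 1)[1])
    return filename  # xsl:otherwise: value-of $filename


print(get_extension("abcd.efg.hij.txt"))
print(get_extension("LBD1.SCI.PHY_TEST.TXT"))
```

Each call removes one dot, so the recursion depth equals the number of dots and no maximum has to be guessed in advance.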

Open in new window

Not the solution you were looking for? Getting a personalized solution is easy.

Ask the Experts

Commented:
My first line should have read: "Named template recursion should do the trick and not run into the same types of issues." This was a reference to the fact that you don't have to choose an arbitrary number of periods to look for or worry about the size or predictability of the extensions.

Author

Commented:
I solved this problem a while ago, and I had to do the same thing: run a loop over the string and check whether there are any more dots. I was using BPML, so I was able to do it. There is no straight-out-of-the-box solution for this, so what you suggested is pretty much what I did, just in a different language. Thanks for your response.
Gertone (Geert Bormans), Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
Please be fair when grading.
The original question said
"How to parse string using xpath only?"
I never got a response to my last question,
and now you accept an answer that does this "outside XPath" only.
This does not reward the effort I made to help you get an XPath-only solution, as you required.

Commented:
I'm not entirely sure what the proper protocol is, but I did want to concur with Gertone on fairness when grading. Experts Exchange relies on the reward system to encourage participation.

I admit, I didn't take the "xpath only" restriction literally. Given Gertone's track record on EE - and in particular in the XSLT zone - I'm sure that without such a restriction Gertone would have had this question answered before I saw it. I'm also sure that a day will come when I have a sticky problem and hope Gertone and other experts are around to help.

Author

Commented:
Yeah, I apologize for that. I should have been more careful. I had to go outside of XPath because I'm still using XML 1.0 and could not find a solution. The best solution was from Gertone, using this:
substring(/Data/parseResults/fileName, string-length(/Data/parseResults/fileName) - 2)
But I can't say that the extension will always be 3 characters long. I just used a sample file name here; the actual ones are 25 to 35 characters long with 6-7 dots. So using the loop was the safest way to go. Thanks for your help, guys.
Access more of Experts Exchange with a free account
Thanks for using Experts Exchange.

Create a free account to continue.

Limited access with a free account allows you to:

  • View three pieces of content (articles, solutions, posts, and videos)
  • Ask the experts questions (counted toward content limit)
  • Customize your dashboard and profile

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.