Molko
asked on
XSLT - Plain Text To XML
Is it possible to take a "structured" non xml based plain text file and transform it into XML via XSLT ?
I would argue against this, since there are much better techniques:
Python or Ruby regular expressions, for example,
or the various parser builders that exist for structured formats.
But if the structured text is UTF-8 encoded, you could wrap a root tag around it.
If you were using XSLT2, you could then use the regex functionality to construct XML.
If you are using XSLT2 anyway, there are techniques to read non-XML text formats and use regexes on them. I am still in favour of keeping the heavy lifting out of the XSLT.
Note that tools such as XHTML Tidy or TagSoup can be used to transform lousy HTML, or files that look like XML from a distance, into real XML/XHTML. In a next step you can clean up using XSLT if you wish.
I could give some more directions if you gave us a feeling for what exactly the structured text looks like.
Anyway, if you were just looking for an answer to "Is it possible?":
Yes, it is.
I just finished an XSLT1 stylesheet that turns an EDI message into properly structured XML... it can be done, but there is more fun in life :-)
@Number-1
No it only works with XML.
Given that you reference a 9-year-old article on a 12-year-old language... there has been some evolution.
Your quote holds true only if you consider the unchanged text file as the input file to an XSLT1 process, and ignore the extensions some XSLT1 processors have.
You imply a LOT of limitations in your reply, and none of them were implied by the question asked:
- unchanged: as I said, you can wrap a root tag around it (simple piping on a command line) and then you have XML (preferably add CDATA sections). Or you could have a preprocessing step as suggested before.
- input file: you could have a dummy input file (or none at all, since from XSLT2 on you can call a named template as the starting point) and pull in the text file as a string parameter (XSLT2 and 1), or read it through the unparsed-text() function (XSLT2 only).
- XSLT1: XSLT2 is stable enough, and for a task like this I don't recommend recursive substring processing if you know you have regular expression functionality in XSLT2.
- extensions: some XSLT1 processors have extensions that bring some XSLT2 functionality into XSLT1 already (it is worth looking at www.exslt.org).
ASKER
I want to take this :
Volume in drive C has no label.
Volume Serial Number is 9C8E-C68B
Directory of C:\Java
28/02/2012 10:30 <DIR> .
28/02/2012 10:30 <DIR> ..
28/10/2011 20:57 <DIR> jre6
23/10/2011 16:50 <DIR> lib
06/01/2012 15:03 <DIR> workspace
23/10/2011 16:50 helloworld.java
0 File(s) 0 bytes
7 Dir(s) 696,677,314,560 bytes free
into something like
<disk>
<dir>
<name>C:/</name>
<dir>
<name>java</name>
<directory>
<name>jre6</name>
</directory>
<directory>
<name>lib</name>
</directory>
<directory>
<name>workspace</name>
</directory>
<file>
<name>helloworld.java</name>
</file>
</dir>
</dir>
</disk>
Well, you could wrap a <root> tag around this and add CDATA
<root><![CDATA[....
...]]></root>
and then regex your way through it (XSLT2),
or even substring through it with some recursion (XSLT1).
But is there a reason why you would want to do this?
Because the infrastructure is in place?
Or because you don't have other tools available?
I would throw a few lines of Ruby with an XML builder at it and be done, easy and concise.
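For readers on the Java side, the wrapping step can be sketched as below. This is only an illustration under assumptions: the class and method names (CdataWrap, wrap, rootText) are made up for this note, and the sketch does not deal with control characters that XML 1.0 forbids even inside CDATA. The one subtlety it does cover is a literal "]]>" in the text, which would otherwise end the CDATA section early.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataWrap {
    /** Wraps raw text in <root><![CDATA[...]]></root>; a literal "]]>" is split over two CDATA sections. */
    static String wrap(String text) {
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<root><![CDATA[" + safe + "]]></root>";
    }

    /** Parses the wrapped document and returns the string value of the root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("wrapped text is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String line = "28/10/2011 20:57 <DIR> jre6";
        // The angle brackets of <DIR> survive because they sit inside CDATA.
        System.out.println(rootText(wrap(line)).equals(line));
    }
}
```

The string value of the root element is what a stylesheet would see as "." in a template matching root.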
ASKER
Thanks.
I might have a look at the <root><![CDATA[.......]]></root> approach and then apply some fancy regex - I'll see how I get on.
Yes, the reason I am considering XSLT2 for this is that the infrastructure is already in place, and if I could do it in XSLT2 it would save a lot of Java coding... well, that's the theory :-)
Not really sure I could use Ruby in a JEE stack... hmm. Failing all the above, I'll have to resort to parsing the file in Java.
ASKER
Out of interest... how would the CDATA help? I guess the regex would match on 'Directory of' and then each '<DIR>'... actually it might be better for the regex to match on the end of line, as I would probably need the datetimes as well.
Could you show me a quick example?
If this is XSLT2, have a look at unparsed-text()
and use that in a named template that you trigger using -it (initial template)
... it is Java, so I assume Saxon
ASKER CERTIFIED SOLUTION
The match="/" template is only there for the case where you have a dummy source XML and no -it set.
ASKER
Wow... thanks, I'll take a good look at that.
Yes, XSLT2 and Saxon.
Thank you
welcome
ASKER
Hi
Thanks again for this, I am going through it now....
The thing that I am concerned about is that I would usually invoke it from Java code like this:
Transformer transformer =
tFactory.newTransformer(xslt);
transformer.transform(xml, output);
How would that work with the XSLT you have provided? The one you have shown reads a file from a disk location. Just wondering how I could invoke it in the above style?
Thanks again
Well, I am not much of a Java man.
You could make "xml" a dummy XML document with one empty root tag, "<root/>".
The XSLT I did works in two modes:
- either pass the dummy XML with the empty root tag as shown
- or don't pass a source XML, but use the -it parameter to indicate that the transform needs to start with the template name="start"
The second option is cleaner, but I have fallen back on the first option, with the dummy XML, before, to allow the Java developers in my team to reuse the Java code they already had... it feels more classic.
In your question you were referring to a file:
"non xml based plain text file"
So, pass the URI of this file in the parameter $input-file-uri.
This can either be a URL or simply (if it is on disk) the path to the file,
with the protocol file:/ in front of it (as in my example).
I would not know how to pass the URI if it were not a file on disk,
but then you could pass in a string as a parameter (you would not even need unparsed-text)
or save it to disk temporarily.
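The "dummy XML plus string parameter" option might look roughly like this from the Java side. It is a sketch under stated assumptions, not the stylesheet from this thread: the class name ParamTransform and the parameter name inputText are hypothetical, and the inline stylesheet is a toy written in XSLT 1.0 so that it runs on the JDK's built-in processor. With Saxon on the classpath, the same JAXP calls would normally drive an XSLT2 stylesheet instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParamTransform {
    // Toy stylesheet: it only reports the length of the text it was handed.
    // "inputText" is a hypothetical parameter name chosen for this sketch.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='inputText'/>"
        + "<xsl:template match='/'>"
        + "<length><xsl:value-of select='string-length($inputText)'/></length>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet against a dummy <root/> source, passing the plain text as a parameter. */
    static String transform(String text) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setParameter("inputText", text);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<root/>")),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("Directory of C:\\Java"));
    }
}
```

Because the text travels as a parameter, it never passes through the XML parser, so no CDATA wrapping is needed for this variant.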
ASKER
Sorry, I perhaps unintentionally misled you somewhat.
The data is stored in a file, but I have already read this into my Java app; I would then apply the XSLT transformation to the contents of the file.
/**
 * Simple transformation method.
 * @param sourcePath - Absolute path to source xml file.
 * @param xsltPath - Absolute path to xslt file.
 * @param resultDir - Directory where you want to put resulting files.
 */
public static void simpleTransform(String sourcePath, String xsltPath,
                                   String resultDir) {
    TransformerFactory tFactory = TransformerFactory.newInstance();
    try {
        Transformer transformer =
            tFactory.newTransformer(new StreamSource(new File(xsltPath)));
        transformer.transform(new StreamSource(new File(sourcePath)),
                              new StreamResult(new File(resultDir)));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
If the information is in a file... leave it there and let the XSLT processor deal with it.
If you insist on having it as the XML source, you need to make it XML... and you will hit encoding issues, no doubt.
Loading the file URI with unparsed-text()
- takes away that risk
- saves you a bunch of Java code
I know what my preferred strategy would be.
ASKER
Thank you
It is in a 'File' now because I am working with it locally to get the XSLT to work etc.
In essence it's only in a File for now, while I am working on it.
Once I have finished the XSLT, the 'real' data will reach my Java component as a String, which I guess I will need to wrap with something like
<?xml version="1.0" encoding="UTF-8"?>
<root> .....</root>
and CDATA around the 'DIR' etc.,
and then instantiate an XML DOM and invoke Saxon to transform the XML DOM with the XSLT.
That's the plan.........:-)
Sorry for the confusion re. 'File'.
Thanks
Anyway, you will hit errors because of encoding issues
Just wrap a CDATA around the whole string
<?xml version="1.0" encoding="iso-8859-1"?>
<root><![CDATA[...]]></root>
drop the template match="/"
and change this
<xsl:template name="start">
<xsl:variable name="input-str" select="unparsed-text($input-file-uri, 'iso..."/>
into
<xsl:template match="root">
<xsl:variable name="input-str" select="."/>
and it will work the same.
I just hope that parsing the string (that is exactly what will happen: your pseudo XML will hit the XML parser before it gets to the XSLT) will not kill your newlines.
In theory CR and LF are normalized to a single '\n' in XML before parsing... but test it in your application to be sure.
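The newline point is easy to check from Java before relying on it. The sketch below is an illustration only, with a made-up class name (NewlineCheck); it feeds a CR LF pair inside CDATA through the JDK's default DOM parser and returns what a stylesheet would later see as the text of root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NewlineCheck {
    /** Parses the given XML string and returns the text content of its root element. */
    static String parsedText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String text = parsedText("<root><![CDATA[line one\r\nline two]]></root>");
        // XML end-of-line handling turns the CR LF pair into a single LF,
        // so a regex that splits on \n still sees two lines.
        System.out.println(text.equals("line one\nline two"));
    }
}
```

So the lines are not lost, but any regex that expects a literal \r in the input will no longer find one.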
ASKER
I thought everything would be encoded as UTF-8? No?
The XSLT adjusted to :
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes"/>
<xsl:template match="root">
<xsl:variable name="input-str" select="."/>
<disk>
<dir>......
......
and XML-ised input like
......
<?xml version="1.0" encoding="UTF-8"?>
<root>
<![CDATA[Volume in drive D has no label.
Volume Serial Number is 145C-E872
Directory of D:\
25/02/2012 13:22 BACKKUP
18/04/2012 11:43 0 dir.txt
12/09/2009 07:23 Documents and Settings
Running your stuff (as adjusted above) through XML SPY (using built in XSLT transformer) seems to be getting results.
It is very unlikely that a dir command line returns UTF-8;
you need to set the encoding of the generated XML to what you expect from the text file.
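To make the encoding concern concrete on the Java side, here is a small sketch of what happens when the same bytes are decoded with two different charsets. It is an assumption-laden illustration: ISO-8859-1 is used only because it is the encoding named in the example earlier in this thread, the real code page of a given dir output has to be checked on the machine that produced it, and the class name ListingDecoder is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ListingDecoder {
    /** Decodes the raw bytes of a listing using the charset the producer really used. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The bytes of "ete" with both e's accented, as written in ISO-8859-1.
        byte[] raw = {(byte) 0xE9, 't', (byte) 0xE9};
        String right = decode(raw, StandardCharsets.ISO_8859_1);
        String wrong = decode(raw, StandardCharsets.UTF_8);
        System.out.println(right.equals("\u00e9t\u00e9")); // decoded as intended
        System.out.println(wrong.contains("\uFFFD"));      // replacement characters appear
    }
}
```

Once the text is a correctly decoded Java String, the encoding declared in a generated XML wrapper only matters at the point where the String is turned back into bytes.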
ASKER
fair point. I'll look into that.
Thanks very much for your help, more than I expected....
I have a bit more work to do on this, as I need to expand the 'pwd'
Directory of D:\temp\etc <------------------this bit
25/02/2012 13:22 <DIR> BACKUP
25/02/2012 13:22 <DIR> BACKUP1
25/02/2012 13:22 example.txt
into
<?xml version="1.0" encoding="UTF-8"?>
<disk>
<dir>
<name>d:</name>
<dir>
<name>temp</name>
<dir>
<name>etc</name>
<directory>
<name>BACKUP</name>
</directory>
<directory>
<name>BACKUP1</name>
</directory>
<file>
<name>example.txt</name>
</file>
</dir>
</dir>
</dir>
</disk>
I'll have a stab at that....
So I will close this question now..
Thanks again for giving me a headstart...
welcome,
splitting that last bit out is not a big task.
I would use a replace function to get the line out, from "Directory of" up to "\n",
then do a tokenize() on that result, splitting on the ":\";
the first part is the "d", the second part is the rest.
It is pretty straightforward. If you have issues with that, I can help you with it.
Have fun
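The same splitting logic can be tried out in plain Java before it is written in XSLT. This is a sketch, not the stylesheet: the class name DirectoryLine is made up, and it splits the remainder on every backslash rather than only on ":\", so that each folder becomes its own segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DirectoryLine {
    // Group 1 is the drive ("D:"), group 2 is everything after the optional first backslash.
    private static final Pattern HEADER =
            Pattern.compile("Directory of\\s+(\\w:)\\\\?(.*)");

    /** Returns the drive followed by each folder name, or an empty list if the line is not a header. */
    static List<String> segments(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return result;
        }
        result.add(m.group(1));
        String rest = m.group(2);
        if (!rest.isEmpty()) {
            // Split the remaining path on the backslash separator.
            result.addAll(Arrays.asList(rest.split("\\\\")));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(segments(" Directory of D:\\temp\\etc")); // [D:, temp, etc]
        System.out.println(segments(" Directory of D:\\"));          // [D:]
    }
}
```

Each element of the returned list corresponds to one nested dir element in the target XML.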
ASKER
Thankyou
Excellent.
ASKER
Hi
Do you want me to open a new question? (Happy to do so.)
This is what I have produced so far, what do you think?
I have a slight issue with the structure of the output that I can't seem to figure out (yet...)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes"/>
<xsl:template match="root">
<xsl:variable name="input-str" select="."/>
<disk>
<dir>
<xsl:analyze-string select="$input-str" regex="\n">
<xsl:matching-substring/>
<xsl:non-matching-substring>
<xsl:analyze-string select="." regex="(Directory of\s)(\w:.+)">
<xsl:matching-substring>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="regex-group(2)"/>
</xsl:call-template>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:analyze-string select="." regex="(\d+/\d+/\d+\s+\d+:\d+\s+)((&lt;DIR&gt;)?)?\s+(.+)">
<xsl:matching-substring>
<xsl:variable name="elem-name" select="if(normalize-space(regex-group(2))) then('directory') else('file')"/>
<xsl:element name="{$elem-name}">
<name>
<xsl:choose>
<xsl:when test="$elem-name= 'file'">
<xsl:value-of select="substring-after(normalize-space(regex-group(4)),' ')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="normalize-space(regex-group(4))"/>
</xsl:otherwise>
</xsl:choose>
</name>
<date>
<xsl:value-of select="normalize-space(regex-group(1))"/>
</date>
<size>
<xsl:choose>
<xsl:when test="$elem-name= 'file'">
<xsl:value-of select="substring-before(normalize-space(regex-group(4)),' ')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="'0'"/>
</xsl:otherwise>
</xsl:choose>
</size>
</xsl:element>
</xsl:matching-substring>
<xsl:non-matching-substring/>
</xsl:analyze-string>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:non-matching-substring>
</xsl:analyze-string>
</dir>
</disk>
</xsl:template>
<xsl:template name="process-path">
<xsl:param name="path"/>
<xsl:choose>
<xsl:when test="contains($path, '\')">
<dir>
<xsl:choose>
<xsl:when test="contains(substring-before($path, '\'), ':')">
<name>
<xsl:value-of select="substring-before($path, '\')"/>
</name>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="substring-after($path, '\')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<name>
<xsl:value-of select="substring-before($path, '\')"/>
</name>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="substring-after($path, '\')"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</dir>
</xsl:when>
<xsl:when test="string-length($path) > 0">
<dir>
<name>
<xsl:value-of select="$path"/>
</name>
</dir>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="UTF-8"?>
<root><![CDATA[Volume in drive D has no label.
Volume Serial Number is 145C-E872
Directory of D:\temp\etc
25/02/2012 13:22 <DIR> BACKUP
18/04/2012 11:43 5000 TextFile.txt
12/09/2009 07:23 <DIR> Documents and Settings
25/02/2012 16:51 <DIR> Program Files
02/05/2009 17:24 <DIR> wamp
25/02/2012 16:51 <DIR> WINDOWS
10/03/2010 20:19 <DIR> workspace
1 File(s) 0 bytes
6 Dir(s) 46,973,202,432 bytes free
]]></root>
You should always use
$path
instead of
string-length($path) > 0
since an empty string in a boolean expression evaluates to false.
Personally I always use normalize-space($path) as the test.
I did some smarter regex to get rid of the chooses (I hate chooses when they are not necessary),
they clutter the code.
Here is what I would make out of this,
maybe you like it
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:template match="root">
<xsl:variable name="input-str" select="."/>
<disk>
<dir>
<xsl:analyze-string select="$input-str" regex="\n">
<xsl:matching-substring/>
<xsl:non-matching-substring>
<xsl:analyze-string select="." regex="(Directory of\s)(\w:)\\?(.*)">
<xsl:matching-substring>
<dir>
<name><xsl:value-of select="regex-group(2)"/></name>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="tokenize(regex-group(3), '\\')"/>
</xsl:call-template>
</dir>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:analyze-string select="." regex="(\d+/\d+/\d+\s+\d+:\d+)\s+(&lt;DIR&gt;|\d+)\s+(.+)">
<xsl:matching-substring>
<xsl:variable name="elem-name" select="if(matches(regex-group(2), '\d+')) then('directory') else('file')"/>
<xsl:element name="{$elem-name}">
<name>
<xsl:value-of select="normalize-space(regex-group(3))"/>
</name>
<date>
<xsl:value-of select="normalize-space(regex-group(1))"/>
</date>
<size>
<xsl:value-of select="number(translate(regex-group(2), '&lt;>DIRdir', '0'))"/>
</size>
</xsl:element>
</xsl:matching-substring>
<xsl:non-matching-substring/>
</xsl:analyze-string>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:non-matching-substring>
</xsl:analyze-string>
</dir>
</disk>
</xsl:template>
<xsl:template name="process-path">
<xsl:param name="path"/>
<xsl:if test="count($path) > 0">
<dir>
<name>
<xsl:value-of select="$path[1]"/>
</name>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="$path[position() > 1]"/>
</xsl:call-template>
</dir>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
ASKER
Wow... yes, I like! I like it a lot... although you have mixed up directories with files :-)
I am still left with my structural problem though, I can't seem to solve it....
I need the output to be
<?xml version="1.0" encoding="UTF-8"?>
<disk>
<dir>
<name>D:</name>
<dir>
<name>temp</name>
<dir>
<name>etc</name>
<directory>
<name>BACKUP</name>
<date>25/02/2012 13:22</date>
<size>0</size>
</directory>
<file>
<name>TextFile.txt</name>
<date>18/04/2012 11:43</date>
<size>5000</size>
</file>
<directory>
<name>Documents and Settings</name>
<date>12/09/2009 07:23</date>
<size>0</size>
</directory>
<directory>
<name>Program Files</name>
<date>25/02/2012 16:51</date>
<size>0</size>
</directory>
<directory>
<name>wamp</name>
<date>02/05/2009 17:24</date>
<size>0</size>
</directory>
<directory>
<name>WINDOWS</name>
<date>25/02/2012 16:51</date>
<size>0</size>
</directory>
<directory>
<name>workspace</name>
<date>10/03/2010 20:19</date>
<size>0</size>
</directory>
</dir>
</dir>
</dir>
</disk>
At the moment it's:
<?xml version="1.0" encoding="UTF-8"?>
<disk>
<dir>
<dir>
<name>D:</name>
<dir>
<name>temp</name>
<dir>
<name>etc</name>
</dir>
</dir>
</dir>
<file>
<name>BACKUP</name>
<date>25/02/2012 13:22</date>
<size>0</size>
</file>
<directory>
<name>TextFile.txt</name>
<date>18/04/2012 11:43</date>
<size>5000</size>
</directory>
<file>
<name>Documents and Settings</name>
<date>12/09/2009 07:23</date>
<size>0</size>
</file>
<file>
<name>Program Files</name>
<date>25/02/2012 16:51</date>
<size>0</size>
</file>
<file>
<name>wamp</name>
<date>02/05/2009 17:24</date>
<size>0</size>
</file>
<file>
<name>WINDOWS</name>
<date>25/02/2012 16:51</date>
<size>0</size>
</file>
<file>
<name>workspace</name>
<date>10/03/2010 20:19</date>
<size>0</size>
</file>
</dir>
</disk>
I did some restructuring (this way it also works for the root dir)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:variable name="input-str" select="/root"/>
<xsl:template match="/">
<disk>
<xsl:analyze-string select="$input-str" regex="Directory\s+of\s+(\w:)\\?([^\n]*)\n">
<xsl:matching-substring>
<xsl:variable name="this-path" select="tokenize(regex-group(2), '\\')"/>
<dir>
<name><xsl:value-of select="regex-group(1)"/></name>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="$this-path"/>
</xsl:call-template>
<xsl:call-template name="process-file-list">
<xsl:with-param name="path" select="$this-path"/>
</xsl:call-template>
</dir>
</xsl:matching-substring>
<xsl:non-matching-substring/>
</xsl:analyze-string>
</disk>
</xsl:template>
<xsl:template name="process-path">
<xsl:param name="path"/>
<xsl:if test="count($path) > 0">
<dir>
<name>
<xsl:value-of select="$path[1]"/>
</name>
<xsl:call-template name="process-path">
<xsl:with-param name="path" select="$path[position() > 1]"/>
</xsl:call-template>
</dir>
</xsl:if>
<xsl:call-template name="process-file-list">
<xsl:with-param name="path" select="$path"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="process-file-list">
<xsl:param name="path"/>
<xsl:if test="count($path) = 0">
<xsl:analyze-string select="$input-str" regex="\n">
<xsl:matching-substring/>
<xsl:non-matching-substring>
<xsl:analyze-string select="." regex="(\d+/\d+/\d+\s+\d+:\d+)\s+(&lt;DIR&gt;|\d+)\s+(.+)">
<xsl:matching-substring>
<xsl:variable name="elem-name" select="if(matches(regex-group(2), '\d+')) then('file') else('directory')"/>
<xsl:element name="{$elem-name}">
<name>
<xsl:value-of select="normalize-space(regex-group(3))"/>
</name>
<date>
<xsl:value-of select="normalize-space(regex-group(1))"/>
</date>
<size>
<xsl:value-of select="number(translate(regex-group(2), '&lt;>DIRdir', '0'))"/>
</size>
</xsl:element>
</xsl:matching-substring>
<xsl:non-matching-substring/>
</xsl:analyze-string>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
ASKER
Hi
Is your latest regex correct?
regex="Directory\s+of\s+(\w:)\\?([^\n]*)\n">
It does not seem to work; however, if I change it to
regex="Directory\s+of\s+(\w:)\\?([^\n].*)">
then it seems to be OK.
Just wondering what your changes were intended to do.
Thank you
Since I use that on the full input-str, I need this ([^\n]*) for grabbing the filename (and stopping at the end of the line).
([^\n]*) means anything but a \n
([^\n].*) means one character that is not \n and a bunch of other things
That can't be right.
If I run the XSLT I posted, this is what I get
<disk>
<dir>
<name>D:</name>
<dir>
<name>temp</name>
<dir>
<name>etc</name>
<directory>
<name>BACKUP</name>
<date>25/02/2012 13:22</date>
<size>0</size>
</directory>
...
ASKER
I thought '.' would match on any character except \n
(.*) would mean anything but a \n
"(.*) would mean anything but a \n"
This depends on the mode, actually. I tend to do a lot of multiline regexes, so I work in "dot-all" mode from time to time... so I tend to be more prudent than necessary.
Without a modifier you are right,
and this should be the same as what I had originally written:
<xsl:analyze-string select="$input-str" regex="Directory\s+of\s+(\w:)\\?(.*)">
having no reference at all to the \n
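The difference between the two modes can be seen in a few lines of Java, whose regex engine treats the dot the same way for this purpose: without a flag the dot stops at a newline, with the dot-all flag it does not. The class name DotMode is made up for this sketch; in XSLT2 the corresponding switch is the "s" flag of xsl:analyze-string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotMode {
    /** Returns group 1 of the first match, or null when nothing matches. */
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Directory of D:\\temp\\etc\n25/02/2012 13:22 <DIR> BACKUP\n";
        // Default mode: (.*) stops at the end of the first line.
        System.out.println(firstGroup("Directory of (.*)", 0, text));
        // Dot-all mode: (.*) runs on to the end of the whole input.
        System.out.println(firstGroup("Directory of (.*)", Pattern.DOTALL, text));
        // [^\n]* stops at the line end in either mode, which is why it is the prudent choice.
        System.out.println(firstGroup("Directory of ([^\\n]*)", Pattern.DOTALL, text));
    }
}
```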
http://www.xml.com/pub/a/2003/11/26/learnXSLT.html