
XSLT - Plain Text To XML

Is it possible to take a "structured", non-XML plain text file and transform it into XML via XSLT?

No, it only works with XML.

http://www.xml.com/pub/a/2003/11/26/learnXSLT.html

Extensible Stylesheet Language Transformations or XSLT is a language that allows you to transform XML documents into XML, HTML, XHTML, or plain text documents.

Gertone (Geert Bormans)

I would argue against this, since there are much better techniques,
e.g. Python or Ruby regular expressions,
or the parser builders that exist for various structured formats.
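
To illustrate the regular-expression route mentioned above, here is a minimal Python sketch. The line layout (date, time, then either <DIR> or a byte count, then a name) and the element and attribute names (listing, dir, file, date, time, size) are assumptions made up for the example, not anything prescribed by the question; the pattern would need adapting to the real format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical line layout: "DD/MM/YYYY  HH:MM    <DIR>|size    name"
ENTRY = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2})\s+"
    r"(?P<kind><DIR>|[\d,]+)\s+(?P<name>.+)$"
)


def listing_to_xml(text):
    """Turn matching lines into <dir>/<file> elements; skip everything else."""
    root = ET.Element("listing")  # element names are illustrative only
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m is None:
            continue  # headers, totals, blank lines
        is_dir = m.group("kind") == "<DIR>"
        entry = ET.SubElement(root, "dir" if is_dir else "file")
        entry.set("date", m.group("date"))
        entry.set("time", m.group("time"))
        if not is_dir:
            # Drop thousands separators from the byte count.
            entry.set("size", m.group("kind").replace(",", ""))
        entry.text = m.group("name")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "28/10/2011  20:57    <DIR>          jre6\n"
        "23/10/2011  16:50             1,024 hello.txt\n"
    )
    print(listing_to_xml(sample))
```

The serializer takes care of escaping, so names containing & or < still produce well-formed XML.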

But if the structured text is UTF-8 encoded, you could wrap a root tag around it.

If you were using XSLT2, you could then use the regex functionality to construct XML.

If you are using XSLT2 anyway, there are techniques to read non-XML text formats and use regexes on them. I am still in favour of keeping the heavy lifting out of the XSLT.

Note that tools such as HTML Tidy or TagSoup can be used to transform lousy HTML, or files that look like XML from a distance, into real XML/XHTML. As a next step you can clean up using XSLT if you wish.

I could give some more directions if you gave us an idea of what exactly the structured text looks like.

Anyway, if you were just looking for an answer to "Is it possible?":
Yes, it is.
I just finished an XSLT1 stylesheet that turns an EDI message into properly structured XML... it can be done, but there is more fun in life :-)

Gertone (Geert Bormans)

@Number-1

"No, it only works with XML."

Given that you reference a 9-year-old article on a 12-year-old language... there has been some evolution.

Your quote holds true only if you consider the unchanged text file as the input file to an XSLT1 process, not taking into account the extensions some XSLT1 processors have.

You imply a LOT of limitations in your reply, and none of them were implied by the question asked:

- unchanged: as I said, you can wrap a root tag around it (simple piping on the command line) and then you have XML (preferably add CDATA sections). Or you could have a preprocessing step, as suggested before.
- input file: you could have a dummy input file (or none at all, since from XSLT2 on you can call a named template as the starting point) and pull in the text file as a string parameter (XSLT2 and XSLT1), or read it through the unparsed-text() function (XSLT2 only).
- XSLT1: XSLT2 is stable enough, and for a task like this I don't recommend recursive substring processing when you know you have regular expression functionality in XSLT2.
- extensions: some XSLT1 processors have extensions that already pull some XSLT2 functionality into XSLT1 (it is worth looking at www.exslt.org).
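
As a rough illustration of the "wrap a root tag around it, preferably with CDATA" idea from the first bullet, the following Python sketch does the wrapping as a preprocessing step. The function name wrap_in_root and the default element name root are hypothetical. It also assumes the text contains no characters that are illegal in XML 1.0 (most control characters, for instance), which CDATA cannot rescue.

```python
import xml.etree.ElementTree as ET


def wrap_in_root(text, root_name="root"):
    """Wrap arbitrary text in a single root element using a CDATA section."""
    # The sequence "]]>" would end the CDATA section early, so split it
    # across two adjacent CDATA sections.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<%s><![CDATA[%s]]></%s>" % (root_name, safe, root_name)


if __name__ == "__main__":
    doc = wrap_in_root("28/10/2011  20:57    <DIR>          jre6")
    # Parsing the result back shows the text survived unchanged.
    print(ET.fromstring(doc).text)
```

The result is a well-formed document with one text node, which a stylesheet can then pick apart with whatever string or regex functions the processor offers.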

Molko

ASKER

I want to take this:

 Volume in drive C has no label.
 Volume Serial Number is 9C8E-C68B

 Directory of C:\Java

28/02/2012  10:30    <DIR>          .
28/02/2012  10:30    <DIR>          ..
28/10/2011  20:57    <DIR>          jre6
23/10/2011  16:50    <DIR>          lib
06/01/2012  15:03    <DIR>          workspace
23/10/2011  16:50                        helloworld.java
               0 File(s)              0 bytes
               7 Dir(s)  696,677,314,560 bytes free