mbryan822

Transforming XML into formatted text using XSLT

I want to transform XML files exported from Treepad into formatted text files that ShadowPlan on my Palm can import. All of the tutorials and help I can find on XSLT focus on XML -> XHTML, and there is very little about using XSLT to generate formatted text. I've been only partially successful so far and am convinced that I'm approaching this from the wrong angle. I can't get <xsl:output method="text" indent="yes"> to work the way I believe it should.

The files are in a tree structure where each element can have an article/note attached.

Here are examples of the XML that will be used as input, and then the format that the output needs to be in to be imported into Shadow.

TreePad XML exported file:
<?xml version="1.0"?>
<treepad_xml version="1.0">
      <database>
            <name>Test Tree</name>
            <node>
                  <title>Test Tree</title>
                  <article datatype="Text"/>
                  <node>
                        <title>Item 1</title>
                        <article datatype="Text">This is item 1</article>
                        <node>
                              <title>Item 1a</title>
                              <article datatype="Text">This is item 1a</article>
                        </node>
                  </node>
                  <node>
                        <title>Item 2</title>
                        <article datatype="Text">This is item 2</article>
                        <node>
                              <title>Item 2a</title>
                              <article datatype="Text">This is item 2a</article>
                        </node>
                  </node>
            </node>
      </database>
</treepad_xml>


Rules for importing text into Shadow:
1) notes can be on multiple lines but must start with <Note: and end with >
2) items can be indented with tabs or spaces to indicate hierarchy. This example uses tabs to make the indentations clear.

The above XML file should result in exactly this text file here:

Item 1
<Note: This is item 1>
      Item 1a
<Note: This is item 1a>
Item 2
<Note: This is item 2>
      Item 2a
<Note: This is item 2a>

metalmickey

So for every instance of node you essentially want to generate an unordered list, with sub-lists for each nested node?
This XSLT will transform your XML into a tree structure, although this is the HTML way of doing it:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title/>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="database">
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>
  <xsl:template match="node">
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>
  <xsl:template match="name">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
  <xsl:template match="title">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
  <xsl:template match="article">
    <li>
      <xsl:apply-templates/>
    </li>
  </xsl:template>
  <xsl:template match="@datatype"/>
</xsl:stylesheet>

You'll need to translate each <ul> into the equivalent line break and each <li> into a tab. Since there is no markup around the text, it may be difficult to produce the tab indentation using the XSL above.

It's not the solution, so no points here, but it may provide some insight into the transformation structure of the XSLT.
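
A text-output variant of the stylesheet above might look like the following. This is only a sketch against the sample file in the question. It assumes that the top-level "Test Tree" node should be skipped (as the expected output suggests) and that one tab per level is the wanted indentation. The parameter name "indent" is made up for illustration. With method="text" the indent attribute has no effect, so every tab and line break has to be written out explicitly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- indent="yes" is ignored for text output; whitespace is emitted by hand. -->
  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes between elements. -->
  <xsl:strip-space elements="*"/>

  <!-- Assumption: skip the root "Test Tree" node and start at its children. -->
  <xsl:template match="/">
    <xsl:apply-templates select="treepad_xml/database/node/node"/>
  </xsl:template>

  <xsl:template match="node">
    <!-- Accumulated indentation; empty for the first level. -->
    <xsl:param name="indent" select="''"/>
    <!-- Title line, prefixed with the current indentation. -->
    <xsl:value-of select="concat($indent, title, '&#10;')"/>
    <!-- Note line, always starting in column 1; skipped for empty articles. -->
    <xsl:if test="string(article)">
      <xsl:value-of select="concat('&lt;Note: ', article, '&gt;&#10;')"/>
    </xsl:if>
    <!-- Recurse into child nodes with one more tab. -->
    <xsl:apply-templates select="node">
      <xsl:with-param name="indent" select="concat($indent, '&#9;')"/>
    </xsl:apply-templates>
  </xsl:template>
</xsl:stylesheet>
```

Passing the indentation down as a parameter keeps the note lines unindented, because only the title line uses the prefix.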


HTH

MM



ASKER CERTIFIED SOLUTION
Yury_Delendik

This solution is only available to members.
mbryan822

ASKER

metalmickey - thanks for the info you posted, this will help me even though it's not exactly what I need right now. I appreciate the time you took to post the info you did. I'm trying to learn XSLT and anything helps right now.

--------------------------

Yury - your solution is VERY close but there are a couple small changes that need to be made to make the resulting text work with Shadow's import mechanism. I'm not good enough with XSLT yet (this is my first attempt at using it) and can't figure out how to modify your solution to get what I need.

1. indentation should be one space (or tab) per level. Your solution somehow creates 2 spaces instead of 1 space for each level of indentation. This confuses Shadow and the file doesn't import correctly until I manually remove the extra spaces.

2. The <Note: lines must have no spaces preceding them, or they are seen as nodes instead of notes for the preceding node. See my original post to see that the <Note: lines are not indented.

I am increasing the point value for this question because I can't find a solution yet, even though Yury's example was very close.

I must update the test .xml file because the one I submitted had been modified after being exported by Treepad. I guess I modified it in my attempts to get something working.

Here is what the test tree XML file that I originally posted really should look like:

<?xml version="1.0"?>
<treepad_xml version="1.0">
      <database>
            <name>
Test Tree
</name>
            <node>
                  <title>
Test Tree
</title>
                  <article datatype="Text">

</article>
                  <node>
                        <title>
Item 1
</title>
                        <article datatype="Text">
This is item 1&#13;
</article>
                        <node>
                              <title>
Item 1a
</title>
                              <article datatype="Text">
This is item 1a
</article>
                        </node>
                  </node>
                  <node>
                        <title>
Item 2
</title>
                        <article datatype="Text">
This is item 2&#13;
</article>
                        <node>
                              <title>
Item 2a
</title>
                              <article datatype="Text">
This is item 2a&#13;
</article>
                        </node>
                  </node>
            </node>
      </database>
</treepad_xml>


I need the output to look like this:

Item 1
<Note: This is item 1>
 Item 1a
<Note: This is item 1a>
Item 2
<Note: This is item 2>
 Item 2a
<Note: This is item 2a>


Note that Item 1a and Item 2a are indented by one space. This causes them to become children of Item 1 and Item 2 respectively when imported into Shadow on my Palm. If Item 1a had a child it would be indented with 2 spaces. Tabs are also ok instead of spaces, but it must remain consistent throughout the file.
Also note that the <Note: lines must appear on one line.

If I can get it to work with this .xml file as well as Yury's original solution did, I could finish the job with a simple AWK script. But, it seems to me that XSLT should be able to do it all.

I really appreciate any help anyone can give me with this. Even hints are very welcome. The things I'm not understanding are:
1. How do I control indentation?
2. How do I force CRLFs where I need them?
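
Both questions can be illustrated with one small stylesheet. The sketch below is written against the re-posted export, not taken from any posted solution. It assumes one space per level, that the root "Test Tree" node is skipped, and that a depth limit of 32 levels is acceptable; the variable names "crlf", "spaces", "depth" and "note" are invented for this example. Indentation is cut from a pool of spaces according to the number of ancestor node elements, and line endings are written as the character references &#13;&#10;. normalize-space() removes the line breaks and carriage returns that Treepad puts around titles and articles, at the cost of collapsing a multi-line note onto one line.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes between elements. -->
  <xsl:strip-space elements="*"/>

  <!-- Line ending written after every line: CR followed by LF. -->
  <xsl:variable name="crlf" select="'&#13;&#10;'"/>
  <!-- Pool of 32 spaces to cut indentation from (arbitrary depth limit). -->
  <xsl:variable name="spaces" select="'                                '"/>

  <!-- Assumption: skip the root "Test Tree" node and start at its children. -->
  <xsl:template match="/">
    <xsl:apply-templates select="treepad_xml/database/node/node"/>
  </xsl:template>

  <xsl:template match="node">
    <!-- Depth below the root node: 0 for Item 1, 1 for Item 1a, and so on. -->
    <xsl:variable name="depth" select="count(ancestor::node) - 1"/>
    <!-- Article text with surrounding and repeated whitespace removed. -->
    <xsl:variable name="note" select="normalize-space(article)"/>
    <!-- Title line: $depth spaces, the trimmed title, then CRLF. -->
    <xsl:value-of select="concat(substring($spaces, 1, $depth), normalize-space(title), $crlf)"/>
    <!-- Note line in column 1, only when the article is not empty. -->
    <xsl:if test="$note">
      <xsl:value-of select="concat('&lt;Note: ', $note, '&gt;', $crlf)"/>
    </xsl:if>
    <xsl:apply-templates select="node"/>
  </xsl:template>
</xsl:stylesheet>
```

If the importer prefers plain LF line endings, changing the "crlf" variable to '&#10;' should be the only edit needed.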
Extra Credit:
1. On the Palm, notes have a 4k limit, so for this to *really* work, it will need to split long notes up into 4k chunks. Depending on how difficult this will be to do, this project may not be worth pursuing. To work properly, the resulting file would have to look something like the example below.
(using the above example, if Item 1a had a note larger than 8k but smaller than 12k, it would split the note into 3 parts, each a child of Item 1a)

 Item 1a
  Part 1
<Note: this is the first 4k>
  Part 2
<Note: this is the second 4k>
  Part 3
<Note: this is the remainder of the original note>

Is this sort of thing even possible with XSLT?
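
Splitting text like this can be done in XSLT 1.0 with a recursive named template. The following is a rough sketch under several assumptions: the limit is taken as 4096 characters (string-length() counts characters, not bytes, and the "<Note: " wrapper is not counted), indentation is one space per level, and the template name "split-note" and the parameters "limit", "text", "indent" and "part" are hypothetical names chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Maximum characters per note; 4096 is an assumed reading of "4k". -->
  <xsl:param name="limit" select="4096"/>
  <xsl:variable name="nl" select="'&#10;'"/>

  <!-- Assumption: skip the root "Test Tree" node and start at its children. -->
  <xsl:template match="/">
    <xsl:apply-templates select="treepad_xml/database/node/node"/>
  </xsl:template>

  <xsl:template match="node">
    <xsl:param name="indent" select="''"/>
    <xsl:variable name="note" select="normalize-space(article)"/>
    <xsl:value-of select="concat($indent, normalize-space(title), $nl)"/>
    <xsl:choose>
      <!-- Long note: emit "Part N" children, each carrying one chunk. -->
      <xsl:when test="string-length($note) &gt; $limit">
        <xsl:call-template name="split-note">
          <xsl:with-param name="text" select="$note"/>
          <xsl:with-param name="indent" select="concat($indent, ' ')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Short note: attach it directly to the item. -->
      <xsl:when test="$note">
        <xsl:value-of select="concat('&lt;Note: ', $note, '&gt;', $nl)"/>
      </xsl:when>
    </xsl:choose>
    <!-- Real child nodes follow, one level deeper. -->
    <xsl:apply-templates select="node">
      <xsl:with-param name="indent" select="concat($indent, ' ')"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Recursively peel $limit characters off the front of $text. -->
  <xsl:template name="split-note">
    <xsl:param name="text"/>
    <xsl:param name="indent"/>
    <xsl:param name="part" select="1"/>
    <xsl:if test="$text">
      <xsl:value-of select="concat($indent, 'Part ', $part, $nl)"/>
      <xsl:value-of select="concat('&lt;Note: ', substring($text, 1, $limit), '&gt;', $nl)"/>
      <xsl:call-template name="split-note">
        <xsl:with-param name="text" select="substring($text, $limit + 1)"/>
        <xsl:with-param name="indent" select="$indent"/>
        <xsl:with-param name="part" select="$part + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The recursion goes one level deep per chunk, so even a very long article needs only a modest number of nested calls. Chunks are cut at a fixed character count, which can split a word in the middle; cutting at the last space before the limit would need extra logic.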

Thank you for any information, clues, hints or suggestions.
If I can make this work for large Treepad files, I will post the solution on the Treepad website in the utilities section so other Treepad users who also use Shadow on the palm will be able to share data too.
I made a mistake!
I said:
"Also note that the <Note: lines must appear on one line."

This really should say:
Also note that the <Note: lines must start in column 1 with no preceding spaces. They can appear on one line or on multiple lines. The rules for notes are:
1. Start with <Note: with no preceding spaces.
2. End with the first ">" encountered. This means that the Treepad file cannot contain "<" or ">" characters.

Sorry for the oversight.
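
If a Treepad article does contain angle brackets, one possible workaround is to swap them for other characters during the transformation. The sketch below uses the XPath translate() function to turn "<" and ">" into parentheses. The choice of parentheses is an assumption made for illustration and it does alter the note text; the variable name "safe" is invented. To stay short, this stylesheet writes only the note lines.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Emit each article as a note, swapping angle brackets for parentheses. -->
  <xsl:template match="article">
    <xsl:variable name="safe" select="translate(normalize-space(.), '&lt;&gt;', '()')"/>
    <!-- Skip articles that are empty after trimming. -->
    <xsl:if test="$safe">
      <xsl:value-of select="concat('&lt;Note: ', $safe, '&gt;&#10;')"/>
    </xsl:if>
  </xsl:template>

  <!-- Suppress all other text so only note lines are written. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

The same translate() call could be applied to the note text in a full stylesheet that also writes the item titles.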