Molko
asked on
XSLT - Applying an XSLT against an HTML document to produce XML output
Hi
Is there a recommended approach to applying an XSLT transformation against an HTML document?
I would like to take an HTML input and produce some XML output.
I am currently using the SAXON XSLT processor in a Java app.
Is SAXON the way to go? Or are there any better approaches I could take?
ASKER
Hi
Yes, the source is HTML and not XHTML.
Could you provide me a simple example of the XSLT2 grouping ? Very much appreciated.
Thanks
Perhaps this would be an example of XSLT 2 grouping:
http://stackoverflow.com/questions/2177927/grouping-several-groups-in-xslt-2
Well, that is a good example (and simple to grasp) for one-level grouping.
It gets really complex and harder to swallow if you want to do this up to six levels.
(I once did one in XSLT1, and it required multiple steps; I never managed to get it working properly in a single step for anything other than straightforward examples.
So although it is complex, my statement for XSLT2 holds.)
I have a stylesheet I always use, but I did not write it myself, so I would rather pass you the reference than the stylesheet, to give the author credit.
(You could google for "nesting html with XSLT2 grouping" or something like that in the XSL biglist archive (Mulberry Tech).)
I will try to find the reference later tonight.
Perhaps you already did it, but anyway
I combined the code from here:
http://blog.msbbc.co.uk/2007/06/simple-saxon-java-example.html
then downloaded saxonb9-1-0-8j.zip from here:
http://sourceforge.net/projects/saxon/files/Saxon-B/9.1.0.8/saxonb9-1-0-8j.zip/download
expanded it, placed saxon9.jar on the classpath,
and used the input files from the link above:
http://stackoverflow.com/questions/2177927/grouping-several-groups-in-xslt-2
It worked exactly as stated there.
This is the code:
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class SimpleSaxon {

    public static void myTransformer(String sourceID, String xslID)
            throws TransformerException, TransformerConfigurationException {
        // Create a transform factory instance.
        TransformerFactory tfactory = TransformerFactory.newInstance();
        // Create a transformer for the stylesheet.
        Transformer transformer = tfactory.newTransformer(new StreamSource(new File(xslID)));
        // Transform the source XML to System.out.
        transformer.transform(new StreamSource(new File(sourceID)),
                new StreamResult(System.out));
    }

    public static void main(String[] args) {
        // Set the TransformerFactory to use the Saxon TransformerFactoryImpl class.
        System.setProperty("javax.xml.transform.TransformerFactory",
                "net.sf.saxon.TransformerFactoryImpl");
        String foo_xml = "input.xml"; // input xml
        String foo_xsl = "input.xsl"; // input xsl
        try {
            myTransformer(foo_xml, foo_xsl);
        } catch (Exception ex) {
            handleException(ex);
        }
    }

    private static void handleException(Exception ex) {
        System.out.println("EXCEPTION: " + ex);
        ex.printStackTrace();
    }
}
input.xml
<article>
  <h1>A section title here</h1>
  <p>A paragraph.</p>
  <p>Another paragraph.</p>
  <bl>Bulleted list item.</bl>
  <bl>Another bulleted list item.</bl>
  <h1>Another section title</h1>
  <p>Yet another paragraph.</p>
</article>
input.xsl:
<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>
  <xsl:template match="article">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <sec>
          <xsl:copy-of select="."/>
          <xsl:for-each-group select="current-group() except ." group-adjacent="boolean(self::bl)">
            <xsl:choose>
              <xsl:when test="current-grouping-key()">
                <list>
                  <xsl:apply-templates select="current-group()"/>
                </list>
              </xsl:when>
              <xsl:otherwise>
                <xsl:copy-of select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </sec>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="bl">
    <list-item>
      <xsl:apply-templates/>
    </list-item>
  </xsl:template>
</xsl:stylesheet>
Output:
<?xml version="1.0" encoding="UTF-8"?>
<article>
   <sec>
      <h1>A section title here</h1>
      <p>A paragraph.</p>
      <p>Another paragraph.</p>
      <list>
         <list-item>Bulleted list item.</list-item>
         <list-item>Another bulleted list item.</list-item>
      </list>
   </sec>
   <sec>
      <h1>Another section title</h1>
      <p>Yet another paragraph.</p>
   </sec>
</article>
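The SimpleSaxon class above selects Saxon through a JVM-wide system property. As a hedged alternative, the sketch below asks JAXP for the Saxon factory by class name and falls back to the JDK's built-in processor if Saxon is not on the classpath. The class name FactoryChooser and its two methods are hypothetical names made up for this illustration; only the factory class name net.sf.saxon.TransformerFactoryImpl comes from the code above. Keep in mind that the fallback processor only supports XSLT 1.0, so the grouping stylesheet in this post still needs Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of Saxon or of the original post.
final class FactoryChooser {

    // Prefer Saxon when its jar is on the classpath, otherwise use the JDK default.
    static TransformerFactory saxonOrDefault() {
        try {
            // Standard JAXP call: load a specific factory by class name
            // (null means "use the current context class loader").
            return TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl", null);
        } catch (TransformerFactoryConfigurationError e) {
            // Saxon is not available; the default factory handles XSLT 1.0 only.
            return TransformerFactory.newInstance();
        }
    }

    // Apply a stylesheet (given as a string) to an XML string and return the result.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            saxonOrDefault()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Choosing the factory per call avoids changing the transformer implementation for every other library running in the same JVM.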
My friend, what I meant was more than one level.
It is easy with one level:
h1 - h1 - h1
and it is easy with predictable levels (see Michael Kay's excellent reference book):
h1 - h2 - h2 - h1 - h2 - h3 - h2 - h3 - h1
It is somewhat harder with unpredictable levels (such as most HTML out there):
h1 - h4 - h2 - h1 - h4 - h3 - h2 - h1 - h4 - h3
Please read my comments more carefully :-)
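To make the "unpredictable levels" case concrete, here is a minimal sketch of the nesting rule in plain Java. It is not XSLT grouping and not the stylesheet mentioned in this thread; it is a stack-based illustration under one assumption: a heading closes every open section of the same or a deeper level, then opens its own section. The class HeadingNester and the section{...} output notation are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration class; the nesting rule it applies is an assumption.
final class HeadingNester {

    // Turn a flat sequence of heading levels into a nested section{...} string.
    static String nest(int... levels) {
        StringBuilder out = new StringBuilder();
        Deque<Integer> open = new ArrayDeque<>(); // levels of the sections still open
        for (int level : levels) {
            // A heading ends every open section at the same or a deeper level.
            while (!open.isEmpty() && open.peek() >= level) {
                open.pop();
                out.append('}');
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append("section{H").append(level);
            open.push(level);
        }
        // Close whatever is still open at the end of the document.
        while (!open.isEmpty()) {
            open.pop();
            out.append('}');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(nest(1, 2, 1)); // section{H1 section{H2}} section{H1}
        System.out.println(nest(1, 4, 2)); // section{H1 section{H4} section{H2}}
    }
}
```

Because the rule only compares levels, a jump from h1 to h4 needs no special case: the h4 simply becomes a direct child of the h1 section, and a following h2 closes it and becomes a sibling.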
And here is the code I use for nesting flat structures:
http://stackoverflow.com/questions/2108348/xslt-deepening-content-structure
The answer from Martin Honnen is what you are looking for.
It is somewhat advanced stuff, so you will need to take some time to swallow and adapt it.
cheers
Geert
ASKER
Thanks
welcome
Transforming HTML into XML usually implies that you need to add nested grouping to a flat list, so that
H1 H2 H1
becomes
section{H1 section{H2 ...}} section{H1 ...}
For that, the XSLT2 grouping facilities come in very handy.
So indeed, I recommend using Saxon and XSLT2.
Note that XSLT requires well-formed XML as its input format.
If you are processing HTML instead of XHTML as a source,
you will need to run TagSoup or HTML Tidy to parse the HTML before you can send it to XSLT.
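A quick way to see why that clean-up step is needed: the sketch below feeds two snippets to the JDK's standard XML parser. The HTML-style snippet with an unclosed br is rejected, while the XHTML-style one parses. The class name WellFormedCheck and the sample snippets are made up for this illustration; it uses only the JDK and does not show TagSoup or HTML Tidy themselves.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
final class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML with the JDK's default parser.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser reported a well-formedness error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>one<br>two</p>"));  // false: br is never closed
        System.out.println(isWellFormed("<p>one<br/>two</p>")); // true: XHTML-style empty element
    }
}
```

An XSLT processor reading the first snippet directly would fail in the same way, which is why the HTML has to pass through a tolerant parser first.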