The XSLT for this would be really simple,
but I don't think XSLT is the right answer with these file sizes.
You definitely need a streaming parser for this.
You could look at STX:
http://stx.sourceforge.net
STX is, roughly, a forward-only subset of XSLT, and it works with an extremely low memory footprint.
First check whether it would fit your architecture.
If it does, I can help you with the STX, which would be as simple as the XSLT required for this.
If you have some Perl knowledge, XML::STX would work fine.
For Java there is a streaming pull parser, StAX.
You could use it to cut your file into pieces (of 50 MB, for example),
create the pipe-delimited CSV for each piece using XSLT,
and concatenate the results afterwards.
I am not a Java programmer, so I can't help you with that.
Here is a nice introduction to StAX if you are a Java programmer:
http://www.ibm.com/develop
The technique is known as pull parsing. If you are a Python programmer, you could use pulldom:
http://www.ibm.com/develop
Uche has a more recent article with such a splitter in the middle of it:
http://www.ibm.com/develop
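To make the pulldom suggestion concrete, here is a minimal sketch that skips the splitting step and writes the pipe-delimited rows directly while streaming. The record element name item_basic_data and the three field names are assumptions for illustration, and the function names are made up; adjust them to your real XML.

```python
import io
from xml.dom import pulldom

# Assumed names, for illustration only: replace with your real element names.
RECORD_TAG = "item_basic_data"
FIELDS = ["item_unique_id", "item_ean", "item_upc"]


def child_text(record, name):
    """Return the text of the first direct child element called `name`, or ''."""
    for child in record.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == name:
            return "".join(
                n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE
            ).strip()
    return ""


def xml_to_pipe_csv(source, out):
    """Stream `source` (file name or file object) and write one row per record.

    Returns the number of rows written.
    """
    events = pulldom.parse(source)
    rows = 0
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == RECORD_TAG:
            # Expand only this one record into a small DOM subtree.
            events.expandNode(node)
            out.write("|".join(child_text(node, f) for f in FIELDS) + "\n")
            rows += 1
    return rows
```

You would call it as xml_to_pipe_csv("items.xml", open("items.csv", "w", encoding="utf-8")), where both file names are placeholders. Note that the sketch does not escape values that themselves contain a pipe character, and a missing field simply becomes an empty column.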
cheers
Geert

by: abel, posted on 2007-09-17 at 10:34:23, ID: 19906666
Hi,
It is fairly easy to do what you want in XSLT, but the size may be a problem. The only product I know of that can deal with files as large as yours is Saxon-SA, and even then only if you write your stylesheet so that it is suitable for streaming (SAX-based) processing.
Your XSLT may look like the following:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- tell the processor to output text -->
  <xsl:output method="text" />
  <!-- throw away unwanted text nodes; the built-in rules still descend through the elements -->
  <xsl:template match="text()" />
  <!-- here you can make one long list of all items you want in the CSV row (a union is returned in document order) -->
  <xsl:template match="item_basic_data">
    <xsl:value-of select="item_unique_id|item_ean|itam_upc" separator="|" />
    <!-- end each row with a newline -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
Cheers,
-- Abel Braaksma