We help IT Professionals succeed at work.

Convert XML to CSV File

Johan85
Johan85 used Ask the Experts™
on
Hi,

I need to convert an XML file to CSV for an import into a websop. I tried different free tools, but I still don't get a good result.

Can anyone help me with that, the file is attached. Thanks in advance.
products.xml
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2010

Commented:
hello.
I've examined your file (products.xml) and i found that there is an incosistency at <![CDATA[Douchekop Rain 2½"] .
the quote (two and a half inches)  messes  your xml structure a lot .
 is there a way of getting a "cleaner" version of the file? (without styling in some records)
Information Architect
Top Expert 2006
Commented:
@om_prakash_p,
yes I agree that XSLT is the solution, don't expect people to learn XSLT from w3schools though
you could have tried to find at least an example and explain why XSLT is a good option

@sterium,
well, poor applications consider a " in the data an inconsistency, the problem why free convertors don't work is the weird structure though

@Johan85,

If standard transformers fail, it is indeed a good idea to take control of the output format yourself.
There are plenty of good reasons for this
1 the structure of the XML could make it not obvious for a transformer to grasp what to drill on one line
2 the importing application might have an incomplete CSV import filter
In both occasions, you need to trial and error to get things right
XSLT would be a good way to take control over this

I have implemented an XSLT assuming that the first is the problem.
For instance you have a deeply nested structure an the subcat seperation is a bit uncommon.
Looking at your data, I seem to understand that a productgroepen is in a subcategory as indicated in teh preceding-sibling::subcat[1]
Don't expect a standard transformer to figure that out for you
I would definitely model the XML data differently. This way, without an extra container is looking for trouble
Anyway, the below XSLT fixes this

The " in a datafield is not to be considered an "inconsistency"
There is no reason at all to avoid double quotes in data, just as there is no reason to avoid comma in data
So, here are some CSV rules
1. if there is a comma in the data field, surround the field with double quotes
2. if there is a double quote in a data field that double quotes as seperator, duplicate the double quote
There are many more rules, I just implemented the ones I found in your data

Having said that
123 should become 123
12,3 should become "12,3"
12"3 should become 12"3
1"2,3 should become "1""2,3"

All of this is implemented in the "process-single-field" template
duplicating teh " can only be done using a recursive template, have done that as well

Not all import filters (such as excell) implement all rules correctly, so you will need to experiment with a subset
If duplicating the " does not work, you will have to replace that with a different code, and replace it again with a script in the importing application.
It would be nice if I learned more about that application

I go from the assumption that you know how you execute an XSLT,
if not you can check the W3schools website mentioned above, or simply ask here

good luck
en nog een prettige namiddag

Geert
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="text"/>
    <xsl:variable name="quot"><xsl:text>"</xsl:text></xsl:variable>
    <xsl:template match="cats">
        <xsl:text>cat.titel,subcat.titel,productgroep.titel,product.id,nummer,omschrijving,prijs</xsl:text>
        <xsl:apply-templates select="//product"/>
    </xsl:template>
    <xsl:template match="product">
        <xsl:text>&#10;</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="ancestor::cat/titel"/>
        </xsl:call-template>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="ancestor::productgroepen/preceding-sibling::subcat[1]"/>
        </xsl:call-template>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="ancestor::productgroep/titel"/>
        </xsl:call-template>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="@id"/>
        </xsl:call-template>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="nummer"/>
        </xsl:call-template>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="omschrijving"/>
        </xsl:call-template>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="process-single-field">
            <xsl:with-param name="str" select="prijs"/>
        </xsl:call-template>
    </xsl:template>
    <xsl:template name="process-single-field">
        <xsl:param name="str"/>
        <xsl:choose>
            <xsl:when test="contains($str, ',')">
                <xsl:text>"</xsl:text>
                <xsl:call-template name="duplicate-quote">
                    <xsl:with-param name="str" select="normalize-space($str)"/>
                </xsl:call-template>
                <xsl:text>"</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="normalize-space($str)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template name="duplicate-quote">
        <xsl:param name="str"/>
        <xsl:choose>
            <xsl:when test="contains($str, $quot)">
                <xsl:value-of select="substring-before($str, $quot)"/>
                <xsl:text>""</xsl:text>
                <xsl:call-template name="duplicate-quote">
                    <xsl:with-param name="str" select="substring-after($str, $quot)"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$str"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

Open in new window

Gertone (Geert Bormans)Information Architect
Top Expert 2006

Commented:
I have not checked the spec completely
http://tools.ietf.org/html/rfc4180
but there appears to be some confusion

some sources say that a " should always be escaped with ""
if your importing application expects that, I need to change the code slightly

have fun

Geert

Author

Commented:
Still had to figure out some things, but the tips helped me.