We help IT Professionals succeed at work.

CSV to XML transform and vice versa

daluu
daluu asked
on
2,744 Views
Last Modified: 2013-11-19
Hi, I'm a newbie to XML. I've found some resources for converting CSV to XML, and vice versa. But they seem to cover only one kind of XML - where the data is enclosed in tags (no data as tag attributes).

I'm wondering if there is a way to do a conversion from CSV to XML and XML to CSV in the following formats. For XML to CSV, I know you can use XSLT but I don't know how to adequately write the XSLT.

Sample XML Format (this is what I'm given):

<input OrderID=”1232” CustomerID=”XYZ”>
      <group id=1>
            <Item ProductID=121>
                  <OrderLineID>1</OrderLineID>
                  <Units>1</Units>
                  <NumOfSites>1</NumOfSites>
                  <Type>1</Type>
                  <Timeperiod>45</Timeperiod>
            </Item>
            <Item ProductID=122>
                  <OrderLineID>2</OrderLineID>
                  <Units>1</Units>
                  <NumOfSites>1</NumOfSites>
                  <Type>2</Type>
                  <Timeperiod>45</Timeperiod>
               </Item>
        </group>
      <group id=2>
            <Item ProductID=123>
                  <OrderLineID>3</OrderLineID>
                  <Units>1</Units>
                  <NumOfSites>1</NumOfSites>
                  <Type>1</Type>
                  <Timeperiod>45</Timeperiod>
                </Item>
            <Item ProductID=124>
                  <OrderLineID>3</OrderLineID>
                  <Units>1</Units>
                  <NumOfSites>1</NumOfSites>
                  <Type>1</Type>
                  <Timeperiod>45</Timeperiod>
               </Item>
         </group>
</input>

Sample CSV format (this format, I can specify/rearrange. Below is what I think is best in terms of CSV format).

productID, orderLineID, unit, numSites, type, period, orderID, customerID, groupID
121,1,1,1,45,1232,XYZ,1
122,2,1,1,2,45,1232,XYZ,1
123,3,1,1,1,45,1232,XYZ,2
124,3,1,1,1,45,1232,XYZ,2

How can I convert from one format to the other and vice versa, if possible? In terms of implementation, I need a command line tool. Tool could be in Java, C/C++, .NET/C#, VBScript, or Perl. Something customizable would be preferred so I can adapt it when changes occur, so I'd like to have the source code, not just the binary.
Comment
Watch Question

Information Architect
CERTIFIED EXPERT
Top Expert 2006
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
daluu,

I used this tool
http://a7soft-csv2xml.a7soft.qarchive.org/

to get this XML

<orders>
<order>
<productID>121</productID>
< orderLineID> 1</ orderLineID>
< unit> 1</ unit>
< numSites> 1</ numSites>
< type> 1</ type>
< period> 45</ period>
< orderID> 1232</ orderID>
< customerID> XYZ</ customerID>
< groupID> 1</ groupID>
</order>
<order>
<productID>122</productID>
< orderLineID> 2</ orderLineID>

and I used this XSLT to get back to what you need

note, this XSLT uses Muenchian grouping
explained here
http://www.jenitennison.com/xslt/grouping/muenchian.html

to learn XSLT (I recommend you do that)
go here
http://www.w3schools.com/xsl/default.asp

cheers

Geert
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
daluu,

I forgot to post the XSLT

just saw in your profile that you should be able to hack this together yourself in Perl, it is not hard
so I am not starting the effort in Ruby

The tool I suggested is a bit stupid since it also copies the spaces,
make sure that the seperation with a comma doesn't add a space, specially not in the title row

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes"/>
    <xsl:key name="ord-by-grp" match="order" use="groupID"/>
    <xsl:key name="ord-by-ord" match="order" use="orderID"/>
    <xsl:template match="orders">
        <input>
            <xsl:for-each select="order[generate-id() = generate-id(key('ord-by-ord', orderID)[1])]">
                <xsl:variable name="thisOrder" select="orderID"/>
                <input OrderID="{orderID}" CustomerID="{customerID}">
                    <xsl:for-each select="../order[generate-id() = generate-id(key('ord-by-grp', groupID)[orderID = $thisOrder][1])]">
                        <group id="{groupID}">
                            <xsl:for-each select="key('ord-by-grp', groupID)[orderID = $thisOrder]">
                                <Item ProductID="{productID}">
                                    <OrderLineID><xsl:value-of select="orderLineID"/></OrderLineID>
                                    <Units><xsl:value-of select="unit"/></Units>
                                    <NumOfSites><xsl:value-of select="numSites"/></NumOfSites>
                                    <Type><xsl:value-of select="type"/></Type>
                                    <Timeperiod><xsl:value-of select="period"/></Timeperiod>
                                </Item>
                                </xsl:for-each>
                        </group>
                    </xsl:for-each>
                </input>
            </xsl:for-each>
        </input>
    </xsl:template>
</xsl:stylesheet>

Geert

Author

Commented:
Thanks for the info Gertone. I'll try them out and see if it works well for me.

Author

Commented:
Do you know if there's any free or open source XSLT XML transformation tools? I'd rather avoid having to buy something (and install something temporarily as a trial).
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
There are plenty
I would go for Saxon
(please note that only the Schema aware XSLT2.0 is for purchase: Saxon8SA,
the XSLT1 Saxon6.5 and the XSLT2.0 not schema-aware Saxon8B are free of charge)
http://saxon.sourceforge.net/
and
http://saxon.sourceforge.net/#F6.5.5

Xalan from the Apache project is opensource
http://xalan.apache.org/

and microsoft has a commandline wrapper on top of msxml4
http://www.microsoft.com/downloads/details.aspx?FamilyId=2FB55371-C94E-4373-B0E9-DB4816552E41&displaylang=en

cheers

Geert

Author

Commented:
Thanks, I'll take a look at those tools. I had thought that the saxon tool wasn't free, at least at the Saxonica site.
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
welcome,
Saxon is definitely free, accept the full featured version (which I rarely use)
have fun with it

Geert
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
accept should be except of course

Author

Commented:
Thanks for the help so far. The XML to CSV conversion works well. But  CSV to XML (and then XML cleanup via XSL) has a minor issue. It seems to leave an extra pair of input tags (sorta serving as the root tags). This could be fixed easily but extra work.

Unless I did something wrong, can you figure out what part needs tweaking in the XSL file.

I'm using the Saxon B 8.8 .NET version tool (transform.exe) and a Perl CSV to XML converter and here is what I get:

CSV to XML output (from Perl):
<orders>
      <order>
            <productID>121</productID>
            <orderLineID>1</orderLineID>
            <unit>1</unit>
            <numSites>1</numSites>
            <type>1</type>
            <period>45</period>
            <orderID>1232</orderID>
            <customerID>XYZ</customerID>
            <groupID>1</groupID>
      </order>
      <order>
            <productID>122</productID>
            <orderLineID>2</orderLineID>
            <unit>1</unit>
            <numSites>1</numSites>
            <type>2</type>
            <period>45</period>
            <orderID>1232</orderID>
            <customerID>XYZ</customerID>
            <groupID>1</groupID>
      </order>
      <order>
            <productID>123</productID>
            <orderLineID>3</orderLineID>
            <unit>1</unit>
            <numSites>1</numSites>
            <type>1</type>
            <period>45</period>
            <orderID>1232</orderID>
            <customerID>XYZ</customerID>
            <groupID>2</groupID>
      </order>
      <order>
            <productID>124</productID>
            <orderLineID>3</orderLineID>
            <unit>1</unit>
            <numSites>1</numSites>
            <type>1</type>
            <period>45</period>
            <orderID>1232</orderID>
            <customerID>XYZ</customerID>
            <groupID>2</groupID>
      </order>
</orders>

XML cleanup via the XSL you posted:
C:\WorkTemp\saxon\bin>Transform.exe licTestOut.xml licTestXMLFixFromCSV.xsl
Warning: at xsl:stylesheet on line 2 of file:///C:/WorkTemp/saxon/bin/licTestXML
FixFromCSV.xsl:
  Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
<?xml version="1.0" encoding="UTF-8"?>
<input>
   <input OrderID="1232" CustomerID="XYZ">
      <group id="1">
         <Item ProductID="121">
            <OrderLineID>1</OrderLineID>
            <Units>1</Units>
            <NumOfSites>1</NumOfSites>
            <Type>1</Type>
            <Timeperiod>45</Timeperiod>
         </Item>
         <Item ProductID="122">
            <OrderLineID>2</OrderLineID>
            <Units>1</Units>
            <NumOfSites>1</NumOfSites>
            <Type>2</Type>
            <Timeperiod>45</Timeperiod>
         </Item>
      </group>
      <group id="2">
         <Item ProductID="123">
            <OrderLineID>3</OrderLineID>
            <Units>1</Units>
            <NumOfSites>1</NumOfSites>
            <Type>1</Type>
            <Timeperiod>45</Timeperiod>
         </Item>
         <Item ProductID="124">
            <OrderLineID>3</OrderLineID>
            <Units>1</Units>
            <NumOfSites>1</NumOfSites>
            <Type>1</Type>
            <Timeperiod>45</Timeperiod>
         </Item>
      </group>
   </input>
</input>
C:\WorkTemp\saxon\bin>
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
you can leave the literal tag <input> and its corresponding endtag
out of the template for orders

   <xsl:template match="orders">
        <input>
            <xsl:for-each select="order[generate-id() = generate-id(key('ord-by-ord', orderID)[1])]">
should become

   <xsl:template match="orders">
            <xsl:for-each select="order[generate-id() = generate-id(key('ord-by-ord', orderID)[1])]">

I did this to avoid that multiple input elements in a row would make the endresult not wellformed
but if you don't need them, delete the node

cheers

Geert

Author

Commented:
Ok, thanks for all the help. Now I'll need to study up XSLT.
Gertone (Geert Bormans)Information Architect
CERTIFIED EXPERT
Top Expert 2006

Commented:
welcome
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.