Nishanth kumar

How to convert XML to CSV in Python

The input XML can be anything. It is a dynamic file, so there are no predetermined tags; the output should be created depending on the input XML file.

I need the code for the following requirements.

Parse XML file to create a tree of objects
   - create an object for each XML element
   - each object mainly contains the XML element name, value, attributes, and a list of sub-elements, where each member of the list is another XML element object
   - a function to create the object tree
   - a function to iterate
Mlanda T

In what language is this to be done?

Converting XML (a hierarchical structure) to CSV (a flat, two-dimensional structure) is not easily done unless the structure of the XML file is known beforehand.
Nishanth kumar

ASKER

In Python. The XML structure may vary: some attributes may be added, and sometimes attributes may be missing. Sub-tags are present only some of the time.
How generic is the XML source? You might have optional elements and attributes, yes, but does that mean there is absolutely no "standard" format that you are attempting to process at all? Can we define what the XML looks like? Are there no XML schemas at all?

So you expect to process both of these structures:
<hierachy>
    <att>
        <Order>1</Order>
        <attval>Data</attval>
        <children>
            <att>
                <Order>1</Order>
                <attval>Studyval</attval>
            </att>
            <att>
                <Order>2</Order>
                <attval>Site</attval>
            </att>
        </children>
    </att>
    <att>
        <Order>2</Order>
        <attval>Info</attval>
        <children>
            <att>
                <Order>1</Order>
                <attval>age</attval>
            </att>
            <att>
                <Order>2</Order>
                <attval>gender</attval>
            </att>
        </children>
    </att>
</hierachy>


as well as
<Table>
<Product>
<Product_id>1</Product_id>
<Product_name>Product 1</Product_name>
<Product_price>1000</Product_price>
</Product>
<Product>
<Product_id>2</Product_id>
<Product_name>Product 2</Product_name>
<Product_price>2000</Product_price>
</Product>
<Product>
<Product_id>3</Product_id>
<Product_name>Product 3</Product_name>
<Product_price>3000</Product_price>
</Product>
<Product>
<Product_id>4</Product_id>
<Product_name>Product 4</Product_name>
<Product_price>4000</Product_price>
</Product>
</Table>


How will you know that attributes are missing when you have no XML schema at all? I hope you catch my drift... you need to provide the XML formats you are working with, unless you really want to be able to process any XML format under the sun.
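
If the structure really is unknown, one generic fallback is to give up on one column per tag and write one row per value instead, keyed by the element's path. A minimal sketch, assuming Python 3 and the standard-library ElementTree module; the names flatten and xml_to_csv are made up for illustration and are not from any library:

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten(element, path=""):
    """Yield (path, value) pairs for every attribute and every non-empty text."""
    current = f"{path}/{element.tag}"
    for name, value in element.attrib.items():
        yield f"{current}/@{name}", value
    text = (element.text or "").strip()
    if text:
        yield current, text
    for child in element:
        yield from flatten(child, current)


def xml_to_csv(xml_text):
    """Return CSV text with one 'path,value' row per attribute or text value."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["path", "value"])
    writer.writerows(flatten(root))
    return buffer.getvalue()


# Hypothetical cut-down version of the <Table> sample above.
sample = (
    "<Table><Product><Product_id>1</Product_id>"
    "<Product_name>Product 1</Product_name></Product></Table>"
)
print(xml_to_csv(sample))
```

This works for any well-formed input, but repeated elements share the same path, so it does not by itself tell you which values belong to the same record. That is the part that needs some knowledge of the format.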
Untested: basically something like

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="text" omit-xml-declaration="yes" encoding="utf-8"/>
	<xsl:strip-space elements="*" />
	<!-- the root element is spelled "hierachy" in the sample above -->
	<xsl:template match="/hierachy/att">
		<xsl:value-of select="."/>
		<xsl:text>&#10;</xsl:text>
	</xsl:template>
</xsl:stylesheet>


I will make it simple. I have been asked to do the following steps, but I couldn't understand what they mean. I need a solution for the following steps.

Step 1: Create an object for each XML element. Each object mainly contains the XML element name, value, attributes, and a list of sub-elements, where each member of the list is another XML element object.

Step 2: A function to create the object tree.

Step 3: A function to iterate.

How do I proceed?
Parse XML file to create a tree of objects
Can you post a sample of the XML file?

In the absence of a sample, maybe this will help point you in a direction. SimpleXMLParse, ElementTree, and BeautifulSoup are some of the common XML parsers for Python.

SimpleXMLParse
http://www.evanjones.ca/software/simplexmlparse.html

ElementTree
http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree
https://docs.python.org/2/library/xml.etree.elementtree.html

Also: http://lxml.de/objectify.html
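
For what it's worth, the three steps could be sketched on top of ElementTree roughly like this. This assumes Python 3; XmlNode, build_tree, and iterate are illustrative names chosen here, not part of ElementTree, and the inline XML is a shortened stand-in for a real file:

```python
import xml.etree.ElementTree as ET


class XmlNode:
    """Step 1: one object per XML element."""

    def __init__(self, name, value, attributes, children):
        self.name = name              # element (tag) name
        self.value = value            # text content, whitespace stripped
        self.attributes = attributes  # dict of attribute name -> value
        self.children = children      # list of XmlNode objects


def build_tree(element):
    """Step 2: recursively convert an ElementTree element into XmlNode objects."""
    return XmlNode(
        name=element.tag,
        value=(element.text or "").strip(),
        attributes=dict(element.attrib),
        children=[build_tree(child) for child in element],
    )


def iterate(node, depth=0):
    """Step 3: yield (depth, node) for the node and all descendants, depth first."""
    yield depth, node
    for child in node.children:
        yield from iterate(child, depth + 1)


tree = build_tree(ET.fromstring(
    '<root><person id="01"><name> abc</name><age>32</age></person></root>'
))
for depth, node in iterate(tree):
    print("  " * depth + node.name, node.attributes, repr(node.value))
```

To read from a file instead of a string, ET.parse(path).getroot() can be passed to build_tree in place of ET.fromstring(...).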
For example, let this be the XML. The desired output should be in table format, with the person id as the row and name as a column in the table. A person may or may not have child nodes; if child nodes are present, they should also be populated.

<root> - file name
   <person id="01"> - table name/ a row in table
      <name> abc</name> - a col of the table, and value is the content
      <age>32</age>
      <address>addr123</address>
      <siblings>
        <name></name>
        <name></name>
      </siblings>
   </person>
   <person id="02">
      <name> def</name>
      <age>44</age>
      <address>addr456</address>
      <siblings>
        <name></name>
        <name></name>
        <name></name>
      </siblings>
   </person>
</root>
If the user needs the attributes of person id 1, they should be populated in table format. If the user requires person id 2, then its respective values should be populated.
DOM parsing would be good
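
A rough DOM-based sketch of looking up one person by id, assuming Python 3 and xml.dom.minidom. The function name person_fields and the shortened inline XML are assumptions made for illustration:

```python
import xml.dom.minidom


def person_fields(doc, person_id):
    """Return {tag: text} for the <person> whose id attribute equals person_id."""
    for person in doc.getElementsByTagName("person"):
        if person.getAttribute("id") != person_id:
            continue
        fields = {}
        for child in person.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue  # skip whitespace text nodes between elements
            # Only direct text nodes are read, so a container such as
            # <siblings> comes back as an empty string.
            text = "".join(
                n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE
            )
            fields[child.tagName] = text.strip()
        return fields
    return None  # no person with that id


doc = xml.dom.minidom.parseString(
    '<root>'
    '<person id="01"><name> abc</name><age>32</age></person>'
    '<person id="02"><name> def</name><age>44</age></person>'
    '</root>'
)
print(person_fields(doc, "02"))
```

Note that the id is compared as a string, so "02" and "2" are different values here.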
-------my code-----

import xml.dom
import xml.dom.minidom
doc = xml.dom.minidom.parseString('''
<root>
   <person id="01">
      <name> abc</name>
      <age>32</age>
      <address>addr123</address>
      <siblings>
        <name></name>
        <name></name>
      </siblings>
   </person>
   <person id="02">
      <name> def</name>
      <age>44</age>
      <address>addr456</address>
      <siblings>
        <name></name>
        <name></name>
        <name></name>
      </siblings>
   </person>
</root>

''')


def innerHtml(root):
    # Collect the text of every text node beneath root (iterative traversal).
    text = ''
    nodes = [root]
    while nodes:
        node = nodes.pop()
        if node.nodeType == xml.dom.Node.TEXT_NODE:
            text += node.wholeText
        else:
            nodes.extend(node.childNodes)
    return text


# Print name=value for each direct child element of every <person>.
for statusNode in doc.getElementsByTagName('person'):
    for childNode in statusNode.childNodes:
        if childNode.nodeType == xml.dom.Node.ELEMENT_NODE:
            print("{}={}".format(childNode.nodeName, innerHtml(childNode)))

-------------output i got is------------------
name= abc
age=32
address=addr123
siblings=
                     
name= def
age=44
address=addr456
siblings=

---------but the expected output should be in table format-----
I need to get the person id attribute as well. Please answer.

person id         name         age      address
01                abc          32       addr123
02                def          44       addr456
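
One possible way to get a table like this, sketched with ElementTree and the csv module rather than minidom. This assumes Python 3, that each <person> is one row, and that only leaf child elements become columns; people_to_rows and the record_tag parameter are hypothetical names, and nested containers such as <siblings> are simply skipped here:

```python
import csv
import sys
import xml.etree.ElementTree as ET


def people_to_rows(xml_text, record_tag="person"):
    """Return (header, rows) with one dict row per record element."""
    root = ET.fromstring(xml_text)
    header, rows = [], []
    for record in root.iter(record_tag):
        # Attributes first, e.g. id="01" becomes the column "person id".
        row = {f"{record_tag} {k}": v for k, v in record.attrib.items()}
        for child in record:
            if len(child) == 0:  # leaf element; skips containers like <siblings>
                row[child.tag] = (child.text or "").strip()
        for key in row:
            if key not in header:
                header.append(key)  # columns are discovered from the data
        rows.append(row)
    return header, rows


sample = """
<root>
   <person id="01">
      <name> abc</name><age>32</age><address>addr123</address>
      <siblings><name></name><name></name></siblings>
   </person>
   <person id="02">
      <name> def</name><age>44</age><address>addr456</address>
   </person>
</root>
"""

header, rows = people_to_rows(sample)
writer = csv.DictWriter(sys.stdout, fieldnames=header, restval="")
writer.writeheader()
writer.writerows(rows)
```

Because the header is built from whatever tags and attributes actually appear, a person with a missing element just gets an empty cell (restval="") instead of breaking the output.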