Nishanth kumar
asked on
How to convert XML to CSV in Python
The input XML can be anything. The file is dynamic, so there are no predetermined tags; the output should be created depending on the input XML file.
I also need the code for the following requirements:
Parse the XML file to create a tree of objects
- create an object for each XML element
- each object mainly contains the XML element's name, value, attributes, and a list of sub-elements, where each member of the list is another XML element object
- a function to create the object tree
- a function to iterate
ASKER
In the Python language. The XML structure may vary: some attributes may be added, and sometimes attributes may be missing. Sub-tags are present only some of the time.
How generic is the XML source? You might have optional elements and attributes, yes, but does that mean there is absolutely no "standard" format that you are attempting to process at all? Can we define what the XML looks like? Are there no XML schemas at all?
So you expect to process both of these structures:
<hierachy>
<att>
<Order>1</Order>
<attval>Data</attval>
<children>
<att>
<Order>1</Order>
<attval>Studyval</attval>
</att>
<att>
<Order>2</Order>
<attval>Site</attval>
</att>
</children>
</att>
<att>
<Order>2</Order>
<attval>Info</attval>
<children>
<att>
<Order>1</Order>
<attval>age</attval>
</att>
<att>
<Order>2</Order>
<attval>gender</attval>
</att>
</children>
</att>
</hierachy>
as well as
<Table>
<Product>
<Product_id>1</Product_id>
<Product_name>Product 1</Product_name>
<Product_price>1000</Product_price>
</Product>
<Product>
<Product_id>2</Product_id>
<Product_name>Product 2</Product_name>
<Product_price>2000</Product_price>
</Product>
<Product>
<Product_id>3</Product_id>
<Product_name>Product 3</Product_name>
<Product_price>3000</Product_price>
</Product>
<Product>
<Product_id>4</Product_id>
<Product_name>Product 4</Product_name>
<Product_price>4000</Product_price>
</Product>
</Table>
How will you know that attributes are missing when you have no XML schema at all? I hope you catch my drift: you need to provide the XML formats you are working with, unless you really want to be able to process any XML format under the sun.
Untested: basically something like
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" encoding="utf-8"/>
<xsl:strip-space elements="*" />
<xsl:template match="/hierachy/att">
<xsl:value-of select="."/><xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
ASKER
I will make it simple. I have been asked to do the following steps. I couldn't understand what they mean, and I need a solution for them.
Step 1: create an object for each XML element. Each object mainly contains the XML element's name, value, attributes, and a list of sub-elements, where each member of the list is another XML element object.
Step 2: a function to create the object tree.
Step 3: a function to iterate.
How do I proceed?
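One possible reading of these three steps, as a minimal sketch rather than a definitive answer: the class name `XmlNode` and the functions `build_tree` and `iterate` are hypothetical names chosen for illustration, and the standard library's `xml.etree.ElementTree` is assumed as the parser, since the thread has not settled on one. The sample is a trimmed copy of the Table/Product XML posted above.

```python
import xml.etree.ElementTree as ET


class XmlNode:
    """Step 1: one object per XML element (name, value, attributes, children)."""

    def __init__(self, name, value, attributes, children):
        self.name = name              # element tag
        self.value = value            # stripped text content ("" if none)
        self.attributes = attributes  # dict of attribute name -> value
        self.children = children      # list of XmlNode objects


def build_tree(element):
    """Step 2: recursively convert an ElementTree element into XmlNode objects."""
    return XmlNode(
        name=element.tag,
        value=(element.text or "").strip(),
        attributes=dict(element.attrib),
        children=[build_tree(child) for child in element],
    )


def iterate(node, depth=0):
    """Step 3: yield (depth, node) for every node, parents before children."""
    yield depth, node
    for child in node.children:
        yield from iterate(child, depth + 1)


# Trimmed sample taken from the Table/Product XML in this thread.
SAMPLE = """<Table>
  <Product>
    <Product_id>1</Product_id>
    <Product_name>Product 1</Product_name>
  </Product>
</Table>"""

tree = build_tree(ET.fromstring(SAMPLE))
for depth, node in iterate(tree):
    print("  " * depth + node.name, node.attributes, repr(node.value))
```

Because nothing here depends on specific tag names, the same functions should work on any well-formed input; text that follows a child element (mixed content) is ignored in this sketch.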
ASKER
Parse XML file to create a tree of objects
Can you post a sample of the XML file?
In the absence of a sample, maybe this will help point you in a direction. SimpleXMLParse, ElementTree, and BeautifulSoup are some of the common XML parsers for Python.
SimpleXMLParse
http://www.evanjones.ca/software/simplexmlparse.html
ElementTree
http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree
https://docs.python.org/2/library/xml.etree.elementtree.html
Also: http://lxml.de/objectify.html
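To give a feel for what the ElementTree route could look like, here is a small sketch that turns a shortened version of the Table/Product sample from earlier in the thread into CSV text. It rests on two assumptions that will not hold for arbitrary XML: every child of the root is one row, and every record has the same flat set of fields as the first one.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the Table/Product sample from this thread.
XML_TEXT = """<Table>
  <Product>
    <Product_id>1</Product_id>
    <Product_name>Product 1</Product_name>
    <Product_price>1000</Product_price>
  </Product>
  <Product>
    <Product_id>2</Product_id>
    <Product_name>Product 2</Product_name>
    <Product_price>2000</Product_price>
  </Product>
</Table>"""

root = ET.fromstring(XML_TEXT)

# Assumption: each child of the root is a row and its children are the columns.
rows = [
    {field.tag: (field.text or "").strip() for field in record}
    for record in root
]

# Assumption: all records share the fields of the first record.
fieldnames = list(rows[0])

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead of a string, the `io.StringIO` buffer can be swapped for `open("out.csv", "w", newline="")`, where the file name is only a placeholder.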
ASKER
For example, let this be the XML. The desired output should be in a table format, with each person id as a row and name as a column in the table. A person may or may not have child nodes; if child nodes are present, they should also be populated.
<root> - file name
<person id="01"> - table name/ a row in table
<name> abc</name> - a col of the table, and value is the content
<age>32</age>
<address>addr123</address>
<siblings>
<name></name>
<name></name>
</siblings>
</person>
<person id="02">
<name> def</name>
<age>44</age>
<address>addr456</address>
<siblings>
<name></name>
<name></name>
<name></name>
</siblings>
</person>
</root>
ASKER
If the user needs the attributes of person id 1, they should be populated in table format. If the user requires person id 2, then its respective values should be populated.
ASKER
DOM parsing would be good
ASKER
-------my code-----
import xml.dom
import xml.dom.minidom
doc = xml.dom.minidom.parseString('''
<root>
<person id="01">
<name> abc</name>
<age>32</age>
<address>addr123</address>
<siblings>
<name></name>
<name></name>
</siblings>
</person>
<person id="02">
<name> def</name>
<age>44</age>
<address>addr456</address>
<siblings>
<name></name>
<name></name>
<name></name>
</siblings>
</person>
</root>
''')
def innerHtml(root):
    text = ''
    nodes = [ root ]
    while not nodes==[]:
        node = nodes.pop()
        if node.nodeType==xml.dom.Node.TEXT_NODE:
            text += node.wholeText
        else:
            nodes.extend(node.childNodes)
    return text
for statusNode in doc.getElementsByTagName('person'):
    for childNode in statusNode.childNodes:
        if childNode.nodeType==xml.dom.Node.ELEMENT_NODE:
            print("{}={}".format(childNode.nodeName, innerHtml(childNode)))
-------------output i got is------------------
name= abc
age=32
address=addr123
siblings=
name= def
age=44
address=addr456
siblings=
---------but expected output should be in tableformat-----
I need to get the person id attribute also. Please answer.
person id   name   age   address
01          abc    32    addr123
02          def    44    addr456
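A sketch of how the DOM approach could produce that table, including the id attribute. It assumes the columns wanted are the direct children `name`, `age` and `address` of each `person`; the `COLUMNS` list and the helper names `direct_child_text` and `person_rows` are made up for this example. Only direct children are inspected, so the `name` elements nested under `siblings` are not mistaken for the person's own name.

```python
import xml.dom.minidom

XML_TEXT = """<root>
<person id="01">
<name> abc</name>
<age>32</age>
<address>addr123</address>
<siblings>
<name></name>
<name></name>
</siblings>
</person>
<person id="02">
<name> def</name>
<age>44</age>
<address>addr456</address>
<siblings>
<name></name>
<name></name>
<name></name>
</siblings>
</person>
</root>"""

# Assumption: these are the columns wanted, in this order.
COLUMNS = ["name", "age", "address"]


def direct_child_text(parent, tag):
    """Return the stripped text of the first direct child element named `tag`."""
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag:
            return "".join(
                n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE
            ).strip()
    return ""  # optional element missing: leave the cell empty


def person_rows(doc):
    """Build one row per <person>: the id attribute followed by COLUMNS."""
    rows = []
    for person in doc.getElementsByTagName("person"):
        row = [person.getAttribute("id")]
        row.extend(direct_child_text(person, column) for column in COLUMNS)
        rows.append(row)
    return rows


doc = xml.dom.minidom.parseString(XML_TEXT)
rows = person_rows(doc)

# Print a tab-separated table with a header line.
print("\t".join(["person id"] + COLUMNS))
for row in rows:
    print("\t".join(row))
```

To show a single person, the rows can be filtered on the first cell, for example `[row for row in rows if row[0] == "01"]`.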
Converting XML (a hierarchical structure) to CSV (a flat, two-dimensional structure) is not easily done unless the structure of the XML file is known beforehand.
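When the structure really is unknown, one fallback is to flatten each record into path-style columns and take the union of all columns seen. The sketch below is only one of several possible conventions and makes loud assumptions: each child of the root is treated as a row, attributes become `tag@attr` columns, repeated tags get a 1-based index, and mixed content is ignored. The function names `flatten` and `xml_to_csv` are hypothetical, and the sample is a trimmed version of the person XML from this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(element, path, row):
    """Store attributes and leaf text of `element` in `row`, keyed by path."""
    for name, value in element.attrib.items():
        row[f"{path}@{name}"] = value
    children = list(element)
    if not children:
        row[path] = (element.text or "").strip()
        return
    totals = Counter(child.tag for child in children)
    seen = Counter()
    for child in children:
        seen[child.tag] += 1
        # Repeated tags get a 1-based index so they do not overwrite each other.
        suffix = f"[{seen[child.tag]}]" if totals[child.tag] > 1 else ""
        flatten(child, f"{path}/{child.tag}{suffix}", row)


def xml_to_csv(xml_text):
    """Assumption: every child of the root element is one CSV row."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root:
        row = {}
        flatten(record, record.tag, row)
        rows.append(row)
    # Columns are the union of all keys, in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed version of the person sample from this thread.
SAMPLE = """<root>
  <person id="01">
    <name> abc</name>
    <age>32</age>
    <siblings>
      <name></name>
      <name></name>
    </siblings>
  </person>
  <person id="02">
    <name> def</name>
    <age>44</age>
  </person>
</root>"""

csv_text = xml_to_csv(SAMPLE)
print(csv_text)
```

The trade-off is visible in the header: deeply nested or highly repetitive input produces many sparse columns, which is why knowing the structure beforehand usually gives a cleaner result.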