Scott Abraham
asked on
Python ElementTree xml output to csv
I have the following XML file ('registerreads_EE.xml'):
<?xml version="1.0" encoding="us-ascii" standalone="yes"?>
<ReadingDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ReadingStatusRefTable>
<ReadingStatusRef Ref="1">
<UnencodedStatus SourceValidation="SIMPLE">
<StatusCodes>
<Signal>XX</Signal>
</StatusCodes>
</UnencodedStatus>
</ReadingStatusRef>
</ReadingStatusRefTable>
<Header>
<IEE_System Id="XXXXXXXXXXXXXXX" />
<Creation_Datetime Datetime="2015-10-22T09:05 :32Z" />
<Timezone Id="UTC" />
<Path FilePath="X:\XXXXXXXXXXXX. xml" />
<Export_Template Id="XXXXX" />
<CorrelationID Id="" />
</Header>
<ImportExportParameters ResubmitFile="false" CreateGroup="true">
<DataFormat TimestampType="XXXXXX" Type="XXXX" />
</ImportExportParameters>
<Channels>
<Channel StartDate="2015-10-21T00:0 0:00-05:00 " EndDate="2015-10-22T00:00: 00-05:00">
<ChannelID ServicePointChannelID="738 25603:301" />
<Readings>
<Reading Value="3577.0" ReadingTime="2015-10-21T00 :00:00-05: 00" StatusRef="1" />
<Reading Value="3601.3" ReadingTime="2015-10-22T00 :00:00-05: 00" StatusRef="1" />
</Readings>
<ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825603" RequestSource="Scheduled" />
</Channel>
<Channel StartDate="2015-10-21T00:0 0:00-05:00 " EndDate="2015-10-22T00:00: 00-05:00">
<ChannelID ServicePointChannelID="738 25604:301" />
<Readings>
<Reading Value="3462.5" ReadingTime="2015-10-21T00 :00:00-05: 00" StatusRef="1" />
<Reading Value="3501.5" ReadingTime="2015-10-22T00 :00:00-05: 00" StatusRef="1" />
</Readings>
<ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825604" RequestSource="Scheduled" />
</Channel>
</Channels>
</ReadingDocument>
I want to parse the XML of the channel data into a csv file.
He is what I have written in Python 2.7.10:
import xml.etree.ElementTree as ET
tree = ET.parse('registerreads_EE .xml')
root = tree.getroot()[3]
for channel in tree.iter('Channel'):
for exportrequest in channel.iter('ExportReques t'):
entityid = exportrequest.attrib.get(' EntityID')
for meterread in channel.iter('Reading'):
read = meterread.attrib.get('Valu e')
date = meterread.attrib.get('Read ingTime')
print read[:-2],",",date[:10],", ",entityid
tree.write(open('registerr eads_EE.cs v','w'))
Here is the screen output when the above is run:
3577 , 2015-10-21 , 73825603
3601 , 2015-10-22 , 73825603
3462 , 2015-10-21 , 73825604
3501 , 2015-10-22 , 73825604
The 'registerreads.csv' output file is like the original XML file, minus the first line.
I would like the printed output above outputted to a csv file with headers of read, date, entityid.
I am having difficulty with this. This is my first python program. Any help is appreciated.
<?xml version="1.0" encoding="us-ascii" standalone="yes"?>
<ReadingDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ReadingStatusRefTable>
<ReadingStatusRef Ref="1">
<UnencodedStatus SourceValidation="SIMPLE">
<StatusCodes>
<Signal>XX</Signal>
</StatusCodes>
</UnencodedStatus>
</ReadingStatusRef>
</ReadingStatusRefTable>
<Header>
<IEE_System Id="XXXXXXXXXXXXXXX" />
<Creation_Datetime Datetime="2015-10-22T09:05
<Timezone Id="UTC" />
<Path FilePath="X:\XXXXXXXXXXXX.
<Export_Template Id="XXXXX" />
<CorrelationID Id="" />
</Header>
<ImportExportParameters ResubmitFile="false" CreateGroup="true">
<DataFormat TimestampType="XXXXXX" Type="XXXX" />
</ImportExportParameters>
<Channels>
<Channel StartDate="2015-10-21T00:0
<ChannelID ServicePointChannelID="738
<Readings>
<Reading Value="3577.0" ReadingTime="2015-10-21T00
<Reading Value="3601.3" ReadingTime="2015-10-22T00
</Readings>
<ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825603" RequestSource="Scheduled" />
</Channel>
<Channel StartDate="2015-10-21T00:0
<ChannelID ServicePointChannelID="738
<Readings>
<Reading Value="3462.5" ReadingTime="2015-10-21T00
<Reading Value="3501.5" ReadingTime="2015-10-22T00
</Readings>
<ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825604" RequestSource="Scheduled" />
</Channel>
</Channels>
</ReadingDocument>
I want to parse the XML of the channel data into a csv file.
He is what I have written in Python 2.7.10:
import xml.etree.ElementTree as ET
tree = ET.parse('registerreads_EE
root = tree.getroot()[3]
for channel in tree.iter('Channel'):
for exportrequest in channel.iter('ExportReques
entityid = exportrequest.attrib.get('
for meterread in channel.iter('Reading'):
read = meterread.attrib.get('Valu
date = meterread.attrib.get('Read
print read[:-2],",",date[:10],",
tree.write(open('registerr
Here is the screen output when the above is run:
3577 , 2015-10-21 , 73825603
3601 , 2015-10-22 , 73825603
3462 , 2015-10-21 , 73825604
3501 , 2015-10-22 , 73825604
The 'registerreads.csv' output file is like the original XML file, minus the first line.
I would like the printed output above outputted to a csv file with headers of read, date, entityid.
I am having difficulty with this. This is my first python program. Any help is appreciated.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I suggest to modify further the extraction of the data using the simple XPath expressions:
So, find all Channel elements. Inside of each, extract the info from the ExportRequest element (no loop needed here). And inside the Channel, find all 'Readings/Reading' elements (explicit XPath again), get their attributes, form the row, and write it to the screen and to the csv file.
reading.attrib['Value'] can also be used instead of reading.attrib.get('Value' ), but the later is better if the attribute may be missing (None is returned by .get() in the case).
#!python2
import xml.etree.ElementTree as ET
import csv
tree = ET.parse('registerreads_EE.xml')
root = tree.getroot()
with open('registerreads_EE.csv', 'wb') as fout:
writer = csv.writer(fout)
row = ['read', 'date', 'entityid']
writer.writerow(row)
for channel in root.findall('.//Channel'):
exportrequest = channel.find('ExportRequest')
entityid = exportrequest.attrib.get('EntityID')
for reading in channel.findall('Readings/Reading'):
value = reading.attrib.get('Value')
reading_time = reading.attrib.get('ReadingTime')
row = [value[:-2], reading_time[:10], entityid]
print ', '.join(row)
writer.writerow(row)
If you name it root it should not be the fourth element from the XML structure. The expression './/Channel' means: from this element ('.'), wherever in the structure ('//'), the Channel element. You can also be more explicit and write 'Channels/Channel' instead. You can also write './Channels/Channel' (with a single slash after the dot). It means that the Channel elements are searched only as direct children of the Channels element, and that one has to be the direct descendant of the root element.So, find all Channel elements. Inside of each, extract the info from the ExportRequest element (no loop needed here). And inside the Channel, find all 'Readings/Reading' elements (explicit XPath again), get their attributes, form the row, and write it to the screen and to the csv file.
reading.attrib['Value'] can also be used instead of reading.attrib.get('Value'
ASKER
Out of curiosity, if I wanted to open the xml file from c:\export and save it to i:\import, how would I do that? Right now I just have the script and the files residing under the scripts directory c:\python27\scripts
You just pass the file name with the path. The bare file name is a relative path to the file at the current working directory. (That is, the 'registerreads_EE.xml' is equal to './registerreads_EE.xml'.) The recommended approach is to store the file name to the variable, and use the variable when opening the file or for reading or for writing.
Just be careful with backslashes. Or they have to be doubled (error prone), or you can use the normal slash (as in Unix -- recommended), or use a raw string literal with backslashes (error prone), or use the os.path.join() function to put the parts of the path together (also recommended). However, you have to use at least one slash after the disk name (if I recall correctly):
Just be careful with backslashes. Or they have to be doubled (error prone), or you can use the normal slash (as in Unix -- recommended), or use a raw string literal with backslashes (error prone), or use the os.path.join() function to put the parts of the path together (also recommended). However, you have to use at least one slash after the disk name (if I recall correctly):
...
fname_in = 'i:/import/registerreads_EE.xml'
fname_out = os.path.join('i:/', 'export', 'anothersubdir', 'registerreads_EE.csv')
fname = r'x:\path\as\raw\string\literal.txt'
ASKER