Link to home
Start Free TrialLog in
Avatar of Scott Abraham
Scott AbrahamFlag for United States of America

asked on

Python ElementTree xml output to csv

I have the following XML file ('registerreads_EE.xml'):

<?xml version="1.0" encoding="us-ascii" standalone="yes"?>
<ReadingDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ReadingStatusRefTable>
    <ReadingStatusRef Ref="1">
      <UnencodedStatus SourceValidation="SIMPLE">
        <StatusCodes>
          <Signal>XX</Signal>
        </StatusCodes>
      </UnencodedStatus>
    </ReadingStatusRef>
  </ReadingStatusRefTable>
  <Header>
    <IEE_System Id="XXXXXXXXXXXXXXX" />
    <Creation_Datetime Datetime="2015-10-22T09:05:32Z" />
    <Timezone Id="UTC" />
    <Path FilePath="X:\XXXXXXXXXXXX.xml" />
    <Export_Template Id="XXXXX" />
    <CorrelationID Id="" />
  </Header>
  <ImportExportParameters ResubmitFile="false" CreateGroup="true">
    <DataFormat TimestampType="XXXXXX" Type="XXXX" />
  </ImportExportParameters>
  <Channels>
    <Channel StartDate="2015-10-21T00:00:00-05:00" EndDate="2015-10-22T00:00:00-05:00">
      <ChannelID ServicePointChannelID="73825603:301" />
      <Readings>
        <Reading Value="3577.0" ReadingTime="2015-10-21T00:00:00-05:00" StatusRef="1" />
        <Reading Value="3601.3" ReadingTime="2015-10-22T00:00:00-05:00" StatusRef="1" />
      </Readings>
      <ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825603" RequestSource="Scheduled" />
    </Channel>
    <Channel StartDate="2015-10-21T00:00:00-05:00" EndDate="2015-10-22T00:00:00-05:00">
      <ChannelID ServicePointChannelID="73825604:301" />
      <Readings>
        <Reading Value="3462.5" ReadingTime="2015-10-21T00:00:00-05:00" StatusRef="1" />
        <Reading Value="3501.5" ReadingTime="2015-10-22T00:00:00-05:00" StatusRef="1" />
      </Readings>
      <ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825604" RequestSource="Scheduled" />
    </Channel>
  </Channels>
</ReadingDocument>

I want to parse the XML of the channel data into a csv file.

He is what I have written in Python 2.7.10:

import xml.etree.ElementTree as ET

tree = ET.parse('registerreads_EE.xml')

root = tree.getroot()[3]

for channel in tree.iter('Channel'):
    for exportrequest in channel.iter('ExportRequest'):
        entityid = exportrequest.attrib.get('EntityID')
        for meterread in channel.iter('Reading'):
            read = meterread.attrib.get('Value')
            date = meterread.attrib.get('ReadingTime')
            print read[:-2],",",date[:10],",",entityid
           
tree.write(open('registerreads_EE.csv','w'))

Here is the screen output when the above is run:

3577 , 2015-10-21 , 73825603
3601 , 2015-10-22 , 73825603
3462 , 2015-10-21 , 73825604
3501 , 2015-10-22 , 73825604

The 'registerreads.csv' output file is like the original XML file, minus the first line.

I would like the printed output above outputted to a csv file with headers of read, date, entityid.

I am having difficulty with this.  This is my first python program.  Any help is appreciated.
ASKER CERTIFIED SOLUTION
Avatar of pepr
pepr

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Scott Abraham

ASKER

Thank you, worked awesome!  This will come in handy for other items as well.
Avatar of pepr
pepr

I suggest to modify further the extraction of the data using the simple XPath expressions:
#!python2

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('registerreads_EE.xml')
root = tree.getroot()

with open('registerreads_EE.csv', 'wb') as fout:
    writer = csv.writer(fout)
    row = ['read', 'date', 'entityid']
    writer.writerow(row)
    for channel in root.findall('.//Channel'):
        exportrequest = channel.find('ExportRequest')
        entityid = exportrequest.attrib.get('EntityID')
        for reading in channel.findall('Readings/Reading'):
            value = reading.attrib.get('Value')
            reading_time = reading.attrib.get('ReadingTime')
            row = [value[:-2], reading_time[:10], entityid]
            print ', '.join(row)
            writer.writerow(row)

Open in new window

If you name it root it should not be the fourth element from the XML structure. The expression './/Channel' means: from this element ('.'), wherever in the structure ('//'), the Channel element. You can also be more explicit and write 'Channels/Channel' instead. You can also write './Channels/Channel' (with a single slash after the dot). It means that the Channel elements are searched only as direct children of the Channels element, and that one has to be the direct descendant of the root element.

So, find all Channel elements. Inside of each, extract the info from the ExportRequest element (no loop needed here). And inside the Channel, find all 'Readings/Reading' elements (explicit XPath again), get their attributes, form the row, and write it to the screen and to the csv file.

reading.attrib['Value'] can also be used instead of reading.attrib.get('Value'), but the later is better if the attribute may be missing (None is returned by .get() in the case).
Out of curiosity, if I wanted to open the xml file from c:\export and save it to i:\import, how would I do that?  Right now I just have the script and the files residing under the scripts directory c:\python27\scripts
You just pass the file name with the path. The bare file name is a relative path to the file at the current working directory. (That is, the 'registerreads_EE.xml' is equal to './registerreads_EE.xml'.) The recommended approach is to store the file name to the variable, and use the variable when opening the file or for reading or for writing.

Just be careful with backslashes. Or they have to be doubled (error prone), or you can use the normal slash (as in Unix -- recommended), or use a raw string literal with backslashes (error prone), or use the os.path.join() function to put the parts of the path together (also recommended). However, you have to use at least one slash after the disk name (if I recall correctly):
    ...
    fname_in = 'i:/import/registerreads_EE.xml'
    fname_out = os.path.join('i:/', 'export', 'anothersubdir', 'registerreads_EE.csv')
    fname = r'x:\path\as\raw\string\literal.txt'

Open in new window