Solved

Parsing XML with namespace using ElementTree

Posted on 2013-05-28
Last Modified: 2013-06-09
Consider the following XML:

<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address>
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>

I'd like to parse the XML above using ElementTree. The code should also print the contents of the tags.
Question by:forums_mp

Expert Comment

by:clockwatcher
That's not valid XML. You're using a prefix (st) that hasn't been defined. Assuming that's an oversight, and the prefix is defined in your real document but was left out of the post, can you explain a bit more about how you want it parsed?
import xml.etree.ElementTree as ET

xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

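def dump_tree(t, depth=0):
    # Print each element's depth, namespace-qualified tag, and text
    print("{0}: {1} => {2}".format(depth, t.tag, t.text))
    for child in t:
        dump_tree(child, depth + 1)
    # Separate each top-level <st:source> block
    if depth == 1:
        print("----------")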

root = ET.fromstring(xml)
dump_tree(root)
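
To illustrate the point about the undefined prefix, here is a minimal sketch comparing the two cases. The helper name try_parse and the shortened documents are made up for the example; the namespace URI is the same placeholder used above.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the document, with and without the xmlns:st declaration
undeclared = '<st:address><st:zip>98056</st:zip></st:address>'
declared = ('<st:address xmlns:st="http://this/prefix/needs/to/be/defined">'
            '<st:zip>98056</st:zip></st:address>')

def try_parse(text):
    """Return the root tag, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        # The parser rejects a prefix that was never bound to a namespace
        return None

undeclared_tag = try_parse(undeclared)
declared_tag = try_parse(declared)

print(undeclared_tag)
print(declared_tag)
```

Once the prefix is declared, ElementTree reports tags in the form {namespace-uri}localname rather than st:localname, which is why the tags printed by dump_tree look longer than the ones in the file.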


Author Comment

by:forums_mp
My apologies. Yes, that was an oversight on my part. Beyond that, I'd like to be able to read and write the individual elements. To put it in terms of a language I'm more familiar with, C++: I'd like to define a composite type in Python equivalent to:

struct xmlElements {
  std::string source;
  ...
  unsigned int zip;
};
Then later create a sequence of these:
typedef std::vector<xmlElements> elementVec;

Lastly, use boost::property_tree to 'get' the members and store them within the sequence of xmlElements.

I would also like an example of how to set the elements and then write the contents out to a file.

My reply might have been long-winded, but hopefully it explains what I'm after. Thanks.

Author Comment

by:forums_mp
I'm using my smartphone to reply, which is somewhat cumbersome. That aside, I'm after a Python solution, although I used C++ for illustration purposes.

Accepted Solution

by:clockwatcher
So it sounds like you're after a collection of 'source' objects.
import xml.etree.ElementTree as ET

class InvalidTagError(Exception):
    def __init__(self, element, expected):
        self.element = element
        self.expected = expected
    def __str__(self):
        return "'{0}' is not an instance of '{1}'".format(self.element, self.expected)

class QualifiedElement(object):

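    @classmethod
    def qualify(cls, tag, ns):
        # ElementTree addresses namespaced tags as "{namespace-uri}tag";
        # an empty or missing namespace leaves the tag unqualified
        if ns:
            tag = "{{{ns}}}{tag}".format(ns=ns, tag=tag)
        return tag

    def fqn(self, tag, ns=None):
        # Fall back to the namespace this wrapper was created with
        if ns is None:
            ns = self.ns

        return QualifiedElement.qualify(tag, ns)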

    def get_child_text(self, tag, ns=None):
        '''returns the text for the child tag or None if the tag isn't found'''
        
        child = self.element.find(self.fqn(tag, ns))
        # Compare with "is None": an Element with no children is falsy
        if child is None:
            return None
        else:
            return child.text

    def __init__(self, el, ns=None):
        
        if not isinstance(el, ET.Element):
            raise TypeError()

        self.ns = ns
        self.element = el


class Source(QualifiedElement):

    def __init__(self, el, ns=None):

        QualifiedElement.__init__(self, el, ns)  

        if el.tag != self.fqn('source'):
            raise InvalidTagError(el, self.fqn('source'))

        self.num = el.get('num')  # example of attribute access
        self.name = self.get_child_text('name')
        self.street = self.get_child_text('street')
        self.city = self.get_child_text('city')
        self.state = self.get_child_text('state')
        self.zip = self.get_child_text('zip')


def main():

    xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

    ns = 'http://this/prefix/needs/to/be/defined'
    root = ET.fromstring(xml)
    sources = []
    for el in root.findall(QualifiedElement.qualify('source', ns)):
        source = Source(el, ns)
        sources.append(source)

    for source in sources:
        print(source.name)

if __name__=='__main__':
    main()
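
The code above only reads the elements. For the setting and writing part of the question, here is a minimal sketch using plain ElementTree calls. The helper q, the replacement values ('Tacoma', '10', the added country element), and the output file name test_out.xml are made up for illustration, and the document is shortened to a single source.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

NS = 'http://this/prefix/needs/to/be/defined'
XML = ('<st:address xmlns:st="' + NS + '">'
       '<st:source num="1">'
       '<st:city>Seattle</st:city><st:zip>98056</st:zip>'
       '</st:source>'
       '</st:address>')

def q(tag):
    """Qualify a tag name with the namespace (hypothetical helper)."""
    return '{' + NS + '}' + tag

# Keep the "st" prefix on output instead of an auto-generated one like "ns0"
ET.register_namespace('st', NS)

root = ET.fromstring(XML)
source = root.find(q('source'))

# Set the text of an existing child and change an attribute
source.find(q('city')).text = 'Tacoma'
source.set('num', '10')

# Add a new child element
country = ET.SubElement(source, q('country'))
country.text = 'USA'

# Write the modified tree to a file (written to a temp directory here)
out_path = os.path.join(tempfile.mkdtemp(), 'test_out.xml')
ET.ElementTree(root).write(out_path, encoding='utf-8', xml_declaration=True)

# Read the file back to confirm the changes were saved
reread = ET.parse(out_path).getroot()
new_city = reread.find(q('source') + '/' + q('city')).text
new_num = reread.find(q('source')).get('num')
serialized = ET.tostring(root).decode('ascii')

print(new_city)
print(serialized)
```

If you keep the Source wrapper class, the same idea applies: change the underlying element through source.element (or add setter methods that do so), then write the root out as shown.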
