Solved

Parsing XML with namespace using ElementTree

Posted on 2013-05-28
Last Modified: 2013-06-09
Consider the following XML:

<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address>
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>


I'd like to parse the XML above using ElementTree. The code should also print the text contained in each tag.
Question by:forums_mp

Expert Comment

by:clockwatcher
ID: 39203421
That's not valid XML. You're using a prefix (st) that hasn't been declared. Assuming that's an oversight and the prefix is declared in your real document, can you explain a bit more about how you want it parsed?
import xml.etree.ElementTree as ET

xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

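def dump_tree(t, depth=0):
    # print the nesting depth, the namespace-qualified tag and the element text
    print("{0}: {1} => {2}".format(depth, t.tag, t.text))
    for child in t:
        dump_tree(child, depth + 1)
    # separator after each direct child of the root (each st:source block)
    if depth == 1:
        print("----------")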

root = ET.fromstring(xml)
dump_tree(root)
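
To illustrate the point about the undeclared prefix, here is a minimal sketch (not part of the original reply). It assumes the same placeholder namespace URI as above; the helper name try_parse and the shortened documents are made up for the example. It shows that ElementTree rejects the document when st is unbound, and that once the prefix is declared the tags come back in {uri}name form.

```python
import xml.etree.ElementTree as ET

# Placeholder URI taken from the example above; a real document defines its own.
NS = "http://this/prefix/needs/to/be/defined"

# Shortened, hypothetical versions of the document: one without the
# xmlns:st declaration, one with it.
undeclared = '<st:address><st:name>Bubba McBubba</st:name></st:address>'
declared = ('<st:address xmlns:st="%s">'
            '<st:name>Bubba McBubba</st:name></st:address>' % NS)


def try_parse(text):
    """Return the root element, or None if the parser rejects the text."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("parse failed: {0}".format(err))
        return None


bad_root = try_parse(undeclared)   # rejected because the st prefix is unbound
good_root = try_parse(declared)    # parses normally

# ElementTree expands the prefix, so tags must be looked up as {uri}name.
print(good_root.tag)
name = good_root.find("{%s}name" % NS)
print(name.text)
```

This is why the lookups in any parsing code have to use the full namespace URI (or a prefix map) rather than the bare st: prefix.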


Author Comment

by:forums_mp
ID: 39203631
My apologies. Yes, that was an oversight on my part. Beyond that, I'd like to be able to read and write the individual elements. To put it in terms of a language I'm more familiar with, C++, I'd like to define a composite type in Python equivalent to:

struct xmlElements {
  std::string source;
  ...
  unsigned int zip;
};
Then later create a sequence of these:
typedef std::vector<xmlElements> elementVec;

Lastly, I would use boost::property_tree to 'get' the members and store them in the sequence of xmlElements.

I would also like an example of how to set the elements and then write the contents out to a file.

My reply might have been long-winded, but hopefully it explains what I'm after. Thanks.

Author Comment

by:forums_mp
ID: 39203634
I'm using my smartphone to reply, which is somewhat cumbersome. That aside, I'm after a Python solution; I only used C++ for illustration purposes.

Accepted Solution

by:clockwatcher (earned 500 total points)
ID: 39205088
So it sounds like you're after a collection of 'source' objects.
import xml.etree.ElementTree as ET

class InvalidTagError(Exception):
    def __init__(self, element, expected):
        self.element = element
        self.expected = expected
    def __str__(self):
        return "'{0}' is not an instance of '{1}'".format(self.element, self.expected)

class QualifiedElement(object):

    @classmethod
    def qualify(cls, tag, ns):
        '''returns tag in ElementTree's {namespace}tag form; unchanged if ns is empty or None'''
        if ns:
            tag = "{{{ns}}}{tag}".format(ns=ns, tag=tag)
        return tag

    def fqn(self, tag, ns=None):
        '''qualifies tag with ns, defaulting to this element's namespace'''
        if ns is None:
            ns = self.ns

        return QualifiedElement.qualify(tag, ns)

    def get_child_text(self, tag, ns=None):
        '''returns the text for the child tag or None if the tag isn't found'''

        child = self.element.find(self.fqn(tag, ns))
        if child is None:
            return None
        else:
            return child.text

    def __init__(self, el, ns=None):
        
        if not isinstance(el, ET.Element):
            raise TypeError("el must be an xml.etree.ElementTree.Element")

        self.ns = ns
        self.element = el


class Source(QualifiedElement):

    def __init__(self, el, ns=None):

        QualifiedElement.__init__(self, el, ns)  

        if el.tag != self.fqn('source'):
            raise InvalidTagError(el, self.fqn('source'))

        self.num = el.get('num')  # example of attribute access
        self.name = self.get_child_text('name')
        self.street = self.get_child_text('street')
        self.city = self.get_child_text('city')
        self.state = self.get_child_text('state')
        self.zip = self.get_child_text('zip')  # kept as a string, not converted to int


def main():

    xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

    ns = 'http://this/prefix/needs/to/be/defined'
    root = ET.fromstring(xml)
    sources = []
    for el in root.findall(QualifiedElement.qualify('source', ns)):
        source = Source(el, ns)
        sources.append(source)

    for source in sources:
        print(source.name)

if __name__=='__main__':
    main()
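
The question also asked how to set elements and write the result out to a file, which the code above doesn't cover. The following is only a rough sketch of one way to do it with plain ElementTree calls, not part of the accepted answer. The shortened document, the PREFIXES map, the new values and the output file name test_out.xml are all made up for illustration, and it assumes an ElementTree version that accepts a prefix map in find() and the xml_declaration argument to write().

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Placeholder URI from the examples above; PREFIXES is a hypothetical helper map.
NS = "http://this/prefix/needs/to/be/defined"
PREFIXES = {"st": NS}

# Shortened version of the document, for illustration only.
xml_text = '''<st:address xmlns:st="http://this/prefix/needs/to/be/defined">
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:zip>98056</st:zip>
  </st:source>
</st:address>'''

# Without this, ElementTree writes an auto-generated prefix (ns0) instead of st.
ET.register_namespace("st", NS)

root = ET.fromstring(xml_text)
first = root.find("st:source", PREFIXES)

# Set an element's text and an attribute in place.
first.find("st:zip", PREFIXES).text = "98057"
first.set("num", "10")

# Append a new child element using the fully qualified tag.
state = ET.SubElement(first, "{%s}state" % NS)
state.text = "WA"

# Write the modified tree to a file (a temporary directory is used here).
out_path = os.path.join(tempfile.mkdtemp(), "test_out.xml")
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to confirm the changes were saved.
reread = ET.parse(out_path).getroot()
saved = reread.find("st:source", PREFIXES)
print(saved.get("num"), saved.find("st:zip", PREFIXES).text)
```

The same calls could be wrapped in setter methods on the Source class if you'd rather keep everything behind objects; the sketch works on the raw elements only to keep it short.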
