Solved

Parsing XML with namespace using ElementTree

Posted on 2013-05-28
4
343 Views
Last Modified: 2013-06-09
Consider.

<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address>
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>

Open in new window


I'd like to parse the following XML using ElementTree.   Code produced should also display (print) the contents within the tags.
0
Comment
Question by:forums_mp
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39203421
That's not valid xml.   You're trying to use a prefix that hasn't been defined. Assuming that's an oversight and  your prefix is defined and you mistakenly left it out then can you explain a bit more how you want it parsed?
import xml.etree.ElementTree as ET

xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

def dump_tree(t, depth=0):
    print "{0}: {1} => {2}".format(depth, t.tag, t.text)
    for child in t:
        dump_tree(child, depth + 1)
    if depth == 1:
        print "----------"

root = ET.fromstring(xml)
dump_tree(root)

Open in new window

0
 

Author Comment

by:forums_mp
ID: 39203631
My apologies.  Yes that was an oversight on my part. Beyond that I'd like to be able to read and write to the individual elements.  In other words - when viewed from a language I'm more familiar with in particular C++. I'd like to define a composite type in python equivalent to:

struct xmlElements {
  std:string source ;
  ...
  unsigned int zip;
};
Then later create a sequence of these:
typedef std:vector <xmlElements> elementVec;

Lastly use boost::property_tree to 'get' the members and store them within the sequence of xmlElements.  

In

Would also like an example on how to set the elements then write the contents out to a file.  

My reply might have been long-winded but hopefully explains what I'm after. Thanks
0
 

Author Comment

by:forums_mp
ID: 39203634
I'm using my smartphone to reply which is somewhat cumbersome.  That aside I'm after a python solution - albeit I used C++ for illustration purposes
0
 
LVL 25

Accepted Solution

by:
clockwatcher earned 500 total points
ID: 39205088
So sounds like you're after a collection of 'source' objects.
import xml.etree.ElementTree as ET

class InvalidTagError(Exception):
    def __init__(self, element, expected):
        self.element = element
        self.expected = expected
    def __str__(self):
        return "'{0}' is not an instance of '{1}'".format(self.element, self.expected)

class QualifiedElement(object):

    @classmethod
    def qualify(classname, tag, ns):
        if ns != '':
            tag =  "{{{ns}}}{tag}".format(ns=ns, tag=tag)
        return tag

    def fqn(self, tag, ns=None):
        if ns == None:
            ns = self.ns

        return QualifiedElement.qualify(tag, ns)

    def get_child_text(self, tag, ns=None):
        '''returns the text for the child tag or None if the tag isn't found'''
        
        child = self.element.find(self.fqn(tag, ns))
        if child == None:
           return None
        else:
           return child.text

    def __init__(self, el, ns=None):
        
        if not isinstance(el, ET.Element):
            raise TypeError()

        self.ns = ns
        self.element = el


class Source(QualifiedElement):

    def __init__(self, el, ns=None):

        QualifiedElement.__init__(self, el, ns)  

        if el.tag != self.fqn('source'):
            raise InvalidTagError(el, self.fqn('source'))

        self.num = el.get('num')  # example of attribute access
        self.name = self.get_child_text('name')
        self.street = self.get_child_text('street')
        self.city = self.get_child_text('city')
        self.zip = self.get_child_text('zip')


def main():

    xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

    ns = 'http://this/prefix/needs/to/be/defined'
    root = ET.fromstring(xml)
    sources = []
    for el in root.findall(QualifiedElement.qualify('source', ns)):
        source = Source(el, ns)
        sources.append(source)

    for source in sources:
        print source.name

if __name__=='__main__':
    main()

Open in new window

0

Featured Post

On Demand Webinar - Networking for the Cloud Era

This webinar discusses:
-Common barriers companies experience when moving to the cloud
-How SD-WAN changes the way we look at networks
-Best practices customers should employ moving forward with cloud migration
-What happens behind the scenes of SteelConnect’s one-click button

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Strings in Python are the set of characters that, once defined, cannot be changed by any other method like replace. Even if we use the replace method it still does not modify the original string that we use, but just copies the string and then modif…
Dictionaries contain key:value pairs. Which means a collection of tuples with an attribute name and an assigned value to it. The semicolon present in between each key and values and attribute with values are delimited with a comma.  In python we can…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question