Solved

Parsing XML with namespace using ElementTree

Posted on 2013-05-28
4
341 Views
Last Modified: 2013-06-09
Consider.

<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address>
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>

Open in new window


I'd like to parse the following XML using ElementTree.   Code produced should also display (print) the contents within the tags.
0
Comment
Question by:forums_mp
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39203421
That's not valid xml.   You're trying to use a prefix that hasn't been defined. Assuming that's an oversight and  your prefix is defined and you mistakenly left it out then can you explain a bit more how you want it parsed?
import xml.etree.ElementTree as ET

xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

def dump_tree(t, depth=0):
    print "{0}: {1} => {2}".format(depth, t.tag, t.text)
    for child in t:
        dump_tree(child, depth + 1)
    if depth == 1:
        print "----------"

root = ET.fromstring(xml)
dump_tree(root)

Open in new window

0
 

Author Comment

by:forums_mp
ID: 39203631
My apologies.  Yes that was an oversight on my part. Beyond that I'd like to be able to read and write to the individual elements.  In other words - when viewed from a language I'm more familiar with in particular C++. I'd like to define a composite type in python equivalent to:

struct xmlElements {
  std:string source ;
  ...
  unsigned int zip;
};
Then later create a sequence of these:
typedef std:vector <xmlElements> elementVec;

Lastly use boost::property_tree to 'get' the members and store them within the sequence of xmlElements.  

In

Would also like an example on how to set the elements then write the contents out to a file.  

My reply might have been long-winded but hopefully explains what I'm after. Thanks
0
 

Author Comment

by:forums_mp
ID: 39203634
I'm using my smartphone to reply which is somewhat cumbersome.  That aside I'm after a python solution - albeit I used C++ for illustration purposes
0
 
LVL 25

Accepted Solution

by:
clockwatcher earned 500 total points
ID: 39205088
So sounds like you're after a collection of 'source' objects.
import xml.etree.ElementTree as ET

class InvalidTagError(Exception):
    def __init__(self, element, expected):
        self.element = element
        self.expected = expected
    def __str__(self):
        return "'{0}' is not an instance of '{1}'".format(self.element, self.expected)

class QualifiedElement(object):

    @classmethod
    def qualify(classname, tag, ns):
        if ns != '':
            tag =  "{{{ns}}}{tag}".format(ns=ns, tag=tag)
        return tag

    def fqn(self, tag, ns=None):
        if ns == None:
            ns = self.ns

        return QualifiedElement.qualify(tag, ns)

    def get_child_text(self, tag, ns=None):
        '''returns the text for the child tag or None if the tag isn't found'''
        
        child = self.element.find(self.fqn(tag, ns))
        if child == None:
           return None
        else:
           return child.text

    def __init__(self, el, ns=None):
        
        if not isinstance(el, ET.Element):
            raise TypeError()

        self.ns = ns
        self.element = el


class Source(QualifiedElement):

    def __init__(self, el, ns=None):

        QualifiedElement.__init__(self, el, ns)  

        if el.tag != self.fqn('source'):
            raise InvalidTagError(el, self.fqn('source'))

        self.num = el.get('num')  # example of attribute access
        self.name = self.get_child_text('name')
        self.street = self.get_child_text('street')
        self.city = self.get_child_text('city')
        self.zip = self.get_child_text('zip')


def main():

    xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

    ns = 'http://this/prefix/needs/to/be/defined'
    root = ET.fromstring(xml)
    sources = []
    for el in root.findall(QualifiedElement.qualify('source', ns)):
        source = Source(el, ns)
        sources.append(source)

    for source in sources:
        print source.name

if __name__=='__main__':
    main()

Open in new window

0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

The really strange introduction Once upon a time there were individuals who intentionally put the grass seeds to the soil with anticipation of solving their nutrition problems. Or they maybe only played with seeds and noticed what happened... Som…
Introduction On September 29, 2012, the Python 3.3.0 was released; nothing extremely unexpected,  yet another, better version of Python. But, if you work in Microsoft Windows, you should notice that the Python Launcher for Windows was introduced wi…
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

740 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question