Solved

Parsing XML with namespace using ElementTree

Posted on 2013-05-28
4
335 Views
Last Modified: 2013-06-09
Consider.

<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address>
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>

Open in new window


I'd like to parse the following XML using ElementTree.   Code produced should also display (print) the contents within the tags.
0
Comment
Question by:forums_mp
  • 2
  • 2
4 Comments
 
LVL 25

Expert Comment

by:clockwatcher
ID: 39203421
That's not valid xml.   You're trying to use a prefix that hasn't been defined. Assuming that's an oversight and  your prefix is defined and you mistakenly left it out then can you explain a bit more how you want it parsed?
import xml.etree.ElementTree as ET

xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

def dump_tree(t, depth=0):
    print "{0}: {1} => {2}".format(depth, t.tag, t.text)
    for child in t:
        dump_tree(child, depth + 1)
    if depth == 1:
        print "----------"

root = ET.fromstring(xml)
dump_tree(root)

Open in new window

0
 

Author Comment

by:forums_mp
ID: 39203631
My apologies.  Yes that was an oversight on my part. Beyond that I'd like to be able to read and write to the individual elements.  In other words - when viewed from a language I'm more familiar with in particular C++. I'd like to define a composite type in python equivalent to:

struct xmlElements {
  std:string source ;
  ...
  unsigned int zip;
};
Then later create a sequence of these:
typedef std:vector <xmlElements> elementVec;

Lastly use boost::property_tree to 'get' the members and store them within the sequence of xmlElements.  

In

Would also like an example on how to set the elements then write the contents out to a file.  

My reply might have been long-winded but hopefully explains what I'm after. Thanks
0
 

Author Comment

by:forums_mp
ID: 39203634
I'm using my smartphone to reply which is somewhat cumbersome.  That aside I'm after a python solution - albeit I used C++ for illustration purposes
0
 
LVL 25

Accepted Solution

by:
clockwatcher earned 500 total points
ID: 39205088
So sounds like you're after a collection of 'source' objects.
import xml.etree.ElementTree as ET

class InvalidTagError(Exception):
    def __init__(self, element, expected):
        self.element = element
        self.expected = expected
    def __str__(self):
        return "'{0}' is not an instance of '{1}'".format(self.element, self.expected)

class QualifiedElement(object):

    @classmethod
    def qualify(classname, tag, ns):
        if ns != '':
            tag =  "{{{ns}}}{tag}".format(ns=ns, tag=tag)
        return tag

    def fqn(self, tag, ns=None):
        if ns == None:
            ns = self.ns

        return QualifiedElement.qualify(tag, ns)

    def get_child_text(self, tag, ns=None):
        '''returns the text for the child tag or None if the tag isn't found'''
        
        child = self.element.find(self.fqn(tag, ns))
        if child == None:
           return None
        else:
           return child.text

    def __init__(self, el, ns=None):
        
        if not isinstance(el, ET.Element):
            raise TypeError()

        self.ns = ns
        self.element = el


class Source(QualifiedElement):

    def __init__(self, el, ns=None):

        QualifiedElement.__init__(self, el, ns)  

        if el.tag != self.fqn('source'):
            raise InvalidTagError(el, self.fqn('source'))

        self.num = el.get('num')  # example of attribute access
        self.name = self.get_child_text('name')
        self.street = self.get_child_text('street')
        self.city = self.get_child_text('city')
        self.zip = self.get_child_text('zip')


def main():

    xml='''<?xml version="1.0"?>
<!--
    Document   test.xml
    Created on :
    Author     : Jane Doe
    Description: XML Definition for address 
-->
<st:address xmlns:st="http://this/prefix/needs/to/be/defined" >
  <st:source num="1">
    <st:name>Bubba McBubba</st:name>
    <st:street>123 Happy Go Lucky Ln.</st:street>
    <st:city>Seattle</st:city>
    <st:state>WA</st:state>
    <st:zip>98056</st:zip>
  </st:source>
  <st:source num="2">
    <st:name>McBubba</st:name>
    <st:street>456 Happy Go Lucky Ln.</st:street>
    <st:city>Orlando</st:city>
    <st:state>FL</st:state>
    <st:zip>43336</st:zip>
  </st:source>
</st:address>'''

    ns = 'http://this/prefix/needs/to/be/defined'
    root = ET.fromstring(xml)
    sources = []
    for el in root.findall(QualifiedElement.qualify('source', ns)):
        source = Source(el, ns)
        sources.append(source)

    for source in sources:
        print source.name

if __name__=='__main__':
    main()

Open in new window

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Installing Python 2.7.3 version on Windows operating system For installing Python first we need to download Python's latest version from URL" www.python.org " You can also get information on Python scripting language from the above mentioned we…
Dictionaries contain key:value pairs. Which means a collection of tuples with an attribute name and an assigned value to it. The semicolon present in between each key and values and attribute with values are delimited with a comma.  In python we can…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now