Match a Location tag in Apache httpd.conf with Python

Hi,
   I need some help parsing an httpd.conf file using Python. I am unable to match a multiline block of text. I am trying to open the file, match the <Location /testing> ... </location> block, and print only the matching text. Can someone please provide some Python code to do this? Here is the sample text:

<VirtualHost *:80>
  ServerAdmin helpdesk@test.com
#  DocumentRoot "/var/www/html"
  ServerName gq-svn-01.test.com
  ServerAlias gqsvntest.test.com
  LogLevel debug
  <Location />
      AuthBasicProvider ldap
      AuthzLDAPAuthoritative off
      AuthType Basic
      AuthName
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>


</VirtualHost>

unix_admin777 Asked:

unix_admin777 (Author) Commented:
Here's the code I've been trying, but it doesn't work. I'm using Python 2.6. If someone can explain in detail why it doesn't work, that would be great. Thanks in advance:

#!/usr/bin/python
import re


regex=re.compile(r'\s+<Location\s/testing>.*</location>"',re.MULTILINE |re.DOTALL)

for line in open("test.conf"):
    line=line.rstrip()
    match =re.match(regex, line)
    if match:
        print line
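
For reference, a note on why the loop above cannot match: it hands re.match one line at a time, so a pattern that has to span from <Location /testing> to </location> never sees more than a single line, and re.DOTALL makes no difference. The pattern also ends with a stray double quote that would have to appear literally in the file. A minimal sketch of the whole-text alternative follows, written to run on Python 2.6 and later. The function name find_location_block and the inline SAMPLE text are made up for illustration; a real script would read the file with open(...).read() instead.

```python
import re

# Shortened stand-in for the httpd.conf sample in the question (assumption:
# a real script would use text = open('test.conf').read() instead).
SAMPLE = """\
<VirtualHost *:80>
  <Location />
      AuthType Basic
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
  </location>
</VirtualHost>
"""


def find_location_block(text, path):
    """Return the first <Location path> ... </Location> block, or None."""
    # re.escape keeps characters in the path from being read as regex syntax.
    # ^[ \t]* (with MULTILINE) keeps the indentation of the opening tag.
    # .*? is non-greedy, so the match stops at the first closing tag.
    # IGNORECASE accepts the lowercase </location> used in the sample.
    pattern = re.compile(
        r'^[ \t]*<Location\s+' + re.escape(path) + r'\s*>.*?</Location>',
        re.MULTILINE | re.DOTALL | re.IGNORECASE)
    m = pattern.search(text)
    if m:
        return m.group(0)
    return None


block = find_location_block(SAMPLE, '/testing')
```

With the sample above, block holds the four lines from <Location /testing> through </location>. The sketch assumes that Location blocks are not nested inside one another.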

pepr Commented:
Firstly, the content of httpd.conf is broken (from the XML point of view). The <Location /> on line 7 is an empty XML element (a single tag, after which no closing </Location> is expected). The "/" would have to be given as the value of some attribute of the element. The same holds for line 13. The </Limit> on line 16 is not paired with any opening <Limit> tag. Line 17 must be </Location>, as XML is case sensitive.

I recommend using the standard xml.etree.ElementTree module for parsing XML files instead of regular expressions. Try the following as a starting point (docs.python.org/library/xml.etree.elementtree.html):

a.py
import xml.etree.ElementTree as ET

tree = ET.parse('httpd.conf')
ET.dump(tree)

pepr Commented:
Sorry. Back to the trees :) I did not notice that httpd.conf is not an XML file. Try the following:

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</location>')  # should be </Location> with capital L
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    for line in getLocationLines('httpd.conf', '/testing'):
        print line.rstrip()

It prints on my console (Windows):
c:\tmp\___python\unix_admin777\Q_27673761>python b.py
      DAV svn
      SVNPath /home/repos/testing
     </Limit>

unix_admin777 (Author) Commented:
Thank you for the help. It seems to work, but I also want to match the <Location /testing> and </location> tags themselves. Also, even after looking at this code for a while, I still don't understand what it is doing. Can you summarize the logic? It seems like you have somehow tagged the matched block, but I don't understand how.


This is what confuses me:
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

pepr Commented:
The f is the open file object. You can use it directly in the loop to iterate through the lines of the text file.

The status variable and its testing inside the loop are a simple implementation of a so-called "finite automaton". When processing the file "by hand", you think this way: read the lines in a loop and process them. While I am outside the interesting area, I ignore the lines. Once the start line is found, I start to be interested (status changes to 1). When the ending line is found, status changes again, and I ignore the rest of the lines.
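
The by-hand procedure just described can be sketched in isolation, without regular expressions or files. This is only an illustration: the state names WAITING, INSIDE and DONE and the function lines_between are hypothetical (the code in this thread uses the numbers 0, 1 and 2), and plain substring tests stand in for the regex searches.

```python
# Hypothetical named states; the thread's code uses 0, 1 and 2 instead.
WAITING, INSIDE, DONE = range(3)


def lines_between(lines, start_marker, end_marker):
    """Return the lines strictly between the first start/end marker pair."""
    state = WAITING
    collected = []
    for line in lines:
        if state == WAITING:
            # Outside the interesting area: ignore lines until the start.
            if start_marker in line:
                state = INSIDE
        elif state == INSIDE:
            # Inside the interesting area: collect until the end marker.
            if end_marker in line:
                state = DONE
            else:
                collected.append(line)
        # state == DONE: the rest of the lines are ignored.
    return collected


sample = [
    '<Location />',
    '</Location>',
    '<Location /testing>',
    'DAV svn',
    'SVNPath /home/repos/testing',
    '</location>',
    '</VirtualHost>',
]
result = lines_between(sample, '<Location /testing>', '</location>')
```

Here result is the two lines between the markers. Each branch of the if/elif chain handles exactly one state, and the only way to move on is an assignment to state.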

The usual temptation is to use a boolean variable to express "inside the collected lines area". But this way you can express only two states. When two states are not enough, programmers often introduce another boolean variable. Things then get complicated, and future maintenance becomes more difficult.

With the single automaton variable, the code looks more complicated at first, but you can think about each section separately. You can test and decide to switch to another section by assigning to the status variable. It is easy to find the part of the code that takes care of each situation -- see how it can be modified for your purpose below.

The yield statement makes the function a generator that returns lines on the fly. You can use it to feed a loop, or you can process it by other means that expect an iterator -- see the end of the code, where a multiline string is constructed using the same generator (instead of the for loop):

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</location>')  # should be </Location> with capital L
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between
                    yield line     # and I also want this line

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            yield line    # yield each line, including the enclosing one

        elif status == 2: # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    s = ''.join(getLocationLines('httpd.conf', '/testing'))
    print s

Now it prints:
c:\tmp\_Python\unix_admin777\Q_27673761>b.py
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>

m.group(1) returns the substring matched by the part of the regular expression enclosed in the first pair of parentheses. It is tested against the loc_id passed as the argument of the generator.
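
A small sketch of what the groups hold for the opening-tag pattern used in b.py may make this concrete. The variable names here are only for illustration.

```python
import re

# Same opening-tag pattern as in b.py above.
rex = re.compile(r'<Location\s+(\S+)?\s*>')

m = rex.search('  <Location /testing>')
captured = m.group(1)   # text matched inside the first parentheses
whole = m.group(0)      # text matched by the whole pattern

# The root location from the sample file is captured the same way.
root = rex.search('  <Location />').group(1)
```

For these inputs captured is '/testing', whole is '<Location /testing>' and root is '/'. The \S+ first grabs the closing > as well and then gives it back so the rest of the pattern can match, which is why the captured path never includes the angle bracket.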