Match a Location tag in Apache httpd.conf with Python

Hi,
   I need some help parsing an httpd.conf file using Python.  For some reason I am unable to match a multiline block of text.  I am trying to open the file, match the <Location /testing> ... </location> block, and print only the matching text.  Can someone please provide some Python code to do this?  Here is the sample text:

<VirtualHost *:80>
  ServerAdmin helpdesk@test.com
#  DocumentRoot "/var/www/html"
  ServerName gq-svn-01.test.com
  ServerAlias gqsvntest.test.com
  LogLevel debug
  <Location />
      AuthBasicProvider ldap
      AuthzLDAPAuthoritative off
      AuthType Basic
      AuthName
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>


</VirtualHost>

unix_admin777 Asked:
 
pepr Commented:
Here f is the open file object.  You can use it directly in a for loop to iterate over the lines of the text file.

The status variable and the tests on it inside the loop are a simple implementation of a so-called "finite automaton".  When processing the file "by hand", you would think this way: read the lines in a loop and process them.  While I am outside the interesting area, I ignore the lines.  Once the opening line is found, I start to be interested (status changes to 1).  When the closing line is found, status changes again (to 2), and I ignore the rest of the lines.
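
To make the transitions concrete, here is a minimal sketch that records the status before and after each line. It is only an illustration: SAMPLE is a shortened copy of the configuration from the question, trace_states is a name invented for this example, and re.IGNORECASE is an assumption added so that the lowercase </location> in the sample closes the block.

```python
import re

# Shortened copy of the sample from the question (illustration only).
SAMPLE = [
    '  <Location />',
    '      AuthType Basic',
    '  </Location>',
    '  <Location /testing>',
    '      DAV svn',
    '  </location>',
    '</VirtualHost>',
]


def trace_states(lines, loc_id):
    """Return a list of (status_before, status_after, text) per line."""
    rex_open = re.compile(r'<Location\s+(\S+)?\s*>')
    # Assumption: ignore case so </Location> and </location> both match.
    rex_close = re.compile(r'</Location>', re.IGNORECASE)
    status = 0
    trace = []
    for line in lines:
        before = status
        if status == 0:      # waiting for the wanted opening tag
            m = rex_open.search(line)
            if m and m.group(1) == loc_id:
                status = 1
        elif status == 1:    # inside the block, waiting for the closing tag
            if rex_close.search(line):
                status = 2
        # status == 2: the block is finished, nothing changes any more
        trace.append((before, status, line.strip()))
    return trace


for before, after, text in trace_states(SAMPLE, '/testing'):
    print('%d -> %d  %s' % (before, after, text))
```

The status stays 0 through the first <Location /> block because its path does not equal loc_id, switches to 1 on the <Location /testing> line, and switches to 2 on the closing tag.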

The usual temptation is to use a boolean variable to express "inside the collected lines area".  But a boolean can express only two states.  When two states are not enough, programmers often introduce another boolean variable.  That makes things complicated and future maintenance more difficult.

With a single automaton variable, the code looks more complicated at first, but you can think about each section separately.  In each section you test the line and, if needed, switch to another section by assigning to the status variable.  It is easy to find the part of the code that handles a given situation; see below how it can be modified for your purpose.

The yield statement makes the function a generator that returns lines on the fly.  You can use it to feed a for loop, or pass it to anything else that expects an iterator; see the end of the code, where a multiline string is built from the same generator (instead of the for loop):

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</Location>', re.IGNORECASE)  # also matches the sample's lowercase </location>
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between
                    yield line     # and I also want this line

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            yield line    # yield each line, including the enclosing one

        elif status == 2: # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    s = ''.join(getLocationLines('httpd.conf', '/testing'))
    print s

Now it prints:
c:\tmp\_Python\unix_admin777\Q_27673761>b.py
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>

m.group(1) returns the substring matched by the part of the regular expression enclosed in the first pair of parentheses.  It is compared with the loc_id passed as an argument to the generator function.
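
As a small illustration of that group, the sketch below applies the same opening-tag pattern to single lines. The helper name location_path is hypothetical and exists only for this example.

```python
import re

rex_open = re.compile(r'<Location\s+(\S+)?\s*>')


def location_path(line):
    """Return the path captured from an opening Location tag, or None."""
    m = rex_open.search(line)
    # search() returns None when the line is not an opening Location tag.
    return m.group(1) if m else None


for line in ['  <Location />', '  <Location /testing>', '      DAV svn']:
    print('%r -> %r' % (line, location_path(line)))
```

Only the second line yields a path equal to '/testing', so only that line would move the generator from status 0 to status 1.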
 
unix_admin777 Author Commented:
Here's the code I've been trying, but it doesn't work.  I'm using Python 2.6.  If someone can explain in detail why this doesn't work, that would be great.  Thanks in advance:

#!/usr/bin/python
import re


regex=re.compile(r'\s+<Location\s/testing>.*</location>"',re.MULTILINE |re.DOTALL)

for line in open("test.conf"):
    line=line.rstrip()
    match =re.match(regex, line)
    if match:
        print line
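
A note on why this attempt prints nothing: the for loop hands the regular expression one line at a time, so no single string ever contains both <Location /testing> and </location>, and re.DOTALL cannot help because there are no newlines inside a single line. The pattern also ends with a stray double quote that would have to appear literally after </location>, and re.MULTILINE only changes the meaning of ^ and $, which the pattern does not use. A minimal sketch of the whole-text alternative follows; find_testing_block and SAMPLE are names invented here, and re.IGNORECASE and the non-greedy .*? are assumptions added for this illustration.

```python
import re

# DOTALL lets "." cross newlines; the non-greedy ".*?" stops at the first
# closing tag. IGNORECASE is an assumption so </location> also matches.
REX_BLOCK = re.compile(r'[ \t]*<Location\s+/testing\s*>.*?</Location>',
                       re.DOTALL | re.IGNORECASE)


def find_testing_block(text):
    """Return the <Location /testing> block found in text, or None."""
    m = REX_BLOCK.search(text)
    return m.group(0) if m else None


# Shortened copy of the sample from the question (illustration only).
SAMPLE = (
    '  <Location />\n'
    '      AuthType Basic\n'
    '  </Location>\n'
    '  <Location /testing>\n'
    '      DAV svn\n'
    '      SVNPath /home/repos/testing\n'
    '  </location>\n'
    '</VirtualHost>\n'
)

print(find_testing_block(SAMPLE))
```

For a real file, the same function could be given the complete contents, for example open("test.conf").read(), instead of SAMPLE.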

 
pepr Commented:
First, the content of httpd.conf is broken (from the XML point of view).  The <Location /> on line 7 is an empty XML element (a single tag after which no closing </Location> is expected).  The "/" would have to be given as an attribute of the element.  The same holds for line 13.  The </Limit> on line 16 is not paired with any opening <Limit> tag.  Line 17 would have to be </Location>, as XML is case sensitive.

I recommend using the standard xml.etree.ElementTree module for parsing XML files instead of regular expressions.  Try the following as a starting point (docs.python.org/library/xml.etree.elementtree.html):

a.py
import xml.etree.ElementTree as ET

tree = ET.parse('httpd.conf')
ET.dump(tree)

 
pepr Commented:
Sorry.  Back to the trees :)  I did not notice that httpd.conf is not an XML file.  Try the following:

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</Location>', re.IGNORECASE)  # also matches the sample's lowercase </location>
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    for line in getLocationLines('httpd.conf', '/testing'):
        print line.rstrip()

It prints on my console (Windows):
c:\tmp\___python\unix_admin777\Q_27673761>python b.py
      DAV svn
      SVNPath /home/repos/testing
     </Limit>

 
unix_admin777 Author Commented:
Thank you for the help.  It seems to work, but I also want the output to include the <Location /testing> and </location> tags.  Also, even after looking at this code for a while, I still don't understand what it is doing.  Is there any way you can summarize the logic here?  It seems like you have somehow tagged the matched block, but I don't understand how.


This is what confuses me:
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

Question has a verified solution.
