Solved

Match a Location tag in Apache httpd.conf with python

Posted on 2012-04-12
Last Modified: 2013-11-13
Hi,
   I need some help parsing an httpd.conf file using Python.  For some reason I'm unable to match a multiline block of text.  I am trying to open this file, match the <Location /testing> ... </location> block, and print only the matching text.  Can someone please provide some Python code to do this?  Here is the sample text:

<VirtualHost *:80>
  ServerAdmin helpdesk@test.com
#  DocumentRoot "/var/www/html"
  ServerName gq-svn-01.test.com
  ServerAlias gqsvntest.test.com
  LogLevel debug
  <Location />
      AuthBasicProvider ldap
      AuthzLDAPAuthoritative off
      AuthType Basic
      AuthName
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>


</VirtualHost>

Question by:unix_admin777

Author Comment

by:unix_admin777
ID: 37840589
Here's the code I've been trying, but it doesn't work.  I'm using Python 2.6.  If someone could explain in detail why it doesn't work, that would be great.  Thanks in advance:

#!/usr/bin/python
import re


regex=re.compile(r'\s+<Location\s/testing>.*</location>"',re.MULTILINE |re.DOTALL)

for line in open("test.conf"):
    line=line.rstrip()
    match =re.match(regex, line)
    if match:
        print line


Expert Comment

by:pepr
ID: 37841517
Firstly, the content of httpd.conf is broken from the XML point of view. The <Location /> on line 7 is an empty XML element (a single self-closing tag, after which no closing </Location> is expected).  The "/" would have to be given as the value of some attribute of the element.  The same holds for line 13.  The </Limit> on line 16 is not paired with any opening <Limit> tag.  Line 17 must be </Location>, as XML is case sensitive.

I recommend using the standard xml.etree.ElementTree module for parsing XML files instead of regular expressions.  Try the following as a starting point (docs.python.org/library/xml.etree.elementtree.html):

a.py
import xml.etree.ElementTree as ET

tree = ET.parse('httpd.conf')
ET.dump(tree)


Expert Comment

by:pepr
ID: 37841577
Sorry.  Back to the trees :)  I did not notice that httpd.conf is not an XML file. Try the following:

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</location>')  # should be </Location> with capital L
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    for line in getLocationLines('httpd.conf', '/testing'):
        print line.rstrip()



It prints on my console (Windows):
c:\tmp\___python\unix_admin777\Q_27673761>python b.py
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
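
A note on why the original attempt prints nothing, since that was asked: the for loop hands the regex one line at a time, so a pattern that spans several lines can never match, whatever flags are set. In addition, the stray double quote just before the closing quote of the pattern demands a literal " after </location>. If a single regex is preferred over the line-by-line approach, a minimal sketch could look like the following. It assumes the whole file fits in memory; find_location_block and SAMPLE are made-up names for illustration, and SAMPLE stands in for the text you would get from open('httpd.conf').read().

```python
import re

# Shortened stand-in for the contents of httpd.conf (illustration only).
SAMPLE = """\
<VirtualHost *:80>
  <Location />
      AuthType Basic
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
  </location>
</VirtualHost>
"""


def find_location_block(text, loc_id):
    """Return the first <Location loc_id> ... </Location> block, or None."""
    pattern = re.compile(
        # Opening tag at the start of a line, then as little as possible
        # (non-greedy) up to the next closing tag.
        r'^[ \t]*<Location\s+' + re.escape(loc_id) + r'\s*>.*?</Location>',
        # MULTILINE: ^ matches at every line start.
        # DOTALL: . also matches newlines.
        # IGNORECASE: accepts </location>; as a simplification it also
        # makes the comparison of loc_id case-insensitive.
        re.MULTILINE | re.DOTALL | re.IGNORECASE,
    )
    m = pattern.search(text)   # search the whole text, not line by line
    return m.group(0) if m else None


block = find_location_block(SAMPLE, '/testing')
print(block)
```

The non-greedy .*? matters: a greedy .* would run to the last closing tag in the file rather than the nearest one. Nested Location blocks are not handled by this sketch.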

 

Author Comment

by:unix_admin777
ID: 37845324
Thank you for the help.  It seems to work, but I also want to match the <Location /testing> and </location> tags themselves.  Also, even after looking at this code for a while, I still don't understand what it is doing.  Could you summarize the logic here?  It seems like you have somehow tagged the matched block, but I don't understand how.


This is what confuses me:
  for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass


Accepted Solution

by:pepr (earned 500 total points)
ID: 37845839
f is the open file object.  You can use it directly in the for loop to iterate over the lines of the text file.

The status variable and the tests on it inside the loop are a simple implementation of a so-called "finite automaton" (a state machine).  When processing the file "by hand", you would think this way: read the lines one by one and process them.  While I am outside the interesting area (status 0), I ignore the lines.  Once the start line is found, I become interested (status changes to 1).  When the ending line is found, the status changes again (to 2), and I ignore the rest of the lines.

The usual temptation is to use a boolean variable to express "inside the collected lines area".  But a boolean can express only two states.  When two states are not enough, programmers often introduce another boolean variable.  That way things get complicated, and future maintenance becomes more difficult.

With a single automaton variable, the code looks more complicated at first, but you can think about each section separately.  Within a section you test the input and switch to another section by assigning to the status variable.  It is easy to find the part of the code that takes care of a given situation -- see how it can be modified for your purpose below.

The yield statement makes the function a generator that returns lines on the fly. You can use it to feed a loop, or you can pass it to anything else that expects an iterator -- see the end of the code, where a multiline string is constructed from the same generator (instead of using the for loop):

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</location>')  # should be </Location> with capital L
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between
                    yield line     # and I also want this line

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            yield line    # yield each line, including the enclosing one

        elif status == 2: # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    s = ''.join(getLocationLines('httpd.conf', '/testing'))
    print s



Now it prints:
c:\tmp\_Python\unix_admin777\Q_27673761>b.py
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>
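
The interplay of the status variable and yield can also be seen in isolation, without a file or regular expressions. The following is only a sketch: between, the BEGIN/END markers, and the state names WAITING/INSIDE/DONE are invented for illustration and are not part of b.py. One deliberate difference is that it leaves the loop with break once the closing line has been produced, instead of passing over the remaining lines.

```python
def between(lines, start, stop):
    """Yield lines from the first one containing `start`
    through the next one containing `stop` (both included)."""
    WAITING, INSIDE, DONE = range(3)   # named states instead of 0, 1, 2
    state = WAITING
    for line in lines:
        if state == WAITING:
            if start in line:
                state = INSIDE
                yield line             # the opening line is wanted too
        elif state == INSIDE:
            if stop in line:
                state = DONE
            yield line                 # every line, including the closing one
        else:                          # DONE: nothing more to produce
            break


lines = ['a\n', 'BEGIN\n', 'x\n', 'y\n', 'END\n', 'z\n']

# A generator produces values only on demand ...
gen = between(lines, 'BEGIN', 'END')
first = next(gen)    # runs the body up to the first yield
rest = list(gen)     # drains whatever is left

# ... and anything that accepts an iterable can consume it.
joined = ''.join(between(lines, 'BEGIN', 'END'))
print(joined)
```

Because the function takes any iterable of lines, the same code works with an open file object as well as with a list.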



m.group(1) returns the substring matched by the part of the regular expression enclosed in the first pair of parentheses, here the path that follows "Location". It is compared with the loc_id passed as an argument to the generator.
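
To see what the groups contain, the opening-tag pattern from b.py can be tried on a few lines by hand. This is a small sketch; the variable names are chosen here for illustration, and the test strings are taken from or modeled on the sample configuration.

```python
import re

# Same pattern as rexLocationOpen in b.py.
rex = re.compile(r'<Location\s+(\S+)?\s*>')

m = rex.search('  <Location /testing>')
whole = m.group(0)    # the entire matched text
path = m.group(1)     # only what the first (...) captured
print(whole)
print(path)

# The root location captures just the slash.
root_path = rex.search('  <Location />').group(1)

# The group is optional (note the ? after it), so it can be None
# even though the pattern as a whole matched.
empty = rex.search('<Location >').group(1)

# A closing tag does not match at all: search() returns None.
closing = rex.search('  </Location>')
```

Since group(1) may be None, comparing it with loc_id using == (as b.py does) is safe, whereas calling string methods on it directly would not be.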