Solved

Match a Location tag in Apache httpd.conf with python

Posted on 2012-04-12
Last Modified: 2013-11-13
Hi,
   I need some help parsing an httpd.conf file using Python.  For some reason I'm unable to match a multiline block of text.  I am trying to open this file, match the <Location /testing> ... </location> block, and print only the matching text.  Can someone please provide some Python code to do this?  Here is the sample text:

<VirtualHost *:80>
  ServerAdmin helpdesk@test.com
#  DocumentRoot "/var/www/html"
  ServerName gq-svn-01.test.com
  ServerAlias gqsvntest.test.com
  LogLevel debug
  <Location />
      AuthBasicProvider ldap
      AuthzLDAPAuthoritative off
      AuthType Basic
      AuthName
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>


</VirtualHost>

Question by:unix_admin777

Author Comment

by:unix_admin777
ID: 37840589
Here's the code I've been trying, but it doesn't work.  I'm using Python 2.6.  If someone could explain in detail why this doesn't work, that would be great.  Thanks in advance:

#!/usr/bin/python
import re


regex=re.compile(r'\s+<Location\s/testing>.*</location>"',re.MULTILINE |re.DOTALL)

for line in open("test.conf"):
    line=line.rstrip()
    match =re.match(regex, line)
    if match:
        print line

LVL 28

Expert Comment

by:pepr
ID: 37841517
Firstly, the content of httpd.conf is broken (from the XML point of view). The <Location /> on line 7 denotes an empty XML element (a single tag after which no closing </Location> is expected).  The "/" would have to be given as an attribute of the element.  The same holds for line 13.  Line 16 is not paired with any opening <Limit> tag.  Line 17 must be </Location>, as XML is case sensitive.

I recommend using the standard xml.etree.ElementTree module to parse XML files instead of regular expressions.  Try the following as a starting point (docs.python.org/library/xml.etree.elementtree.html):

a.py
import xml.etree.ElementTree as ET

tree = ET.parse('httpd.conf')
ET.dump(tree)

LVL 28

Expert Comment

by:pepr
ID: 37841577
Sorry.  Back to the trees :)  I did not notice that httpd.conf is not an XML file. Try the following:

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</location>')  # should be </Location> with capital L
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    for line in getLocationLines('httpd.conf', '/testing'):
        print line.rstrip()


It prints on my console (Windows):
c:\tmp\___python\unix_admin777\Q_27673761>python b.py
      DAV svn
      SVNPath /home/repos/testing
     </Limit>


Author Comment

by:unix_admin777
ID: 37845324
Thank you for the help.  It seems to work, but I also want the output to include the <Location /testing> and </location> tags themselves.  Also, even after looking at this code for a while, I still don't understand what it is doing.  Is there any way you can summarize the logic here?  It seems like you have somehow tagged the matched block, but I don't understand how.


This is what confuses me:
  for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

LVL 28

Accepted Solution

by:
pepr earned 500 total points
ID: 37845839
The f is the open file object.  You can directly use it in the loop to iterate through the lines of the text file.
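
As a side note on the line-by-line iteration, here is a minimal sketch of why the loop in the question could never match. The shortened sample text and the names block_rex, per_line_hits and block are made up for this illustration. Because the loop hands the regex one line at a time, a pattern that spans several lines has nothing to match against; it only has a chance against the whole text. Even then re.search is needed rather than re.match, since re.match only tries at the very start of the string. The pattern in the question also ends with a stray " character that never occurs in the file.

```python
import re

# Hypothetical sample, shortened from the configuration in the question.
text = """<VirtualHost *:80>
  <Location />
      AuthType Basic
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
  </location>
</VirtualHost>
"""

# DOTALL lets '.' cross newlines; IGNORECASE accepts </location> and </Location>.
# The non-greedy '.*?' stops at the first closing tag after the opening one.
block_rex = re.compile(r'<Location\s+/testing\s*>.*?</Location>',
                       re.DOTALL | re.IGNORECASE)

# Line by line: no single line holds both tags, so nothing ever matches.
per_line_hits = [ln for ln in text.splitlines() if block_rex.search(ln)]

# Whole text: search() scans the string and finds the multi-line block.
m = block_rex.search(text)
block = m.group(0) if m else None

print(per_line_hits)
print(block)
```

This whole-text variant is fine for small files such as httpd.conf. The line-by-line approach described next avoids loading everything at once and is easier to extend.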

The status variable and the tests on it inside the loop are a simple implementation of a so-called "finite automaton" (finite-state machine).  When processing the file "by hand", you would think this way: read the lines in a loop and process them. While I am outside the interesting area, I ignore the lines. Once the start line is found, I start to be interested (status changes to 1). When the ending line is found, status changes again (to 2), and I ignore the rest of the lines.

The usual temptation is to use a boolean variable to express "inside the collected lines area".  But this way you can express only two states.  When two states are not enough, programmers often introduce another boolean variable.  Then things get complicated, and future maintenance becomes more difficult.

With the single automaton variable, the code looks more complicated at first, but you can think about each section separately.  You can test and decide to switch to another section via assignment to the status variable.  It is easy to find the part of the code that takes care of the situation -- see how it can be modified for your purpose below.
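
To show the idea in isolation, here is a small sketch of the same automaton working on a plain list of strings. The function lines_between, the state names WAITING, INSIDE and DONE, and the sample list are all made up for this illustration. The sample contains a second matching block on purpose: a single boolean "inside" flag would start collecting again there, while the third state makes it explicit that everything after the first block is ignored.

```python
# States of the automaton, named instead of numbered (names are illustrative).
WAITING, INSIDE, DONE = range(3)

def lines_between(lines, start_marker, end_marker):
    """Collect the lines strictly between the first start/end marker pair."""
    state = WAITING
    collected = []
    for line in lines:
        if state == WAITING:
            if start_marker in line:
                state = INSIDE          # start collecting from the next line
        elif state == INSIDE:
            if end_marker in line:
                state = DONE            # ignore everything after this point
            else:
                collected.append(line)
        # state == DONE: nothing to do for the remaining lines
    return collected

sample = ['<Location /testing>', 'DAV svn', '</location>',
          '<Location /testing>', 'ignored', '</location>']
print(lines_between(sample, '<Location /testing>', '</location>'))
```

Each branch handles exactly one state, so changing the behaviour (for example, also collecting the tag lines) means touching only the branch concerned.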

The yield statement makes the function a generator that returns lines on-the-fly. You can use it for feeding a loop, or you can process it via other means that expect an iterator -- see the end of the code, where a multiline string is constructed using the same generator (instead of the for loop):

b.py
import re

def getLocationLines(fname, loc_id):

    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')
    rexLocationClose = re.compile(r'</location>')  # should be </Location> with capital L
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between
                    yield line     # and I also want this line

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            yield line    # yield each line, including the enclosing one

        elif status == 2: # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    s = ''.join(getLocationLines('httpd.conf', '/testing'))
    print s


Now it prints:
c:\tmp\_Python\unix_admin777\Q_27673761>b.py
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>


m.group(1) returns the substring matched by the part of the regular expression enclosed in the first pair of parentheses. It is compared with the loc_id passed as an argument to the generator.
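
A quick way to see what the first group captures is to try the opening-tag pattern on a few lines from the sample file. This is only a sketch: the helper name location_path is made up for the illustration, while the pattern is the one used in b.py.

```python
import re

rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>')

def location_path(line):
    """Return the path captured by group 1, or None if the line has no opening tag."""
    m = rexLocationOpen.search(line)
    return m.group(1) if m else None

for line in ['  <Location />', '  <Location /testing>', '  LogLevel debug']:
    print('%r -> %r' % (line, location_path(line)))
```

Only the second line yields '/testing', which is why the generator switches its status on that line and not on the earlier <Location /> block.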