unix_admin777 asked:

Match a <Location> block in Apache httpd.conf with Python

Hi,
I need some help parsing an httpd.conf file with Python. For some reason I can't match a multiline block of text. I'm trying to open the file, match the <Location /testing> ... </location> block, and print only the matching text. Could someone please provide some Python code to do this? Here is the sample text:

<VirtualHost *:80>
  ServerAdmin helpdesk@test.com
#  DocumentRoot "/var/www/html"
  ServerName gq-svn-01.test.com
  ServerAlias gqsvntest.test.com
  LogLevel debug
  <Location />
      AuthBasicProvider ldap
      AuthzLDAPAuthoritative off
      AuthType Basic
      AuthName
  </Location>
  <Location /testing>
      DAV svn
      SVNPath /home/repos/testing
     </Limit>
  </location>


</VirtualHost>


unix_admin777 (ASKER):

Here's the code I've been trying, but it doesn't work. I'm using Python 2.6. Could someone please explain in detail why it doesn't work? Thanks in advance:

#!/usr/bin/python
import re


regex=re.compile(r'\s+<Location\s/testing>.*</location>"',re.MULTILINE |re.DOTALL)

for line in open("test.conf"):
    line=line.rstrip()
    match =re.match(regex, line)
    if match:
        print line

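A likely explanation of why the loop above never matches: "for line in open(...)" hands the regex one line at a time, so a pattern that has to run from <Location /testing> down to </location> only ever sees a single line and cannot succeed. re.MULTILINE does not change that; it only makes ^ and $ match at line boundaries inside one string. The pattern also ends with a stray " character that never occurs in the file, and a greedy .* under re.DOTALL would run to the last closing tag in the file instead of the first. Finally, re.match only tries at the start of the string, so once the whole file is one string, search is needed. The sketch below reads the whole file into one string first; the function name find_location_block is made up for illustration.

```python
import re


def find_location_block(text, loc_id):
    """Return the text from <Location loc_id> to its closing tag, or None.

    text must be the WHOLE file contents, not a single line.
    """
    pattern = re.compile(
        r'<Location\s+' + re.escape(loc_id) + r'\s*>'  # the opening tag
        r'.*?'                                          # lazy: stop at the first close
        r'</Location\s*>',                              # the closing tag
        re.IGNORECASE | re.DOTALL)  # any letter case; "." also matches "\n"
    match = pattern.search(text)   # search() scans; match() only tries position 0
    if match:
        return match.group(0)
    return None


if __name__ == '__main__':
    sample = ("  <Location /testing>\n"
              "      DAV svn\n"
              "  </location>\n")
    print(find_location_block(sample, '/testing'))
```

For the real file, pass its whole contents, for example find_location_block(open('test.conf').read(), '/testing'). Because the tags are part of the match, the printed text includes the <Location /testing> and </location> lines themselves.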

pepr:

First, the content of httpd.conf is broken from the XML point of view. The <Location /> on line 7 of the sample is an empty XML element (a single self-closing tag, after which no closing </Location> is expected). The "/" would have to be written as an attribute of the element. The same holds for line 13. Line 16 (</Limit>) is not paired with any opening <Limit> tag. Line 17 would have to be </Location>, because XML is case sensitive.

I recommend using the standard xml.etree.ElementTree module for parsing XML files instead of regular expressions. Try the following as a starting point (docs.python.org/library/xml.etree.elementtree.html):

a.py
import xml.etree.ElementTree as ET

tree = ET.parse('httpd.conf')
ET.dump(tree)

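The XML objections can be checked directly against the parser. The sketch below feeds small fragments to ET.fromstring and reports whether they are well-formed XML. The helper parses_as_xml and the attribute name path are made up for illustration (Apache has no such XML form), and ET.ParseError is available from Python 2.7 on.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """Return True if text is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == '__main__':
    # In XML, "<Location />" is already a complete, empty element...
    print(parses_as_xml('<Location />'))
    # ...so a later "</Location>" has nothing left to close.
    print(parses_as_xml('<Location />\n</Location>'))
    # "/testing" is not an attribute; XML needs name="value".
    print(parses_as_xml('<Location /testing></Location>'))
    print(parses_as_xml('<Location path="/testing"></Location>'))
    # XML is case sensitive: </location> does not close <Location>.
    print(parses_as_xml('<Location path="/testing"></location>'))
```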

Sorry. Back to the trees :) I did not notice that httpd.conf is not an XML file. Try the following:

b.py
import re

def getLocationLines(fname, loc_id):

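    # Apache directive names are case-insensitive, so match both tags in any
    # letter case (the sample closes the /testing block with </location>).
    rexLocationOpen = re.compile(r'<Location\s+(\S+)?\s*>', re.IGNORECASE)
    rexLocationClose = re.compile(r'</Location>', re.IGNORECASE)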
    f = open(fname)
    status = 0
    for line in f:
        if status == 0:   # waiting for the opening line
            m = rexLocationOpen.search(line)
            if m:         # found?
                if m.group(1) == loc_id:
                    status = 1     # let's generate the content between

        elif status == 1: # generating lines between the tags
            m = rexLocationClose.search(line)
            if m:
                status = 2
            else:
                yield line  # yield the next interesting line

        elif status == 2:   # just pass the rest of the lines
            pass

    f.close()


if __name__ == '__main__':
    for line in getLocationLines('httpd.conf', '/testing'):
        print line.rstrip()



It prints on my console (Windows):
c:\tmp\___python\unix_admin777\Q_27673761>python b.py
      DAV svn
      SVNPath /home/repos/testing
     </Limit>


unix_admin777 (ASKER):

Thank you for the help. It seems to work, but I also want to match the <Location /testing> and </location> tag lines themselves. Also, even after looking at this code for a while, I still don't understand what it is doing. Could you summarize the logic? It seems like you have somehow tagged the matched block, but I don't understand how.


This is what confuses me:
for line in f:
    if status == 0:   # waiting for the opening line
        m = rexLocationOpen.search(line)
        if m:         # found?
            if m.group(1) == loc_id:
                status = 1     # let's generate the content between

    elif status == 1: # generating lines between the tags
        m = rexLocationClose.search(line)
        if m:
            status = 2
        else:
            yield line  # yield the next interesting line

    elif status == 2:   # just pass the rest of the lines
        pass

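The logic in short: nothing in the file is tagged; status is the only thing the loop remembers from one line to the next. Status 0 means "still looking for <Location loc_id>"; when that line is seen, status becomes 1. While status is 1, every line is yielded until the closing tag is found, which sets status to 2, and from then on the remaining lines are read and ignored. Because the function contains yield, it is a generator, so the caller's for loop receives the lines one at a time. Including the tag lines only requires yielding at the moments the state changes. A minimal sketch of that variant follows; the names get_location_block, SEARCHING and INSIDE are made up for illustration, it takes any iterable of lines (an open file works) rather than a file name, and it stops reading once the block ends instead of passing over the rest.

```python
import re

SEARCHING, INSIDE = 0, 1  # correspond to status 0 and 1 in b.py

# Directive names are case-insensitive; the sample closes with </location>.
OPEN_RE = re.compile(r'<Location\s+(\S+?)\s*>', re.IGNORECASE)
CLOSE_RE = re.compile(r'</Location\s*>', re.IGNORECASE)


def get_location_block(lines, loc_id):
    """Yield the <Location loc_id> line, the lines inside it and its closing line."""
    status = SEARCHING
    for line in lines:
        if status == SEARCHING:
            m = OPEN_RE.search(line)
            if m and m.group(1) == loc_id:
                status = INSIDE
                yield line      # the opening tag line itself
        else:                   # status == INSIDE
            yield line          # body lines and, finally, the closing line
            if CLOSE_RE.search(line):
                return          # plays the role of status 2: stop reading


if __name__ == '__main__':
    sample = [
        '  <Location />\n',
        '      AuthType Basic\n',
        '  </Location>\n',
        '  <Location /testing>\n',
        '      DAV svn\n',
        '  </location>\n',
    ]
    for line in get_location_block(sample, '/testing'):
        print(line.rstrip())
```

With a real file, open it and pass the file object: for line in get_location_block(open('httpd.conf'), '/testing'): print(line.rstrip()).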

ASKER CERTIFIED SOLUTION
pepr

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.