
Reading Apache logs

SheldonC asked:
I am working on my computer project. I am not sure of the forum's position on helping students with coursework; it's just that I have been stuck on this question for a long time and need to move forward, since the deadline is approaching. I will upload the script I have so far and the question I am trying to solve. If anyone can point me in the right direction, I will be grateful.

Thanks

#!/usr/bin/perl

use File::Basename;

#------------------------------------------------------------------------------#
#  Global variables that control the program action and output.                #
#------------------------------------------------------------------------------#

$NUM_RECS_TO_PRINT = 10;   # num of output recs to print per section

#---------------------------------------------------------------------#
#  Change this array to include index filenames used on your system.  #
#---------------------------------------------------------------------#

@indexFilenames = ('index.htm', 'index.html', 'index.shtml');


#----------------------------------------------------------------------#
# don't change anything below here unless you're comfortable with Perl #
#----------------------------------------------------------------------#

sub usage {
   print STDERR "\n\tUsage:  log2.pl access_log > output_file\n";
}


#----------------------------------------------------------#
#  These are two helper routines for the 'sort' function.  #
#----------------------------------------------------------#

sub fileNumericAscending {
   $numFileRequests{$a} <=> $numFileRequests{$b};
}

sub fileNumericDescending {
   $numFileRequests{$b} <=> $numFileRequests{$a};
}

sub trim($)
{
   my $string = shift;
   $string =~ s/^\s+//;
   $string =~ s/\s+$//;
   return $string;
}


#----------------------------<<   main   >>-----------------------------#

   #--------------------------------------------------------------------#
   #  Start by making sure the user is invoking this program properly.  #
   #--------------------------------------------------------------------#

   $numArgs = $#ARGV + 1;

   if ($numArgs != 1) {
      &usage;
      exit 1;
   }

   $logFile = $ARGV[0];

   open (LOGFILE, '<', $logFile) || die "  Error opening log file $logFile: $!\n";

   #------------------------------------------------------------------#
   #  Start reading and processing the access_log file in this loop.  #
   #------------------------------------------------------------------#

   #printf "<pre>\n";
   while(<LOGFILE>)
   {


	if (/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/)
	{
		$REMOTE_IP{$1}++
	}
   



	 #if (/\b[^(\s)]*)$|([^(]+?)\s*(\(.*\)/)
#(/([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/)
  #(/(\b[^(\s)]*)$|([^(]+?)\s*(\(.*\))/)
 	#{

	#$USER_AGENT{$1}++
	#}



      chomp;

      #----------------------------------------------#
      #  condense one or more whitespace characters  #
      #  to one single space                         #
      #----------------------------------------------#

      s/\s+/ /go;

      #----------------------------------------------------------#
      #  the next line breaks each line of the access_log into   #
      #  nine variables                                          #
      #----------------------------------------------------------#

      ($clientAddress,    $rfc1413,      $username, 
      $localTime,         $httpRequest,  $statusCode, 
      $bytesSentToClient, $referer,      $clientSoftware) =
      /^(\S+) (\S+) (\S+) \[(.+)\] \"(.+)\" (\S+) (\S+) \"(.*)\" \"(.*)\"/o;

      #--------------------------------------------------------------------#
      # take care of problem where the $httpRequest may simply be a hyphen #
      #--------------------------------------------------------------------#

      next unless defined $httpRequest;   # line did not match the log format
      next if ($httpRequest eq '-');

      #-----------------------------------------#
      #  Determine the value of $fileRequested  #
      #-----------------------------------------#

      ($getPost, $fileRequested, $junk) = split(' ', $httpRequest, 6);
     

      #-----------------------------------------------------------------#
      #  if the base filename is something like index.htm, index.html,  #
      #  or index.shtml, interpret this to be the same as the path by   #
      #  itself.  This way, '/java/' is the same as '/java/index.html'. #
      #-----------------------------------------------------------------#

      $fileRequested = trim($fileRequested);
      if (length $fileRequested) {
        foreach $indexFile (@indexFilenames) {
          if (lc(basename($fileRequested)) eq lc($indexFile)) {
            $fileRequested = dirname($fileRequested);
            last;
          }
        }
      }

      #----------------------------------------------------------------#
      #  If the last character in $fileRequested is a '/', remove it.  #
      #  This makes /perl/ equal to /perl.                             #
      #----------------------------------------------------------------#

      if (length($fileRequested) > 1) 
      {
        if (substr($fileRequested,length($fileRequested)-1,1) eq '/') 
        {
          chop($fileRequested);
        }
      }

      #-----------------------------------------------------#
      #  here's where we count the number of hits per file  #
      #-----------------------------------------------------#

      $numFileRequests{$fileRequested}++;



   }#end first while loop

   close (LOGFILE);






   #--------------------------------------#
   #  Output the top IP addresses         #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT IP ADDRESSES:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $ip (sort {$REMOTE_IP{$b} <=> $REMOTE_IP{$a}} (keys(%REMOTE_IP))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$ip = $REMOTE_IP{$ip}  \n";
	
      $count++;
   }
   print "\n\n";

   printf "</pre>\n";




   #--------------------------------------#
   #  Output the top user agents          #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT USER AGENTS:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $agent (sort {$USER_AGENT{$b} <=> $USER_AGENT{$a}} (keys(%USER_AGENT))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$agent= $USER_AGENT{$agent}  \n";
	
      $count++;
   }
   print "\n\n";

   printf "</pre>\n";




   #--------------------------------------#
   #  Output the number of hits per file  #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT CONNECT REQUESTS:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach $key (sort fileNumericDescending (keys(%numFileRequests))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$numFileRequests{$key}\t\t$key\n";
	
      $count++;
   }
   print "\n\n";

   printf "</pre>\n";





open (LOGFILE, '<', 'audit_log') || die "  Error opening log file audit_log: $!\n";
   #printf "<pre>\n";
   while (<LOGFILE>) {

if (/mod_security-message:.*\./)
{
$MOD_SEC{$1}++
}

}
 close (LOGFILE);



   #--------------------------------------#
   #  Output top mod_security messages    #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT PATTERN MATCH:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $modsec (sort {$MOD_SEC{$b} <=> $MOD_SEC{$a}} (keys(%MOD_SEC))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$modsec = $MOD_SEC{$modsec}  \n";
	
      $count++;
   }
   print "\n\n";

   printf "</pre>\n";
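The four report sections above repeat the same sort-and-print loop. As a sketch only (the helper name `format_top_n` is hypothetical, not part of the original script), that loop could be factored into one sub that takes a title, a limit and a reference to a count hash. Unlike the original loops, this version breaks ties alphabetically so the output order is stable:

```perl
use strict;
use warnings;

# Hypothetical helper: format the $n highest-count entries of a hash
# as "rank<TAB>key = count" lines, highest count first (ties by key).
sub format_top_n {
    my ($title, $n, $counts) = @_;
    my $out  = "TOP $n $title:\n" . ('-' x 29) . "\n\n";
    my $rank = 1;
    for my $key (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                 keys %$counts) {
        last if $rank > $n;
        $out .= "$rank\t$key = $counts->{$key}\n";
        $rank++;
    }
    return $out . "\n\n";
}
```

In the script it could be called as `print format_top_n('IP ADDRESSES', $NUM_RECS_TO_PRINT, \%REMOTE_IP);`, and likewise for `%USER_AGENT`, `%numFileRequests` and `%MOD_SEC`. Returning a string instead of printing keeps the helper easy to check on its own.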


   



This is the question I am stuck on:
4. Search logs for mod_security-message entries, i.e. requests denied by mod_security

When Mod_Security identifies a problem with a request due to a security violation, it will do two things – 1) Add in some additional client request headers stating why mod_security is taking action, and 2) Log this data to the audit_log and error_log files.  These error messages can be triggered by Mod_Security special checks such as the SecFilterCheckURLEncoding directive, basic filters such as “\.\.” to prevent directory traversals and advanced filters based on converted snort rules.

Search Logic: Search the audit_log entries that have the mod_security-message header, then sort the results, then only show unique entries with a total count of each type in reverse order from highest to lowest, then remove the mod_security-message data at the beginning of each line and list the Top 10 results.

      Your output will be similar to:

   1 51746 Pattern match "Basic" at HEADER.
   2 6138 Pattern match "passwd\=" at THE_REQUEST.
   3 5852 Pattern match "/search" at THE_REQUEST.
   4 5368 Pattern match "passwd=" at THE_REQUEST.
   5 4826 Pattern match "\.asp" at THE_REQUEST.
   6 3694 Pattern match "login.icq.com" at THE_REQUEST.
   7 1971 mod_security-message: Invalid character detected
   8 1935 Pattern match "/smartsearch\.cgi" at THE_REQUEST.
   9 1887 Pattern match "cmd\.exe" at THE_REQUEST.
  10 1387 Pattern match "/sh" at THE_REQUEST.
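The search logic above (keep the lines carrying the mod_security-message header, drop the header text, count each distinct message, sort from highest to lowest, keep the top 10) can be sketched as a single pass over the audit_log. The sub name `top_modsec_messages` is hypothetical, and stripping a leading "Access denied with code NNN." is an assumption based on the sample audit_log entry later in this thread, where the header reads `mod_security-message: Access denied with code 200. Pattern match "passwd=" at THE_REQUEST.`

```perl
use strict;
use warnings;

# Hypothetical helper: read an audit_log from $fh and return the $n most
# frequent mod_security messages as [message, count] pairs.
sub top_modsec_messages {
    my ($fh, $n) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Keep only header lines; capture the text after the header name.
        next unless $line =~ /^mod_security-message:\s*(.*?)\s*$/;
        my $msg = $1;
        # Assumed format: drop the "Access denied with code NNN." prefix.
        $msg =~ s/^Access denied with code \d+\.\s*//;
        $count{$msg}++;    # one counter per distinct message
    }
    # Highest count first (ties by text), then keep only the top $n.
    my @top = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice(@top, $n) if @top > $n;
    return map { [ $_, $count{$_} ] } @top;
}

# Usage (hypothetical): perl modsec_top.pl audit_log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $rank = 1;
    printf "%4d %d %s\n", $rank++, $_->[1], $_->[0]
        for top_modsec_messages($fh, 10);
    close $fh;
}
```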

CERTIFIED EXPERT

Commented:
Always use

use warnings;
use strict;

This will help you troubleshoot and enforces good practices.
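As one example of what `use warnings` catches in this script: the audit_log loop does `$MOD_SEC{$1}++` after a pattern that has no parentheses, so `$1` is undefined. A minimal sketch, using the header line from the sample audit_log in this thread, shows that Perl reports this when warnings are on (the exact warning text may vary between Perl versions):

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my %MOD_SEC;
my $line = 'mod_security-message: Access denied with code 200. '
         . 'Pattern match "passwd=" at THE_REQUEST.';

# Same test as the original loop: no capture group, so $1 is undef
# and the hit is counted under the empty-string key.
if ($line =~ /mod_security-message:.*\./) {
    $MOD_SEC{$1}++;
}

# Prints something like "Use of uninitialized value $1 in hash element ..."
print "warning: $_" for @warnings;
```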

Line 39 looks wrong:
sub trim($)

It does not need the ($) prototype.

It is fine to use File::Basename, but it is not needed. A simple substitution like
($base = $fileRequested) =~ s{.*/}{};
strips the directory part and leaves the base name.
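A quick sketch of regex substitutions that can stand in for File::Basename's `basename()` and `dirname()` on URL paths like the ones the script handles. The directory version is simplified (it does not reproduce every edge case of `dirname`), and for comparison the sketch also shows that `s/\..*//` does something different: it removes everything from the first dot, i.e. the extension.

```perl
use strict;
use warnings;

my $path = '/java/index.html';

# Remove everything up to and including the last "/" -> base name.
(my $base = $path) =~ s{.*/}{};

# Remove the last "/" and what follows -> directory part.
# Simplified: unlike File::Basename::dirname, "/index.html" gives ""
# rather than "/", and trailing slashes get no special treatment.
(my $dir = $path) =~ s{/[^/]*$}{};

# For comparison: s/\..*// deletes from the FIRST dot onward,
# which drops the file extension, not the directory.
(my $noext = $path) =~ s/\..*//;

print "$base\n$dir\n$noext\n";   # index.html, /java, /java/index
```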

Author

Commented:
Thanks for the instructions, but the main part I am stuck on is this; everything else works.
It doesn't output the mod_security-message header:

 
open (LOGFILE, '<', 'audit_log') || die "  Error opening log file audit_log: $!\n";
   #printf "<pre>\n";
   while (<LOGFILE>) {

if (/mod_security-message:.*\./)
{
$MOD_SEC{$1}++
}

}
 close (LOGFILE);



   #--------------------------------------#
   #  Output top mod_security messages    #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT PATTERN MATCH:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $modsec (sort {$MOD_SEC{$b} <=> $MOD_SEC{$a}} (keys(%MOD_SEC))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$modsec = $MOD_SEC{$modsec}  \n";
	
      $count++;
   }
   print "\n\n";

   printf "</pre>\n";


CERTIFIED EXPERT

Commented:
What kind of line are you trying to match?
Try:
/mod_security-message[:].*\./

Commented:
The following statement:
if (/mod_security-message:.*\./)

needs to be changed to
if (/(mod_security-message:.*\.)/)

or, better yet,
if (/mod_security-message:(.*)\./)

The parentheses tell Perl to store the text matched between them in $1.

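A minimal sketch of the difference, using the header line from the sample audit_log posted later in this thread. The third pattern, which also skips the "Access denied with code NNN." prefix with an optional non-capturing group, is an addition for illustration and not part of the answer above:

```perl
use strict;
use warnings;

my $line = 'mod_security-message: Access denied with code 200. '
         . 'Pattern match "passwd=" at THE_REQUEST.';

# No parentheses: the match succeeds, but $1 is left undefined.
my $no_capture = $line =~ /mod_security-message:.*\./ ? $1 : 'no match';

# Parentheses: $1 holds the text matched inside them.
my $captured = $line =~ /mod_security-message:(.*)\./ ? $1 : 'no match';

# (?:...)? is an optional group that does NOT set $1, so the capture
# starts after the "Access denied with code NNN." prefix when present.
my $message = $line =~ /mod_security-message:\s*(?:Access denied with code \d+\.\s*)?(.*)/
            ? $1 : 'no match';

print defined $no_capture ? "$no_capture\n" : "undef\n";
print "$captured\n";
print "$message\n";
```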
CERTIFIED EXPERT

Commented:
Correctly stated by schubach.
Writing [:] instead of : is optional, though; both match a literal colon.

Author

Commented:
Works great, except that I don't need the "Access denied with code 200." part.

Also, I have this regex (/([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/) to extract the user agent,

e.g. Mozilla/4.0(compatible;MSIE 6.0: Windows NT 5.1)

However, when I include this regex in my code, the script takes a very long time to complete.

Thanks again, guys, for your help.
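The slowdown is most likely regex backtracking: `/([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/` is not anchored at the start, combines a lazy `[^(]+?` with a `.*` inside an alternation, and must end at `$`, so Perl may try many start positions and split points on every line. Note also that the script already captures the user agent in `$clientSoftware` through its big access_log pattern, so a second regex may not be needed at all. A sketch under that assumption (the sub name `top_user_agents` is hypothetical) that tallies the ninth capture group and returns the most frequent agents:

```perl
use strict;
use warnings;

# The combined-log pattern used in the script; group 9 is the
# User-Agent (what the script stores in $clientSoftware).
my $LOG_RE = qr/^(\S+) (\S+) (\S+) \[(.+)\] \"(.+)\" (\S+) (\S+) \"(.*)\" \"(.*)\"/;

# Hypothetical helper: tally user agents from the lines of $fh and return
# the $n most frequent as [agent, count] pairs, highest count first.
sub top_user_agents {
    my ($n, $fh) = @_;
    my %USER_AGENT;
    while (my $line = <$fh>) {
        $line =~ s/\s+/ /g;       # collapse whitespace, as the script does
        my @f = $line =~ $LOG_RE;
        next unless @f;           # skip lines in another format
        $USER_AGENT{ $f[8] }++;
    }
    my @top = sort { $USER_AGENT{$b} <=> $USER_AGENT{$a} || $a cmp $b }
              keys %USER_AGENT;
    splice(@top, $n) if @top > $n;
    return map { [ $_, $USER_AGENT{$_} ] } @top;
}

# Usage (hypothetical): perl top_agents.pl access_log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $rank = 1;
    printf "%d\t%s = %d\n", $rank++, @$_ for top_user_agents(10, $fh);
    close $fh;
}
```

Inside the original script, the equivalent one-line change would be counting `$USER_AGENT{$clientSoftware}++` right after the line is split into its nine fields.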
CERTIFIED EXPERT

Commented:
Try:
/([^(]+)\s*([^;]*);([^:]*)\W*([^)]*)/

If it is not what you want, please tell me what you need to extract

Author

Commented:
This is what I am looking for, for example:
Mozilla/4.0(compatible;MSIE 6.0: Windows NT 5.1)
CERTIFIED EXPERT

Commented:
This is the text to be parsed?
What do you want to extract from it?

Author

Commented:
I want to extract the following string from the Apache access_log:

What is the user’s browser type? Ex: Mozilla/4.0(compatible;MSIE 6.0: Windows NT 5.1)
CERTIFIED EXPERT

Commented:
OK, I thought that was the starting point. But I need a sample Apache access_log.

Commented:
If you have access to the POSIX::Regex CPAN module (since you're a student, I'm not sure whether you have permission to install CPAN modules on your box), then try the example POSIX regex found here: http://www.texsoft.it/index.php?m=sw.php.useragent. You can check from the command line whether it is installed:
perl -e 'use POSIX::Regex';

Google is my friend.

Author

Commented:
The regex you gave me, /([^(]+)\s*([^;]*);([^:]*)\W*([^)]*)/, extracts
221.233.65.147 - - [13/Mar/2004:10:13:44 -0500] "CONNECT register.livesupportonthenet.com:443 HTTP/1.0" 200 - "-" "Mozilla/4.0 = 20  

The original regex that I have, if (/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/), extracts
compatible; MSIE 6.0; Windows NT 5.1) Opera 7.21  [

The output is close to what I am looking for, but it takes forever to run.
I uploaded a sample access_log:
sample-access-log

Commented:
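Please try this:
#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'log.txt' or die "Cannot open log.txt: $!\n";
while (<$fh>) {
  chomp;
  # The user agent is the last double-quoted field on each line
  if (/["].*["].*["].*["].*["](.*)["]$/)
  {
    print "$1\n";
  }
}
close $fh;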

Your example log file gives this as output from my code:
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)

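An alternative sketch for the same job, assuming the user agent is always the last double-quoted field and contains no embedded quotes: `[^"]*` cannot run past a quote, so the pattern needs no chain of greedy `.*` groups. The sub name `last_quoted_field` is hypothetical.

```perl
use strict;
use warnings;

# Return the text inside the last double-quoted field at the end of a
# line, or undef when the line does not end with such a field.
sub last_quoted_field {
    my ($line) = @_;
    return $line =~ /"([^"]*)"\s*$/ ? $1 : undef;
}

# Usage (hypothetical): perl agents.pl access_log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    while (my $line = <$fh>) {
        my $agent = last_quoted_field($line);
        print "$agent\n" if defined $agent;
    }
    close $fh;
}
```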

Author

Commented:
Thanks. That worked great as well. Regular expressions can be somewhat challenging.

This one seems a bit tricky: I have to extract brute-force attacks from the audit log, for example:
attacker (24.168.72.174) was trying to log in using username: exodus, password: HELL
username: exodus9971, password: christ

This is a sample of the audit_log:

========================================
Request: 24.168.72.174 - - [Tue Mar  9 22:27:46 2004] "GET http://sbc2.login.dcn.yahoo.com/config/login?.redir_from=PROFILES?&.tries=1&.src=jpg&.last=&promo=&.intl=us&.bypass=&.partner=&.chkP=Y&.done=http://jpager.yahoo.com/jpager/pager2.shtml&login=exodusc&passwd=HELL HTTP/1.0" 200 566
Handler: proxy-server
Error: mod_security: pausing [http://sbc2.login.dcn.yahoo.com/config/login?.redir_from=PROFILES?&.tries=1&.src=jpg&.last=&promo=&.intl=us&.bypass=&.partner=&.chkP=Y&.done=http://jpager.yahoo.com/jpager/pager2.shtml&login=exodusc&passwd=HELL] for 50000 ms
----------------------------------------
GET http://sbc2.login.dcn.yahoo.com/config/login?.redir_from=PROFILES?&.tries=1&.src=jpg&.last=&promo=&.intl=us&.bypass=&.partner=&.chkP=Y&.done=http://jpager.yahoo.com/jpager/pager2.shtml&login=exodusc&passwd=HELL HTTP/1.0
Accept: */*
Accept-Language: en
Connection: Keep-Alive
mod_security-message: Access denied with code 200. Pattern match "passwd=" at THE_REQUEST.
mod_security-action: 200

HTTP/1.0 200 OK
Connection: close

I tried the following regex, but it only returned 1      = 3643818
if (/(\|||system\(|eval\(|`|\\)/i)
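A likely explanation for the single huge count: the pattern contains `||`, i.e. an empty alternative, and an empty alternative matches at any position, so every line of the audit_log "matches" and is counted under the same (usually empty) key. As a separate, hedged sketch for the brute-force question, the attacker IP, login and password can be pulled from the `Request:` line of an entry, assuming (as in the sample above) that the credentials appear as `login=...&passwd=...` URL parameters. The sub name `credentials_from_request` is hypothetical.

```perl
use strict;
use warnings;

# An empty alternative ("||") matches anywhere, so both lines "match"
# even though neither contains |, `, \, system( or eval(.
my @sample = ('Accept: */*', 'Connection: Keep-Alive');
my $hits = grep { /(\|||system\(|eval\(|`|\\)/i } @sample;
print "lines matched: $hits\n";

# Hypothetical helper: return (ip, login, passwd) from an audit_log
# "Request:" line, or an empty list if any part is missing.
sub credentials_from_request {
    my ($line) = @_;
    return unless $line =~ /^Request: (\S+) /;
    my $ip = $1;
    my ($login)  = $line =~ /[?&]login=([^&\s]*)/;
    my ($passwd) = $line =~ /[?&]passwd=([^&\s]*)/;
    return unless defined $login && defined $passwd;
    return ($ip, $login, $passwd);
}
```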
Commented:
What do you mean by:
I tried the following regex but it only returned 1      = 3643818
if (/(\|||system\(|eval\(|`|\\)/i)

How would that possibly extract a username, password, and IP address from this big string? Please explain more clearly what you want to do, and post a better example using the code tag. Also, I think maybe this should be a new question and this question should be closed. It seems like your original code snippet has now been fixed.

Author

Commented:
Ok. I will open a new post with a more detailed explanation.

Author

Commented:
excellent feedback
