
Reading Apache logs

SheldonC asked
Last Modified: 2012-06-27
I am working on my computer project, and I am not sure of the forum's position on helping students with their work. I have been stuck on this question for a long time and need to move forward because the deadline is approaching. Below are the script I have so far and the question I am trying to solve. If anyone can point me in the right direction, I will be grateful.

Thanks

#!/usr/bin/perl

use File::Basename;

#------------------------------------------------------------------------------#
#  Global variables that control the program action and output.                #
#------------------------------------------------------------------------------#

$NUM_RECS_TO_PRINT = 10;   # num of output recs to print per section

#---------------------------------------------------------------------#
#  Change this array to include index filenames used on your system.  #
#---------------------------------------------------------------------#

@indexFilenames = ('index.htm', 'index.html', 'index.shtml');


#----------------------------------------------------------------------#
# don't change anything below here unless you're comfortable with Perl #
#----------------------------------------------------------------------#

sub usage {
   print STDERR "\n\tUsage:  log2.pl access_log > output_file\n";
}


#----------------------------------------------------------#
#  These are two helper routines for the 'sort' function.  #
#----------------------------------------------------------#

sub fileNumericAscending {
   $numFileRequests{$a} <=> $numFileRequests{$b};
}

sub fileNumericDescending {
   $numFileRequests{$b} <=> $numFileRequests{$a};
}

sub trim($)
{
   my $string = shift;
   $string =~ s/^\s+//;
   $string =~ s/\s+$//;
   return $string;
}


#----------------------------<<   main   >>-----------------------------#

   #--------------------------------------------------------------------#
   #  Start by making sure the user is invoking this program properly.  #
   #--------------------------------------------------------------------#

   $numArgs = $#ARGV + 1;

   if ($numArgs != 1) {
      &usage;
      exit 1;
   }

   $logFile = $ARGV[0];

   open (LOGFILE, $logFile) || die "  Error opening log file $logFile.\n";

   #------------------------------------------------------------------#
   #  Start reading and processing the access_log file in this loop.  #
   #------------------------------------------------------------------#

   #printf "<pre>\n";
   while(<LOGFILE>)
   {


	if (/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/)
	{
		$REMOTE_IP{$1}++
	}
   



	#  Count user agents: in the combined log format the user agent
	#  is the last double-quoted field on the line.
	if (/"([^"]*)"\s*$/)
	{
		$USER_AGENT{$1}++
	}



      chomp;

      #----------------------------------------------#
      #  condense one or more whitespace characters  #
      #  to one single space                         #
      #----------------------------------------------#

      s/\s+/ /go;

      #----------------------------------------------------------#
      #  the next line breaks each line of the access_log into   #
      #  nine variables                                          #
      #----------------------------------------------------------#

      ($clientAddress,    $rfc1413,      $username, 
      $localTime,         $httpRequest,  $statusCode, 
      $bytesSentToClient, $referer,      $clientSoftware) =
      /^(\S+) (\S+) (\S+) \[(.+)\] \"(.+)\" (\S+) (\S+) \"(.*)\" \"(.*)\"/o;

      #--------------------------------------------------------------------#
      # take care of problem where the $httpRequest may simply be a hyphen #
      #--------------------------------------------------------------------#

      #  (also skip lines that did not match the pattern above)
      next if (!defined($httpRequest) || $httpRequest eq '-');

      #-----------------------------------------#
      #  Determine the value of $fileRequested  #
      #-----------------------------------------#

      #  The request has three parts: method, path, and protocol.
      ($getPost, $fileRequested, $junk) = split(' ', $httpRequest, 3);
     

      #-----------------------------------------------------------------#
      #  if the base filename is something like index.htm, index.html,  #
      #  or index.shtml, interpret this to be the same as the path by   #
      #  itself.  This way, '/java/' is the same as '/java/index.html'. #
      #-----------------------------------------------------------------#

      foreach $indexFile (@indexFilenames) {
        chomp($fileRequested);
        $fileRequested = trim($fileRequested);
        if ($fileRequested =~ /^\s+$/) {
           next;
        }
        if ($fileRequested =~ /^$/) {
           next;
        }
        if (basename($fileRequested) =~ /$indexFile/i) {
           $fileRequested = dirname($fileRequested);
           last;
        }
      }

      #----------------------------------------------------------------#
      #  If the last character in $fileRequested is a '/', remove it.  #
      #  This makes /perl/ equal to /perl.                             #
      #----------------------------------------------------------------#

      if (length($fileRequested) > 1) 
      {
        if (substr($fileRequested,length($fileRequested)-1,1) eq '/') 
        {
          chop($fileRequested);
        }
      }

      #-----------------------------------------------------#
      #  here's where we count the number of hits per file  #
      #-----------------------------------------------------#

      $numFileRequests{$fileRequested}++;



   }#end first while loop

   close (LOGFILE);






   #--------------------------------------#
   #  Output the top IP addresses         #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT IP ADDRESSES:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $ip (sort {$REMOTE_IP{$b} <=> $REMOTE_IP{$a}} (keys(%REMOTE_IP))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$ip = $REMOTE_IP{$ip}  \n";
	
      $count++;
   }
   print "\n\n";

   #printf "</pre>\n";




   #--------------------------------------#
   #  Output the top user agents          #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT USER AGENTS:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $agent (sort {$USER_AGENT{$b} <=> $USER_AGENT{$a}} (keys(%USER_AGENT))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$agent= $USER_AGENT{$agent}  \n";
	
      $count++;
   }
   print "\n\n";

   #printf "</pre>\n";




   #--------------------------------------#
   #  Output the number of hits per file  #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT CONNECT REQUESTS:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach $key (sort fileNumericDescending (keys(%numFileRequests))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$numFileRequests{$key} \t\t $key\n";
	
      $count++;
   }
   print "\n\n";

   #printf "</pre>\n";





open (LOGFILE,"audit_log") || die "  Error opening log file audit_log.\n";
   #printf "<pre>\n";
   while (<LOGFILE>) {

# Capture the text after the header name so that $1 holds the message.
if (/mod_security-message:\s*(.*)/)
{
$MOD_SEC{$1}++
}

}
 close (LOGFILE);



   #--------------------------------------#
   #  Output top mod_security messages    #
   #--------------------------------------#

   print "TOP $NUM_RECS_TO_PRINT PATTERN MATCH:\n";
   print "-----------------------------\n\n";
   $count=1;
   foreach my $modsec (sort {$MOD_SEC{$b} <=> $MOD_SEC{$a}} (keys(%MOD_SEC))) {
      last if ($count > $NUM_RECS_TO_PRINT);
      print "$count\t$modsec = $MOD_SEC{$modsec}  \n";
	
      $count++;
   }
   print "\n\n";

   #printf "</pre>\n";
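
One way to check the nine-field pattern used in the script is to run it against a single line in isolation. The following is only a sketch: the sample line is made up for illustration, `parse_combined` is a hypothetical helper name, and it assumes the log is in Apache's combined format.

```perl
use strict;
use warnings;

# Hypothetical sample line in Apache "combined" format (made up for illustration).
my $line = '192.0.2.10 - - [27/Jun/2012:10:15:32 -0400] "GET /java/index.html HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0 (X11; Linux)"';

# Split one combined-format line into its nine fields.
# In list context this returns the captures, or an empty list on no match.
sub parse_combined {
    my ($text) = @_;
    return $text =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\S+) (\S+) "([^"]*)" "([^"]*)"/;
}

my ($ip, $ident, $user, $time, $request, $status, $bytes, $referer, $agent)
    = parse_combined($line);

# The request field holds the method, the path, and the protocol.
my ($method, $path) = split ' ', $request;

print "ip=$ip status=$status path=$path agent=$agent\n";
```

Using negated character classes such as `[^"]*` instead of `.+` keeps each quoted field from running past its closing quote, which may matter if a referer or user agent is empty.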


   



This is the question I am stuck on:
4. Search Logs for mod_security-message which is access denied by mod_security

When Mod_Security identifies a problem with a request due to a security violation, it will do two things – 1) Add in some additional client request headers stating why mod_security is taking action, and 2) Log this data to the audit_log and error_log files.  These error messages can be triggered by Mod_Security special checks such as the SecFilterCheckURLEncoding directive, basic filters such as “\.\.” to prevent directory traversals and advanced filters based on converted snort rules.

Search Logic: Search the audit_log entries that have the mod_security-message header, then sort the results, then only show unique entries with a total count of each type in reverse order from highest to lowest, then remove the mod_security-message data at the beginning of each line and list the Top 10 results.

      Your output will be similar to:

   1 51746 Pattern match "Basic" at HEADER.
   2 6138 Pattern match "passwd\=" at THE_REQUEST.
   3 5852 Pattern match "/search" at THE_REQUEST.
   4 5368 Pattern match "passwd=" at THE_REQUEST.
   5 4826 Pattern match "\.asp" at THE_REQUEST.
   6 3694 Pattern match "login.icq.com" at THE_REQUEST.
   7 1971 mod_security-message: Invalid character detected
   8 1935 Pattern match "/smartsearch\.cgi" at THE_REQUEST.
   9 1887 Pattern match "cmd\.exe" at THE_REQUEST.
  10 1387 Pattern match "/sh" at THE_REQUEST.
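
The search logic described in the question could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive solution: it assumes each header sits on a single line in the form `mod_security-message: <text>`, and the names `count_messages`, `top_n`, and `modsec_top.pl` are hypothetical.

```perl
use strict;
use warnings;

# Tally each distinct mod_security-message value read from a filehandle.
# Assumption: the header and its text are on one line.
sub count_messages {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Capture only the text after the header name, trimming trailing space.
        next unless $line =~ /mod_security-message:\s*(.*?)\s*$/;
        $count{$1}++;
    }
    return \%count;
}

# Return up to $n [message, count] pairs, highest count first.
# Ties are broken alphabetically so the order is stable.
sub top_n {
    my ($count, $n) = @_;
    my @keys = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    splice(@keys, $n) if @keys > $n;
    return map { [ $_, $count->{$_} ] } @keys;
}

# Only runs when a log path is given, e.g.: perl modsec_top.pl audit_log
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Error opening $ARGV[0]: $!\n";
    my $rank = 1;
    for my $pair (top_n(count_messages($fh), 10)) {
        printf "%4d %d %s\n", $rank++, $pair->[1], $pair->[0];
    }
    close($fh);
}
```

A hash keyed on the captured text does the "unique entries with a total count" step in one pass, so no separate sort-then-uniq stage is needed before the final descending sort.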