Solved

Compare fields from two different hashes of hashes

Posted on 2009-03-30
Last Modified: 2012-05-06
I have two Apache logs, one with all POSTs and one with all GETs. I am trying to read each of them into its own array of hashes (or a better structure, if you have a suggestion). In the end I want to compare the two, looking for IPs that match. If I find two records with the same IP, I then want to compare the User-Agents; if those match, I want to compare the two times to see whether both requests happened within an hour of each other.

I am not loading my hashes properly, so I can't do these comparison checks. Please let me know where I am going wrong. $hRequests isn't being created, and only one record is being returned.

This warning appears for every row:
Use of uninitialized value in hash element at ./script.pl line 71, <LOG> line 10069.



Each row is being placed in %data correctly, and the keys are the directives from the $log_format variable (i.e. '%h' is a key).
#!/usr/bin/perl -w

use Apache::LogRegex;
use Data::Dumper;

my $lr;
my $log_format  = '"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""';
eval { $lr = Apache::LogRegex->new($log_format) };
die "Unable to parse log line: $@" if ($@);

my $get_logs = ("march-logs/march-get.txt",
                "march-logs/march-logs-web2/march-get.txt",
                "march-logs/march-logs-web3/march-get.txt");

my $post_logs = ("march-logs/march-post.txt",
                 "march-logs/march-logs-web2/march-post.txt",
                 "march-logs/march-logs-web3/march-post.txt");

my %data;
my %getRecords;
my $postRecords;
my @get_array;
my @post_array;

foreach ($get_logs)
{
    @get_array = &logToHash($_);
    foreach (@get_array)
    {
        print Dumper($_);
    }
}

sub logToHash
{
    my $file = $_;
    open LOG, $file or die $!;
    our ($aRequests,$ip,$userAgent,$date,$hRequests,$host);

    while ( my $line_from_logfile = <LOG> )
    {
        eval { %data = $lr->parse($line_from_logfile); };
        if (%data)
        {
            # We have data to process
            while( my ($key, $value) = each(%data) )
            {
                if($key =~ '%h')
                {
                    ($host,$ip) = split(/:/, $value);
                }
                if($key =~ '%{User-Agent}i\""')
                {
                    $userAgent = $value;
                }
                if($key =~ '%t')
                {
                    $date = $value;
                }
            }
            $aRequests = $hRequests{$ip}{$userAgent}{$date};  # LINE 71 (the line named in the warning)
            push @$aRequests, \%data;
        }
    }
    return @$aRequests;
}
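A minimal sketch of one way that warning can arise, assuming the `%h` value contains no colon: `split /:/` then returns a single field, so the second variable stays undefined, and using an undefined value as a hash key produces "Use of uninitialized value ... in hash element" (newer Perls also print the variable name). `$userAgent` would be undefined for a similar reason if its key test never matches. The address is made up, and `keys_ok` is a hypothetical helper showing one way to skip incomplete rows.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# split returns only one field when the value has no ':',
# so $ip is left undefined (the sample address is made up).
my ($host, $ip) = split /:/, '192.0.2.10';

my %hRequests;
my $slot = $hRequests{$ip};    # warns: "Use of uninitialized value ... in hash element"

my $warned = grep { /uninitialized value.*in hash element/ } @warnings;
print "warnings seen: $warned\n";

# Hypothetical guard: true only if every key part is defined.
sub keys_ok { return !grep { !defined } @_ }

print keys_ok($ip, 'Mozilla/5.0', '[30/Mar/2009:10:15:32 -0700]')
    ? "store record\n" : "skip record\n";   # prints "skip record"
```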


Question by:hallikpapa
 
Accepted Solution

by:ozo
Did you mean
my @get_logs = ("march-logs/march-get.txt",
                                "march-logs/march-logs-web2/march-get.txt",
                                "march-logs/march-logs-web3/march-get.txt");


foreach (@get_logs)
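Why the scalar version silently drops files: with `my $get_logs = (...)` the right-hand side is in scalar context, so the comma operator evaluates each item and keeps only the last one, and the `foreach` runs once, on the web3 file. A small sketch of the difference, reusing the paths from the question:

```perl
use strict;
use warnings;

my $get_logs;
{
    no warnings 'void';   # silence "Useless use of a constant" for the dropped items
    # Scalar assignment: the comma operator discards all but the last item.
    $get_logs = ("march-logs/march-get.txt",
                 "march-logs/march-logs-web2/march-get.txt",
                 "march-logs/march-logs-web3/march-get.txt");
}

# List assignment: the array keeps every file name.
my @get_logs = ("march-logs/march-get.txt",
                "march-logs/march-logs-web2/march-get.txt",
                "march-logs/march-logs-web3/march-get.txt");

print "scalar: $get_logs\n";                   # only the web3 file
print "array:  ", scalar(@get_logs), " files\n";   # 3 files
```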
 

Author Comment

by:hallikpapa
Oops, yes, it should be that way.
 

Author Comment

by:hallikpapa
The problem still remains, though.
 
Assisted Solution

by:ozo
Where did %hRequests come from? I don't see it defined.
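As far as I can tell, two things go wrong on that line. First, `$hRequests{...}` refers to a hash named `%hRequests`, which is a different variable from the scalar `$hRequests` declared with `our`; under `use strict` Perl would refuse to compile it. Second, copying the slot into `$aRequests` and then pushing through it never writes anything back into the hash, so every line starts a fresh array and only the last record comes back. A small sketch with made-up key values:

```perl
use strict;
use warnings;

my %hRequests;    # declared here; the original script never declares it

# Pattern from the question: copy the slot into a scalar, then push through it.
for my $n (1 .. 3) {
    my $aRequests = $hRequests{'192.0.2.10'}{'Mozilla/5.0'}{'t1'};   # undef
    push @$aRequests, { line => $n };   # autovivifies a NEW array in $aRequests only
}
my $stored = $hRequests{'192.0.2.10'}{'Mozilla/5.0'}{'t1'};
print defined $stored ? "stored\n" : "nothing stored\n";    # nothing stored

# Fix: push onto the dereferenced hash slot itself; Perl creates the
# array inside %fixed on first use, so every record is kept.
my %fixed;
for my $n (1 .. 3) {
    push @{ $fixed{'192.0.2.10'}{'Mozilla/5.0'}{'t1'} }, { line => $n };
}
my $count = @{ $fixed{'192.0.2.10'}{'Mozilla/5.0'}{'t1'} };
print "$count records kept\n";    # 3 records kept
```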
 

Author Comment

by:hallikpapa
Yeah, it hasn't been defined. What I posted is the whole app. I am really rusty on my Perl and can't figure out how to accomplish my goal.

Expert Comment

by:ozo
What goal are you trying to accomplish with that line?
 

Author Comment

by:hallikpapa
I am trying to make it my key, basically a HofHofHofH (hash of hashes of hashes of hashes). Then I can use the IP, date, and User-Agent as keys and do comparisons.

And then push the whole row into that $aRequests array like:

push @$aRequests, \%data;

Is that right?
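On the `push @$aRequests, \%data;` question: `\%data` is a reference to the one shared `%data` hash, so if `%data` is reused for every line, every stored reference ends up showing the last line. Pushing a shallow copy (`{ %data }`), or declaring `my %data` inside the loop body, gives each record its own hash. A small sketch with made-up addresses:

```perl
use strict;
use warnings;

my %data;            # one hash reused for every line, as in the question
my (@by_ref, @by_copy);

for my $ip ('192.0.2.1', '192.0.2.2', '192.0.2.3') {
    %data = ('%h' => $ip);          # overwrite the same hash each time
    push @by_ref,  \%data;          # every element points at the SAME hash
    push @by_copy, { %data };       # shallow copy: a new anonymous hash
}

print join(',', map { $_->{'%h'} } @by_ref),  "\n";   # 192.0.2.3 three times
print join(',', map { $_->{'%h'} } @by_copy), "\n";   # 192.0.2.1,192.0.2.2,192.0.2.3
```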

Then, SOMEHOW, compare on each variable to see how many matches I have at each level of the hash, roughly like the code below. The loop below does each comparison one key at a time, and I will track the number of matches at each level. Hope that part makes sense.

BUT I haven't gotten far enough to test the code below, because of the warning from my original post, and because the data doesn't load correctly, so I can't compare the two arrays yet.

If I am going down the wrong path, please let me know. The end goal is to look for matching IPs between the GET and POST arrays; for each IP match, check whether those records have matching User-Agents; and if they DO, finally check whether both requests happened within an hour or two of each other.





while (my ($IP, $hUserAgents) = each(%hRequests)) {
    # next if the IP is boring
    while (my ($userAgent, $hDates) = each(%$hUserAgents)) {
        # next if the user agent is boring
        while (my ($date, $aRequests) = each(%$hDates)) {
            # do something if $date is in range
            # (wanted for $IP, $userAgent)
        }
    }
}
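The nested loops above can be turned into a GET-versus-POST comparison that counts matches at each level. This is a sketch, not the thread's solution: it assumes both logs were already loaded as `ip -> user agent -> epoch seconds -> [records]` (the `%t` strings converted to epoch seconds first), and `compare_logs`, the sample data, and the 3600-second window are illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample data: ip -> user agent -> epoch seconds -> [records].
my %get = (
    '192.0.2.1' => { 'Mozilla/5.0' => { 1238400000 => [ { r => 'GET /a' } ] } },
    '192.0.2.2' => { 'curl/7.19'   => { 1238400000 => [ { r => 'GET /b' } ] } },
);
my %post = (
    '192.0.2.1' => { 'Mozilla/5.0' => { 1238401800 => [ { r => 'POST /a' } ] } },
    '192.0.2.2' => { 'Wget/1.11'   => { 1238400000 => [ { r => 'POST /b' } ] } },
    '192.0.2.9' => { 'Mozilla/5.0' => { 1238400000 => [ { r => 'POST /c' } ] } },
);

# Count matches per level and collect [ip, ua, get_time, post_time] pairs.
sub compare_logs {
    my ($get, $post, $window) = @_;
    my %count = (ip => 0, ua => 0, time => 0);
    my @pairs;
    for my $ip (sort keys %$get) {
        next unless exists $post->{$ip};              # level 1: same IP
        $count{ip}++;
        for my $ua (sort keys %{ $get->{$ip} }) {
            next unless exists $post->{$ip}{$ua};     # level 2: same User-Agent
            $count{ua}++;
            for my $gt (keys %{ $get->{$ip}{$ua} }) {
                for my $pt (keys %{ $post->{$ip}{$ua} }) {
                    next if abs($gt - $pt) > $window; # level 3: close in time
                    $count{time}++;
                    push @pairs, [ $ip, $ua, $gt, $pt ];
                }
            }
        }
    }
    return (\%count, \@pairs);
}

my ($count, $pairs) = compare_logs(\%get, \%post, 3600);
printf "IP matches: %d, UA matches: %d, within window: %d\n",
    @{$count}{qw(ip ua time)};    # IP matches: 2, UA matches: 1, within window: 1
```

It walks the hashes with `keys` rather than `each`: an `each` loop left early with `last` keeps its iterator position, which can make the next pass over the same hash start half-way through.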

 

Author Comment

by:hallikpapa
When I run the print Dumper line in the code below, it prints only one log entry. So is something not being pushed back correctly? I have changed things a bit: I am now searching only GET requests, and will try to match on timestamp, then on User-Agent, against a table in the DB.

Am I going about this all wrong? Again, the end result should be to search the GET requests for a timestamp that falls within one hour of any request in the DB table. When I find a match, I want to check the User-Agent of that GET request and see whether it also matches that same record in the DB table. If it does, yay: I am going to extract something from that GET request. If it doesn't, move on.

Please help, this is extremely frustrating.
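For the "within one hour" check, one approach is to convert each `%t` value to epoch seconds and compare numerically. The sketch assumes the standard Apache timestamp format `[30/Mar/2009:10:15:32 -0700]` and uses the core `Time::Local` module; `apache_time_to_epoch`, `matching_rows`, and `@db_rows` are hypothetical names, with `@db_rows` standing in for rows fetched from the DB table.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Convert "[30/Mar/2009:10:15:32 -0700]" (brackets optional) to UTC epoch
# seconds; returns undef if the string does not look like that.
sub apache_time_to_epoch {
    my ($t) = @_;
    my ($d, $mon, $y, $H, $M, $S, $sign, $oh, $om) =
        $t =~ m{^\[?(\d{1,2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s+([+-])(\d{2})(\d{2})\]?$}
        or return;
    return unless exists $MON{$mon};
    my $epoch  = timegm($S, $M, $H, $d, $MON{$mon}, $y);   # wall clock read as UTC
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return $epoch - $offset;                                 # shift to real UTC
}

# Hypothetical stand-ins for DB rows: epoch seconds plus User-Agent.
my @db_rows = (
    { epoch => apache_time_to_epoch('[30/Mar/2009:09:40:00 -0700]'), ua => 'Mozilla/5.0' },
    { epoch => apache_time_to_epoch('[30/Mar/2009:23:00:00 -0700]'), ua => 'Mozilla/5.0' },
);

# DB rows within $window seconds of the GET time that share its User-Agent.
sub matching_rows {
    my ($get_epoch, $get_ua, $rows, $window) = @_;
    return grep { abs($_->{epoch} - $get_epoch) <= $window && $_->{ua} eq $get_ua } @$rows;
}

my $get_epoch = apache_time_to_epoch('[30/Mar/2009:10:15:32 -0700]');
my @hits = matching_rows($get_epoch, 'Mozilla/5.0', \@db_rows, 3600);
print scalar(@hits), " matching row(s)\n";   # 1 matching row(s)
```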



foreach (@get_logs)
{
    @get_array = &logToHash($_);
}

foreach (@get_array)
{
    print Dumper($_);
}

sub logToHash
{
    my $file = $_;
    my @AoH;
    open LOG, $file or die $!;
    our ($aRequests,$ip,$userAgent,$date,$hRequests,$host);

    while ( my $line_from_logfile = <LOG> )
    {
        eval { %data = $lr->parse($line_from_logfile); };
        if (%data)
        {
            # We have data to process
            while( my ($key, $value) = each(%data) )
            {
                if($key =~ '%{User-Agent}i\""')
                {
                    $userAgent = $value;
                }
                if($key =~ '%t')
                {
                    $date = $value;
                }
            }
            $aRequests = $hRequests{$date}{$userAgent};
            push @$aRequests, \%data;
        }
    }
    return @$aRequests;
}
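Apart from the `$aRequests`/`%hRequests` problem inside `logToHash`, the loop above has two more issues that lose records: `@get_array = ...` inside the `foreach` replaces the array on every pass, so only the last file's records remain, and `logToHash` reads its file name from the global `$_` (which only works because it happens to be the loop variable) instead of from `@_`. A sketch of an accumulating version; `parse_line` is a hypothetical stand-in for `$lr->parse`, and the temporary files exist only for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for $lr->parse(): keep the first field as '%h'.
sub parse_line {
    my ($line) = @_;
    my ($host) = $line =~ /^(\S+)/ or return;
    return { '%h' => $host };
}

# Read one file and return one hash ref per parsed line.
sub logToHash {
    my ($file) = @_;                        # take the argument from @_, not $_
    open my $log, '<', $file or die "Cannot open $file: $!";
    my @records;
    while (my $line = <$log>) {
        my $rec = parse_line($line) or next;    # skip lines that don't parse
        push @records, $rec;
    }
    close $log;
    return @records;
}

# Demo: two throwaway files standing in for the GET logs.
my @get_logs;
for my $content ("192.0.2.1 - - a\n192.0.2.2 - - b\n", "192.0.2.3 - - c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @get_logs, $name;
}

my @get_array;
foreach my $file (@get_logs) {
    push @get_array, logToHash($file);      # append; "=" would keep only the last file
}
print scalar(@get_array), " records\n";     # 3 records
```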

 

Author Comment

by:hallikpapa
Thanks for the tips. I believe I am close, but not there yet. This section of code doesn't seem to behave as I expect.

I am using Eclipse, and even though I set a breakpoint and can see the hash keys %{User-Agent}i\"" and %t, those if statements are never satisfied. It loops through the entire hash, but $key never changes from %{Referer}i.


What am I doing wrong?




if (%data)
{
    # We have data to process
    while( my ($key, $value) = each(%data) )
    {
        if($key =~ '%{User-Agent}i\""')
        {
            $userAgent = $value;
        }
        if($key =~ '%t')
        {
            $date = $value;
        }
    }
    $aRequests = $hRequests{$date}{$userAgent};
    push @$aRequests, \%data;
}
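A guess at why those `if` tests never succeed, based on the key names quoted above: the outer double quotes in `$log_format` seem to end up inside the first and last field names, so the User-Agent key is literally `%{User-Agent}i\""`, with a backslash. In a regex, `\"` means a plain `"`, so the pattern looks for `i""` and never matches that backslash. Looking fields up by exact key avoids both the regex and the `each` loop. The `%data` contents below are mocked from the thread, not produced by Apache::LogRegex.

```perl
use strict;
use warnings;

# Mocked %data with the key names described in the thread (single quotes
# keep the backslash in the last key literally).
my %data = (
    '"%h'               => '192.0.2.1',
    '%t'                => '[30/Mar/2009:10:15:32 -0700]',
    '%{Referer}i'       => '-',
    '%{User-Agent}i\""' => 'Mozilla/5.0',
);

# What the question's pattern effectively asks for (braces escaped here):
# "i" followed directly by two quotes. The real key has a backslash there.
my $pattern_hit = grep { /%\{User-Agent\}i""/ } keys %data;

# An exact lookup with the key as it really is does work.
my $exact_hit = exists $data{'%{User-Agent}i\""'} ? 1 : 0;

# Cleaner: normalize the names once, then use plain hash lookups.
my %clean;
for my $name (keys %data) {
    (my $k = $name) =~ tr/"\\//d;    # drop stray quotes and backslashes
    $clean{$k} = $data{$name};
}
my ($date, $userAgent) = @clean{'%t', '%{User-Agent}i'};

print "pattern: $pattern_hit, exact: $exact_hit\n";   # pattern: 0, exact: 1
print "date=$date ua=$userAgent\n";
```

Dropping the outer quotes from the format, i.e. `'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'`, should give clean names such as `%{User-Agent}i` in the first place; printing `$lr->names` is a quick way to confirm what the module actually produces.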

 

Author Closing Comment

by:hallikpapa
I found my own solution
