Solved

Compare fields from two different hash of hashes

Posted on 2009-03-30
Last Modified: 2012-05-06
I have two Apache logs, one with all POSTs and one with all GETs. I am trying to read each of them into its own array of hashes (or a better structure, if you have a suggestion). The end result should be a comparison of the two structures, looking for IPs that match. If I find two records with the same IP, I then want to compare the User Agent; if those match, I want to compare the two times to see whether both requests happened within an hour of each other.

I am not loading my hashes properly, so I cannot do these comparison checks yet. Please let me know where I am going wrong. The $hRequests structure isn't being created, and only one record is being returned.

This error pops up for every row.
Use of uninitialized value in hash element at ./script.pl line 71, <LOG> line 10069.



Each row is being placed inside %data correctly, and the keys are the variables in the log format string (e.g. '%h' is a key).
#!/usr/bin/perl -w

use Apache::LogRegex;
use Data::Dumper;

my $lr;
my $log_format  = '"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""';
eval { $lr = Apache::LogRegex->new($log_format) };
die "Unable to parse log line: $@" if ($@);

my $get_logs = ("march-logs/march-get.txt",
                "march-logs/march-logs-web2/march-get.txt",
                "march-logs/march-logs-web3/march-get.txt");

my $post_logs = ("march-logs/march-post.txt",
                 "march-logs/march-logs-web2/march-post.txt",
                 "march-logs/march-logs-web3/march-post.txt");

my %data;
my %getRecords;
my $postRecords;
my @get_array;
my @post_array;

foreach ($get_logs)
{
    @get_array = &logToHash($_);
    foreach(@get_array)
    {
        print Dumper($_);
    }
}

sub logToHash
{
    my $file = $_;
    open LOG, $file or die $!;
    our ($aRequests,$ip,$userAgent,$date,$hRequests,$host);

    while ( my $line_from_logfile = <LOG> )
    {
        eval { %data = $lr->parse($line_from_logfile); };
        if (%data)
        {
            # We have data to process
            while( my ($key, $value) = each(%data) )
            {
                if($key =~ '%h')
                {
                    ($host,$ip) = split(/:/, $value);
                }
                if($key =~ '%{User-Agent}i\""')
                {
                    $userAgent = $value;
                }
                if($key =~ '%t')
                {
                    $date = $value;
                }
            }
            $aRequests = $hRequests{$ip}{$userAgent}{$date};  # LINE 71
            push @$aRequests, \%data;
        }
    }
    return @$aRequests;
}

Question by:hallikpapa
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 24025084
Did you mean this?

my @get_logs = ("march-logs/march-get.txt",
                "march-logs/march-logs-web2/march-get.txt",
                "march-logs/march-logs-web3/march-get.txt");

foreach (@get_logs)
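
To illustrate why the sigil matters here: when a parenthesized list is assigned to a scalar, Perl evaluates it with the comma operator, so only the last element is kept and a foreach over that scalar runs exactly once. A minimal sketch using the file names from the question (the counter variables are only for illustration):

```perl
use strict;
use warnings;

# Scalar on the left: the comma operator discards all but the last value.
# With warnings enabled perl flags the discarded constants; the 'void'
# category is silenced here only to keep the demo output clean.
my $get_logs;
{
    no warnings 'void';
    $get_logs = ("march-logs/march-get.txt",
                 "march-logs/march-logs-web2/march-get.txt",
                 "march-logs/march-logs-web3/march-get.txt");
}

# Array on the left: all three paths are kept.
my @get_logs = ("march-logs/march-get.txt",
                "march-logs/march-logs-web2/march-get.txt",
                "march-logs/march-logs-web3/march-get.txt");

my $scalar_iterations = 0;
$scalar_iterations++ foreach ($get_logs);

my $array_iterations = 0;
$array_iterations++ foreach (@get_logs);

print "scalar holds: $get_logs\n";
print "loop over the scalar ran $scalar_iterations time(s)\n";
print "loop over the array ran $array_iterations time(s)\n";
```

With the scalar version, only the web3 log would ever be opened, which is one way to end up with far fewer records than expected.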
 

Author Comment

by:hallikpapa
ID: 24025087
Oops, yes, that should be that way.
 

Author Comment

by:hallikpapa
ID: 24025090
The problem still remains, though.
 
LVL 84

Assisted Solution

by:ozo
ozo earned 500 total points
ID: 24025104
Where did %hRequests come from? I don't see it defined.
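
For context on this point: the script declares the scalar $hRequests with our, but $hRequests{...} indexes the hash %hRequests, which is a different variable. Without use strict, Perl quietly creates that global hash. The sketch below compiles a two-line snippet from a string, purely so the error can be captured and printed; the IP address is a made-up value.

```perl
use strict;
use warnings;

# The snippet is compiled from a string so that the compile-time error
# can be captured instead of aborting this script.
my $snippet = <<'END_SNIPPET';
use strict;
our $hRequests;                 # declares the SCALAR $hRequests only
$hRequests{'10.0.0.1'} = 1;     # indexes the HASH %hRequests, never declared
1;
END_SNIPPET

my $compiled = eval $snippet;
my $error    = $@;

if ($compiled) {
    print "compiled cleanly\n";
}
else {
    print "strict rejected it: $error";
}
```

Adding use strict near the top of the script and declaring my %hRequests would surface this kind of mismatch at compile time rather than as run-time warnings.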
 

Author Comment

by:hallikpapa
ID: 24025139
Yeah, it hasn't been defined. What I posted is the whole app. I am really rusty on my Perl and can't seem to figure out how to accomplish my goal.

 
LVL 84

Expert Comment

by:ozo
ID: 24025163
What goal are you trying to accomplish with that line?
 

Author Comment

by:hallikpapa
ID: 24025193
I am trying to use it as my key, basically a HofHofHofH. Then I can use the IP, date, and User Agent as keys and do comparisons.

And then push the whole row into that $aRequests array like:

push @$aRequests, \%data;

Is that right?

Then SOMEHOW compare on each var to see how many matches I have on each level of the hash, kind of like the code below. The loop below shows me doing each comparison one key at a time, and I will track the number of matches at each level. Hope that part makes sense.

BUT, I haven't gotten far enough to test the code below, because of the error from the original post, and because the data isn't loaded correctly enough for me to compare the two arrays.

If I am going down the wrong path, please let me know. The end goal is to be able to look for matching IPs between the GET and POST arrays, and for each IP match, check to see if those IPs have matching User Agents, and if they DO, then finally check to see if they both happened within an hour or two.





while (my ($IP, $hUserAgents) = each(%hRequests)) {
    # next if <IP is boring>;
    while (my ($userAgent, $hDates) = each(%$hUserAgents)) {
        # next if <user agent is boring>;
        while (my ($date, $aRequests) = each(%$hDates)) {
            # do something if date is in range
            # wanted for $IP, $userAgent
        }
    }
}
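
On the "Is that right?" question above: copying $hRequests{...}{...}{...} into $aRequests and then pushing onto @$aRequests appears to create a new array that is never stored back in the hash, so the nested structure stays empty at the leaf. Pushing through the hash slot itself lets Perl autovivify every level. The sketch below assumes made-up rows with hypothetical ip, ua, and date fields in place of the parsed log data, and then walks the result with nested loops.

```perl
use strict;
use warnings;

# Made-up sample rows standing in for parsed log lines (assumption).
my @rows = (
    { ip => '10.0.0.1', ua => 'Mozilla/5.0', date => '30/Mar/2009:10:00:00 -0700' },
    { ip => '10.0.0.1', ua => 'Mozilla/5.0', date => '30/Mar/2009:10:00:00 -0700' },
    { ip => '10.0.0.2', ua => 'curl/7.19',   date => '30/Mar/2009:11:30:00 -0700' },
);

my %hRequests;    # IP => user agent => date => [ rows ]

for my $row (@rows) {
    # Push through the hash slot so the array lives inside %hRequests,
    # and store a copy ({ %$row }) rather than a reference to a variable
    # that is overwritten on the next iteration.
    push @{ $hRequests{ $row->{ip} }{ $row->{ua} }{ $row->{date} } }, { %$row };
}

my $total = 0;
for my $ip (sort keys %hRequests) {
    my $hUserAgents = $hRequests{$ip};
    for my $userAgent (sort keys %$hUserAgents) {
        my $hDates = $hUserAgents->{$userAgent};
        for my $date (sort keys %$hDates) {
            my $aRequests = $hDates->{$date};
            $total += @$aRequests;
            printf "%s | %s | %s | %d request(s)\n",
                $ip, $userAgent, $date, scalar @$aRequests;
        }
    }
}
print "total rows stored: $total\n";
```

Iterating over sorted keys instead of each also sidesteps the shared iterator state that each keeps per hash.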

 

Author Comment

by:hallikpapa
ID: 24029790
When I run the print Dumper line in the code below, it only prints one log entry. So something is not being pushed back correctly? I have changed the approach a bit: I am now only searching GET requests, and will try to match on time stamp and then user agent against a table in the DB.

Am I going about this all wrong? Again, the end result should be to search the GET requests for a time stamp that falls within one hour of any request in the DB table. When I find a match, I want to check the user agent for that GET request and see if it also matches that same record in the DB table. If it does, yay. I am going to extract something from that GET request. If it doesn't, move on.

Please help, this is extremely frustrating.



foreach (@get_logs)
{
    @get_array = &logToHash($_);
}

foreach(@get_array)
{
    print Dumper($_);
}

sub logToHash
{
    my $file = $_;
    my @AoH;
    open LOG, $file or die $!;
    our ($aRequests,$ip,$userAgent,$date,$hRequests,$host);

    while ( my $line_from_logfile = <LOG> )
    {
        eval { %data = $lr->parse($line_from_logfile); };
        if (%data)
        {
            # We have data to process
            while( my ($key, $value) = each(%data) )
            {
                if($key =~ '%{User-Agent}i\""')
                {
                    $userAgent = $value;
                }
                if($key =~ '%t')
                {
                    $date = $value;
                }
            }
            $aRequests = $hRequests{$date}{$userAgent};
            push @$aRequests, \%data;
        }
    }
    return @$aRequests;
}
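
A few things in the version above would each explain seeing a single entry: @get_array is assigned (not appended to) for every file, $aRequests is reset on every line, and \%data always points at the same global hash. The sub also reads its file name from $_ rather than from @_, which only works because the caller's foreach happens to have set $_. Below is a rough sketch of one way to restructure it. To stay runnable without Apache::LogRegex it uses a stand-in parser, parse_line, and in-memory sample text; both are hypothetical, as is the name log_to_records.

```perl
use strict;
use warnings;

# Stand-in for $lr->parse($line): a deliberately tiny parser that is NOT
# Apache::LogRegex and only understands the made-up sample lines below.
sub parse_line {
    my ($line) = @_;
    my ($host, $time, $agent) = $line =~ /^(\S+) \[([^\]]+)\] "([^"]*)"$/
        or return;
    return ( '%h' => $host, '%t' => $time, '%{User-Agent}i' => $agent );
}

# Reads every line from an open filehandle and returns one hash ref per
# successfully parsed line.
sub log_to_records {
    my ($fh, $parser) = @_;          # arguments arrive in @_, not $_
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        my %data = $parser->($line); # fresh hash for each line
        next unless %data;
        push @records, \%data;
    }
    return @records;
}

# Two in-memory "log files" with made-up content.
my $first_log  = qq{10.0.0.1 [30/Mar/2009:10:00:00 -0700] "Mozilla/5.0"\n}
               . qq{10.0.0.2 [30/Mar/2009:11:30:00 -0700] "curl/7.19"\n};
my $second_log = qq{10.0.0.3 [30/Mar/2009:12:00:00 -0700] "Opera/9.6"\n};

my @get_array;
for my $text ($first_log, $second_log) {
    open my $fh, '<', \$text or die "Cannot open in-memory log: $!";
    push @get_array, log_to_records($fh, \&parse_line);   # append, do not overwrite
    close $fh;
}

print scalar(@get_array), " records collected\n";
```

In the real script, the parser argument could be sub { $lr->parse($_[0]) }, and each file would be opened with a lexical handle, for example open my $fh, '<', $file or die "$file: $!".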

 

Author Comment

by:hallikpapa
ID: 24030270
Thanks for the tips. I believe I am close, but not there yet. This section of code doesn't seem to be operating as I expect.

I am using Eclipse, and even though I set a breakpoint and can see the hash keys %{User-Agent}i\"" and %t, those if statements are never satisfied. It loops through the entire hash, but $key never changes from %{Referer}i.


What am I doing wrong?




if (%data)
{
    # We have data to process
    while( my ($key, $value) = each(%data) )
    {
        if($key =~ '%{User-Agent}i\""')
        {
            $userAgent = $value;
        }
        if($key =~ '%t')
        {
            $date = $value;
        }
    }
    $aRequests = $hRequests{$date}{$userAgent};
    push @$aRequests, \%data;
}
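
Two things may be contributing to the behaviour described here, though neither is confirmed by the thread. First, $key =~ '...' treats the string as a regular expression, so the braces and backslash in the key name are pattern syntax rather than literal text. Second, each keeps one iterator per hash, and anything that calls keys or values on %data in the meantime (possibly a debugger inspecting the variable) resets it to the first key. Looking fields up directly avoids both. The sketch assumes a made-up row whose key names imitate the ones reported above; field_like is a hypothetical helper for keys whose exact spelling is uncertain.

```perl
use strict;
use warnings;

# Made-up parsed row; the stray quote characters in two of the keys mimic
# what the poster reports seeing in the debugger (assumption).
my %data = (
    '"%h'                => '10.0.0.1',
    '%t'                 => '[30/Mar/2009:10:00:00 -0700]',
    '%{Referer}i'        => '-',
    '%{User-Agent}i\""' => 'Mozilla/5.0',
);

# Hypothetical helper: return the value of the first key that contains
# $needle as a literal substring (index() does no regex interpretation).
sub field_like {
    my ($row, $needle) = @_;
    for my $key (sort keys %$row) {
        return $row->{$key} if index($key, $needle) >= 0;
    }
    return;
}

my $date      = $data{'%t'};                        # exact key: plain lookup
my $userAgent = field_like(\%data, 'User-Agent');   # inexact key name
my $ip        = field_like(\%data, '%h');

print "date=$date ua=$userAgent ip=$ip\n";
```

If the outer double quotes were dropped from the log format string, the keys would presumably come out as plain '%h' and '%{User-Agent}i', and ordinary hash lookups would be enough.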

 

Author Closing Comment

by:hallikpapa
ID: 31564644
I found my own solution
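
The poster's own solution is not recorded in the thread. For the "within an hour" check described in the question, one possible approach is to convert each Apache %t value to epoch seconds and compare the difference. The sketch below uses only the core Time::Local module; apache_time_to_epoch and within_seconds are hypothetical helper names, and the timestamps are made up.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Convert an Apache %t value such as "[30/Mar/2009:10:15:00 -0700]"
# (brackets optional) to epoch seconds; returns undef if it does not match.
sub apache_time_to_epoch {
    my ($stamp) = @_;
    my ($mday, $mon, $year, $hour, $min, $sec, $sign, $off_h, $off_m) =
        $stamp =~ m!(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})!
        or return;
    return unless exists $MONTH{$mon};
    my $offset = ($off_h * 3600 + $off_m * 60) * ($sign eq '-' ? -1 : 1);
    # timegm treats the fields as UTC; subtracting the zone offset turns
    # the logged wall-clock time into true UTC epoch seconds.
    return timegm($sec, $min, $hour, $mday, $MONTH{$mon}, $year) - $offset;
}

# True (1) if both stamps parse and lie within $window seconds of each other.
sub within_seconds {
    my ($stamp_a, $stamp_b, $window) = @_;
    my $epoch_a = apache_time_to_epoch($stamp_a);
    my $epoch_b = apache_time_to_epoch($stamp_b);
    return 0 unless defined $epoch_a && defined $epoch_b;
    return abs($epoch_a - $epoch_b) <= $window ? 1 : 0;
}

my $get_time  = '[30/Mar/2009:10:15:00 -0700]';
my $post_near = '[30/Mar/2009:10:55:00 -0700]';   # 40 minutes later
my $post_far  = '[30/Mar/2009:12:00:00 -0700]';   # 105 minutes later

printf "near: %d, far: %d\n",
    within_seconds($get_time, $post_near, 3600),
    within_seconds($get_time, $post_far,  3600);
```

Comparing epoch values rather than the raw strings also keeps the check correct if the two logs were written with different time zone offsets.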

Question has a verified solution.
