Compare fields from two different hash of hashes

Posted on 2009-03-30
Last Modified: 2012-05-06
I have two Apache logs, one with all POSTs and one with all GETs. I am trying to read each of them into its own array of hashes (or a better structure, if you have a suggestion). My end result should be a comparison of the two structures, looking for IPs that match. If I find two records with the same IP, I then want to compare the User Agent; if those match, I want to compare the two times to see whether both requests happened within an hour of each other.

I am not loading my hashes properly so I can do these comparison checks. Please let me know where I am going wrong. The $hRequests isn't being created, and only one record is being returned.

This error pops up for every row.
Use of uninitialized value in hash element at ./script.pl line 71, <LOG> line 10069.



Each row is being placed inside %data correctly, and the keys are the variables in the log format string (i.e. '%h' is a key).
#!/usr/bin/perl -w
 
use Apache::LogRegex;
use Data::Dumper;
 
  my $lr;
my $log_format  = '"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""';
  eval { $lr = Apache::LogRegex->new($log_format) };
  die "Unable to parse log line: $@" if ($@);
  
my $get_logs = ("march-logs/march-get.txt",
				"march-logs/march-logs-web2/march-get.txt",
				"march-logs/march-logs-web3/march-get.txt");
				
my $post_logs = ("march-logs/march-post.txt",
				 "march-logs/march-logs-web2/march-post.txt",
				 "march-logs/march-logs-web3/march-post.txt");
 
my %data;
my %getRecords;
my $postRecords;
my @get_array;
my @post_array;
 
foreach ($get_logs)
{
	@get_array = &logToHash($_);
	foreach(@get_array)
	{
		print Dumper($_);	
	}
	
}
 
sub logToHash
{
	my $file = $_;
	open LOG, $file or die $!;
	our ($aRequests,$ip,$userAgent,$date,$hRequests,$host);
	
	while ( my $line_from_logfile = <LOG> ) 
  	{
      eval { %data = $lr->parse($line_from_logfile); };
      if (%data) 
      {
          # We have data to process
          while( my ($key, $value) = each(%data) ) 
          {
		  	if($key =~ '%h')
		  	{
		  		($host,$ip) = split(/:/, $value);
		  	}
		  	if($key =~ '%{User-Agent}i\""')
		  	{
		  		$userAgent = $value;
		  	}
		  	if($key =~ '%t')
		  	{
		  		$date = $value;
		  	}
          }
          $aRequests = $hRequests{$ip}{$userAgent}{$date};  # line 71 in the original script
   		  push @$aRequests, \%data;
   		}
  	} 
  	return @$aRequests;
}

Question by:hallikpapa

Accepted Solution

by:ozo
ID: 24025084
Did you mean
my @get_logs = ("march-logs/march-get.txt",
                                "march-logs/march-logs-web2/march-get.txt",
                                "march-logs/march-logs-web3/march-get.txt");


foreach (@get_logs)
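
For anyone wondering why the scalar version loops only once: assigning a parenthesised list to a scalar keeps just the last element, so foreach sees a single file name. A minimal sketch with made-up file names (not the ones from the question):

```perl
use strict;
use warnings;

my @files = ("a.txt", "b.txt", "c.txt");   # hypothetical file names

# Scalar assignment from a parenthesised list keeps only the last element.
# Perl normally warns about the discarded constants; silenced here on purpose.
my $last;
{
    no warnings 'void';
    $last = ("a.txt", "b.txt", "c.txt");
}

my @seen_scalar;
push @seen_scalar, $_ foreach ($last);     # loops once, over "c.txt"

my @seen_array;
push @seen_array, $_ foreach (@files);     # loops three times

print scalar(@seen_scalar), " vs ", scalar(@seen_array), "\n";
```

With the array declaration, each file name is visited in turn.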
 

Author Comment

by:hallikpapa
ID: 24025087
Oops, yes, that should be that way.
 

Author Comment

by:hallikpapa
ID: 24025090
Problem still remains though

Assisted Solution

by:ozo
ID: 24025104
where did %hRequests come from?  I don't see it defined
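
To spell that out: "our ($hRequests)" declares a scalar, while $hRequests{...} reads from a separate hash named %hRequests, and "use strict" would have flagged the undeclared hash. Copying the (undefined) hash slot into $aRequests and pushing onto that also creates a fresh array that is never stored back in the hash. A minimal sketch of one way to store rows, using hypothetical field names (ip, ua, date) in place of the parser output:

```perl
use strict;
use warnings;

my %hRequests;   # the hash itself, declared explicitly

# Hypothetical parsed rows standing in for what $lr->parse() returns.
my @rows = (
    { ip => '10.0.0.1', ua => 'Mozilla/5.0', date => '30/Mar/2009:10:15:00 -0700' },
    { ip => '10.0.0.1', ua => 'Mozilla/5.0', date => '30/Mar/2009:10:15:00 -0700' },
    { ip => '10.0.0.2', ua => 'curl/7.19',   date => '30/Mar/2009:11:00:00 -0700' },
);

for my $row (@rows) {
    my %data = %$row;
    # Push through the hash slot so the array is created inside %hRequests,
    # and store a copy ({ %data }) because %data is reused for every line.
    push @{ $hRequests{ $data{ip} }{ $data{ua} }{ $data{date} } }, { %data };
}

print scalar(keys %hRequests), " IPs\n";
```

The "uninitialized value in hash element" warning is a separate symptom: it means at least one of $ip, $userAgent, or $date was still undef when it was used as a key.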
 

Author Comment

by:hallikpapa
ID: 24025139
Yeah, it hasn't been defined. This is the whole app I posted. I am really rusty on my Perl and can't seem to figure out how to accomplish my goal.

Expert Comment

by:ozo
ID: 24025163
what goal are you trying to accomplish with that line?
 

Author Comment

by:hallikpapa
ID: 24025193
I am trying to make it my key, basically a HofHofHofH. Then I can use the IP, date, and UserAgent as keys, do comparisons.

And then push the whole row into that $aRequests array like:

push @$aRequests, \%data;

Is that right?

Then SOMEHOW compare on each var to see how many matches I have on each level of the hash, kind of like the code below. The loop below shows me doing each comparison one key at a time, and I will track the number of matches at each level. Hope that part makes sense.

BUT, I haven't gotten far enough to test the code below, because of the error I got in the original post, and it doesn't load all the data correctly so that I can do comparisons between the two arrays.

If I am going down the wrong path, please let me know. The end goal is to be able to look for matching IPs between the GET and POST arrays, and for each IP match, check to see if those IPs have matching User Agents, and if they DO, then finally check to see if they both happened within an hour or two.





 while (my ($IP, $hUserAgents) = each(%hRequests)) {
      # next if IP is boring;
      while (my ($userAgent, $hDates) = each(%$hUserAgents)) {
          # next if user agent is boring;
          while (my ($date, $aRequests) = each(%$hDates)) {
             #do something if date is in range
             #wanted for $IP, $userAgent
          }
      }
   }
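
The IP, then User Agent, then time comparison described above could be sketched roughly as follows. This is only an illustration under assumptions: the %get and %post structures, their sample contents, and the helper apache_time_to_epoch are hypothetical, and the timestamp is assumed to look like Apache's usual %t value.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Convert a stamp such as "[30/Mar/2009:10:15:00 -0700]" to epoch seconds.
sub apache_time_to_epoch {
    my ($stamp) = @_;
    my ($mday, $mon, $year, $h, $m, $s, $sign, $oh, $om) =
        $stamp =~ m{(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})}
        or return;
    return unless exists $MONTH{$mon};
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return timegm($s, $m, $h, $mday, $MONTH{$mon}, $year) - $offset;
}

# Hypothetical data shaped as {ip}{user agent} => [timestamps].
my %get  = ('10.0.0.1' => { 'Mozilla/5.0' => ['[30/Mar/2009:10:15:00 -0700]'] },
            '10.0.0.2' => { 'curl/7.19'   => ['[30/Mar/2009:09:00:00 -0700]'] });
my %post = ('10.0.0.1' => { 'Mozilla/5.0' => ['[30/Mar/2009:10:50:00 -0700]'],
                            'Opera/9.0'   => ['[30/Mar/2009:10:20:00 -0700]'] },
            '10.0.0.2' => { 'curl/7.19'   => ['[30/Mar/2009:12:30:00 -0700]'] });

my @matches;
for my $ip (grep { exists $post{$_} } keys %get) {                   # same IP
    for my $ua (grep { exists $post{$ip}{$_} } keys %{ $get{$ip} }) { # same agent
        for my $g (@{ $get{$ip}{$ua} }) {
            for my $p (@{ $post{$ip}{$ua} }) {
                my $tg = apache_time_to_epoch($g);
                my $tp = apache_time_to_epoch($p);
                next unless defined $tg && defined $tp;
                # Keep the pair when the two requests are within one hour.
                push @matches, [$ip, $ua, $g, $p] if abs($tg - $tp) <= 3600;
            }
        }
    }
}

print "$_->[0] $_->[1]\n" for @matches;
```

Comparing epoch seconds rather than the raw strings sidesteps month names and time zone offsets; the one-hour window is the 3600 in the last test.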

 

Author Comment

by:hallikpapa
ID: 24029790
When I do the print Dumper line in the code below, it only prints one log entry. So something is not being pushed back correctly? I have switched it a bit. I am only searching GET requests, and going to try and do a match based on time stamp, then user agent against a table in the DB.

Am I going about this all wrong? Again, the end result should be to search the GET requests for a time stamp that falls within one hour of any request in the DB table. When I find a match, I want to check the user agent for that GET request and see if it also matches that same record in the DB table. If it does, yay. I am going to extract something from that GET request. If it doesn't, move on.

Please help, this is extremely frustrating.



foreach (@get_logs)
{
	@get_array = &logToHash($_);
}
 
foreach(@get_array)
	{
		print Dumper($_);	
	}
 
 
sub logToHash
{
	my $file = $_;
	my @AoH;
	open LOG, $file or die $!;
	our ($aRequests,$ip,$userAgent,$date,$hRequests,$host);
	
	while ( my $line_from_logfile = <LOG> ) 
  	{
      eval { %data = $lr->parse($line_from_logfile); };
      if (%data) 
      {
          # We have data to process
          while( my ($key, $value) = each(%data) ) 
          {
		  	if($key =~ '%{User-Agent}i\""')
		  	{
		  		$userAgent = $value;
		  	}
		  	if($key =~ '%t')
		  	{
		  		$date = $value;
		  	}
          }
          $aRequests = $hRequests{$date}{$userAgent};
   		  push @$aRequests, \%data;
   		}
  	} 
  	return @$aRequests;
}
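
A likely reason only one entry is printed: @get_array is overwritten on every pass of the foreach, and inside the sub $aRequests is re-created for every line, so only the last line's one-element array is returned. A minimal sketch of a loader that accumulates rows instead. The parse_line helper is a hypothetical stand-in for $lr->parse() so that the example runs without Apache::LogRegex, and the two in-memory "files" are invented sample data:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $lr->parse(): handles only
# 'ip [timestamp] "user agent"' lines.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^(\S+) \[([^\]]+)\] "([^"]*)"$/;
    return ('%h' => $1, '%t' => $2, '%{User-Agent}i' => $3);
}

sub log_to_rows {
    my ($source) = @_;                      # take the argument from @_
    open my $log, '<', $source or die "Cannot open log: $!";
    my @rows;
    while (my $line = <$log>) {
        chomp $line;
        my %data = parse_line($line) or next;   # skip lines that do not parse
        push @rows, { %data };              # a fresh copy for every line
    }
    close $log;
    return @rows;
}

# In-memory sample "files" (scalar references can be opened like files).
my $file1 = qq{10.0.0.1 [30/Mar/2009:10:15:00 -0700] "Mozilla/5.0"\n}
          . qq{10.0.0.2 [30/Mar/2009:10:20:00 -0700] "curl/7.19"\n};
my $file2 = qq{10.0.0.3 [30/Mar/2009:11:00:00 -0700] "Mozilla/5.0"\n};

my @get_array;
push @get_array, log_to_rows($_) for (\$file1, \$file2);   # append, not overwrite

print scalar(@get_array), " rows\n";
```

With real logs, the file names from @get_logs would be passed in place of the scalar references.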

 

Author Comment

by:hallikpapa
ID: 24030270
Thanks for the tips. I believe I am close, but not there yet. This section of code doesn't seem to be operating as I expect?

I am using eclipse and even though I breakpoint and see the hash keys %{User-Agent}i\"" & %t, those if statements are never satisfied. It loops through the entire hash, but the $key never changes from %{Referer}i.


What am I doing wrong?



      if (%data) 
      {
          # We have data to process
          while( my ($key, $value) = each(%data) ) 
          {
		  	if($key =~ '%{User-Agent}i\""')
		  	{
		  		$userAgent = $value;
		  	}
		  	if($key =~ '%t')
		  	{
		  		$date = $value;
		  	}
          }
          $aRequests = $hRequests{$date}{$userAgent};
   		  push @$aRequests, \%data;
   		}
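
One thing to check here: a string on the right of =~ is treated as a regex pattern, so '%{User-Agent}i\""' demands two literal quote characters after the "i" and will not match a key that lacks them. Since the keys are already known, a direct lookup avoids both the loop and the pattern. A minimal sketch, in which the sample row and the exact key spellings are assumptions (printing keys %data once shows what the parser really produces):

```perl
use strict;
use warnings;

# Hypothetical parsed row; key spellings are assumed, not confirmed.
my %data = (
    '%h'             => '10.0.0.1',
    '%t'             => '[30/Mar/2009:10:15:00 -0700]',
    '%{User-Agent}i' => 'Mozilla/5.0',
);

# A hash slice fetches the three fields by exact key, with no loop and no regex.
my ($ip, $date, $userAgent) = @data{ '%h', '%t', '%{User-Agent}i' };

# Guard against missing fields before using them as hash keys.
if (grep { !defined $_ } $ip, $date, $userAgent) {
    warn "row is missing a field\n";
} else {
    print "$ip | $date | $userAgent\n";
}
```

If a lookup comes back undef, the key spelling is the first thing to compare against the output of keys %data.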

 

Author Closing Comment

by:hallikpapa
ID: 31564644
I found my own solution