boucherc asked:

delete line from file based on token criteria

I am trying to delete certain lines from my web server logs based on certain criteria.

Here is a sample line from the logs:

2005-05-17 03:59:59 GET /applications/pwrdesk/templates/images/buttons/billing.gif - 443 - 68.163.145.40 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.5;+Windows+95) - https://avc01.onceanddone.com/applications/pwrdesk/pwrdesk_socket.pl?policy_number=1356310&policy_year=2004&page=iapw_p1a.html avc01.onceanddone.com 200 851 480 0

What I need to do is delete all lines that contain .gif or .jpg, unless they contain a status code of 40x (400, 401, 402, etc.) or 50x (500, 501, 502, etc.). The status code is the 4th token from the end (in the example, the status code is 200).

The token that will contain the .gif or .jpg is the 14th. (Dashes in the above example count as tokens.)

Is there a way to do this?

boucherc (asker):

Actually, the token that contains the .gif or .jpg is the 4th. The token that contains the status code is the 14th.
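Since Perl arrays count from 0, the 4th and 14th tokens end up as $tokens[3] and $tokens[13]. A minimal sketch, assuming the sample line above and single-space delimiters, that splits the line and pulls out both fields (the 14th token is also the 4th from the end, $tokens[-4]):

```perl
use strict;
use warnings;

# The sample log line from the question (tokens are single-space delimited).
my $line = '2005-05-17 03:59:59 GET /applications/pwrdesk/templates/images/buttons/billing.gif - 443 - 68.163.145.40 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.5;+Windows+95) - https://avc01.onceanddone.com/applications/pwrdesk/pwrdesk_socket.pl?policy_number=1356310&policy_year=2004&page=iapw_p1a.html avc01.onceanddone.com 200 851 480 0';

my @tokens = split / /, $line;

# Perl arrays are 0-based, so the 4th token is $tokens[3]
# and the 14th token is $tokens[13].
my $uri    = $tokens[3];
my $status = $tokens[13];

print "token count: ", scalar(@tokens), "\n";   # 17
print "4th token:   $uri\n";
print "14th token:  $status\n";                 # 200
print "4th from end: $tokens[-4]\n";            # also 200
```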
FishMonger:
This is an easy task if you want to use a Perl script.

I can write a quick-and-dirty script with minimal error handling, or, with a little more effort, I can add proper file locking and error handling.  However, it’s not clear whether the 2 tokens you’re interested in will always be the 4th and 14th.
I'd go for the quick and dirty. And the 2 tokens would always be the 4th and 14th. If a specific token is blank, it's replaced with a dash ("-"), as token #5 is in the above example.
>> If a line has one specific token blank, it's substituted with a dash.
What delimiter separates the tokens?  From the example line I’d say it’s a space, so if there's more than a single space, would that constitute an empty token?
No, the tokens are space delimited. If all tokens were blank, it would look like:
- - - - - - - - - - - - - - - - -

So, if the even-numbered tokens were blank, it would look like:

1 - 3 - 5 - 7 - 9 - 11 - 13 - 15 - 17
There actually shouldn't be more than a single space separating tokens.
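Because blank fields are written as "-", every line keeps the same token count, so fixed indexes stay valid. A small sketch using the two placeholder lines above; it also shows why Perl's special split ' ' (awk-style, treats any run of whitespace as one separator) is a slightly safer choice than a literal single-space pattern if a stray double space ever slips into a log line:

```perl
use strict;
use warnings;

# Placeholder examples from the thread: blank fields are written as "-",
# so every line still has the same 17 space-delimited tokens.
my $all_blank  = join ' ', ('-') x 17;
my $even_blank = '1 - 3 - 5 - 7 - 9 - 11 - 13 - 15 - 17';

for my $line ($all_blank, $even_blank) {
    my @tokens = split ' ', $line;   # awk-style split: runs of whitespace count as one separator
    printf "%2d tokens, 14th token is '%s'\n", scalar(@tokens), $tokens[13];
}

# With a literal single-space pattern, an accidental double space
# produces an empty token and shifts every later index by one.
my @strict = split / /, 'a  b';      # ('a', '', 'b')
my @loose  = split ' ',  'a  b';     # ('a', 'b')
print scalar(@strict), " vs ", scalar(@loose), "\n";   # 3 vs 2
```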
#!perl -w

use strict;
use Tie::File;

my $log_file = 'C:/Program Files/Apache Group/logs/access.log';

# This next line will tie (link) a Perl array to the log file, which means
# that when you modify the array, you're actually modifying the file.
tie my @log_file, 'Tie::File', $log_file or die "Could not tie the array to the log file $log_file: $!";

# Walk the lines from last to first, so that removing a line with splice()
# doesn't shift the lines that haven't been checked yet.
for my $i (reverse 0..$#log_file) {
   my @tokens = split /\s/, $log_file[$i];
   # Skip short lines; otherwise match a .gif/.jpg 4th token and the 14th token's status.
   if (defined $tokens[13] && $tokens[3] =~ /\.(?:gif|jpg)$/i && $tokens[13] =~ /^[45]0\d$/) {
      splice(@log_file, $i, 1);
   }
}
untie @log_file;
I should have explained that
   splice(@log_file, $i, 1);
is the line that removes the array element, which in turn removes that line from the file.
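One thing to watch with splice inside a loop: if you walk the array forward, the element after the deleted one slides into slot $i, and the loop then moves on to $i + 1, so it never gets checked. A minimal sketch on a plain array (hypothetical data, no file involved) comparing a forward walk with a backward one:

```perl
use strict;
use warnings;

# Hypothetical data: every element marked 'x' should be removed.
my @forward  = qw(a x x b x);
my @backward = @forward;

# Forward walk: after splice, the next element slides into slot $i,
# but the loop moves on to $i + 1, so adjacent matches get skipped.
for my $i (0 .. $#forward) {
    next unless defined $forward[$i];          # indices past the shrunken end
    splice(@forward, $i, 1) if $forward[$i] eq 'x';
}

# Backward walk: removing slot $i only shifts elements already checked.
for my $i (reverse 0 .. $#backward) {
    splice(@backward, $i, 1) if $backward[$i] eq 'x';
}

print "forward:  @forward\n";    # forward:  a x b
print "backward: @backward\n";   # backward: a b
```

The same reasoning applies to an array tied with Tie::File, since each splice there removes a line from the file.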

Keep in mind that when you modify a file that other programs have write access to, you're in a race condition and may end up losing some entries or getting a corrupted file.  That's why you would normally get a write lock on the file before making any updates.  Since this is the quick and dirty script (actually it's halfway in between), it doesn't lock the log file.

Here's the documentation on the Tie::File module which gives info on how to lock the file.
http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm

Additional modules and info for locking the file
http://search.cpan.org/~nwclark/perl-5.8.6/ext/Fcntl/Fcntl.pm
http://search.cpan.org/~muir/File-Flock-104.111901/lib/File/Flock.pm
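A minimal sketch of the locking approach, assuming Tie::File's flock method (called on the object that tie returns) and the LOCK_EX constant from Fcntl. To keep it self-contained it writes three hypothetical log lines to a temp file instead of touching the real access.log, and its delete condition follows the rule in the question (delete .gif/.jpg lines unless the status is 40x/50x). Note that flock locks are advisory on most Unix-like systems, so this only protects against other processes that also call flock; whether the web server does that is worth checking before relying on it.

```perl
#!perl
use strict;
use warnings;
use Fcntl ':flock';          # provides LOCK_EX, LOCK_NB, ...
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical sample log written to a temp file so the sketch is self-contained.
my ($fh, $log_file) = tempfile(UNLINK => 1);
print $fh "a b c /img/logo.gif d e f g h i j k l 200 1 2 3\n",
          "a b c /img/missing.jpg d e f g h i j k l 404 1 2 3\n",
          "a b c /index.html d e f g h i j k l 200 1 2 3\n";
close $fh;

# tie returns the Tie::File object; its flock() method locks the underlying file.
my $o = tie my @lines, 'Tie::File', $log_file
    or die "Could not tie $log_file: $!";
$o->flock(LOCK_EX);          # blocks until an exclusive lock is granted

for my $i (reverse 0 .. $#lines) {
    my @tokens = split ' ', $lines[$i];
    next unless defined $tokens[13];
    # Delete .gif/.jpg requests unless the status is 40x or 50x.
    if ($tokens[3] =~ /\.(?:gif|jpg)$/i && $tokens[13] !~ /^[45]0\d$/) {
        splice @lines, $i, 1;
    }
}

undef $o;                    # drop the extra reference before untie
untie @lines;                # flushes changes and closes the file, releasing the lock

open my $in, '<', $log_file or die "Could not reopen $log_file: $!";
print while <$in>;           # the .jpg/404 and .html lines remain
close $in;
```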
I just noticed a slight goof on my part.
>> unless they contain the status code of 40x (400, 401, 402, etc...)

I misread 'unless' as 'and'.

change
   $tokens[13] =~
to
   $tokens[13] !~
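With that change, the delete rule reads "4th token ends in .gif/.jpg AND the 14th token is not 40x/50x". A minimal sketch that pulls the rule into a hypothetical helper, should_delete, and checks it against a few made-up lines (filler tokens everywhere except positions 4 and 14). Note that /^[45]0\d$/ matches only 400-409 and 500-509, which is what the question asked for; a status such as 410 or 413 would not be protected.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a log line should be deleted, i.e. the 4th
# token ends in .gif or .jpg AND the 14th token is NOT a 40x/50x status.
sub should_delete {
    my ($line) = @_;
    my @tokens = split ' ', $line;
    return 0 unless defined $tokens[13];    # too few tokens: leave the line alone
    return $tokens[3] =~ /\.(?:gif|jpg)$/i
        && $tokens[13] !~ /^[45]0\d$/;
}

# Hypothetical lines; only tokens 4 and 14 matter here.
my %cases = (
    'a b c /x.gif d e f g h i j k l 200 1 2 3'  => 1,   # image, OK status: delete
    'a b c /x.JPG d e f g h i j k l 304 1 2 3'  => 1,   # case-insensitive match
    'a b c /x.gif d e f g h i j k l 404 1 2 3'  => 0,   # 40x: keep
    'a b c /x.jpg d e f g h i j k l 503 1 2 3'  => 0,   # 50x: keep
    'a b c /x.html d e f g h i j k l 200 1 2 3' => 0,   # not an image: keep
);

for my $line (sort keys %cases) {
    my $got = should_delete($line) ? 1 : 0;
    printf "%-45s %s\n", $line, $got ? 'delete' : 'keep';
    die "unexpected result for '$line'\n" unless $got == $cases{$line};
}
```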
ASKER CERTIFIED SOLUTION (FishMonger)

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial