• Status: Solved

delete line from file based on token criteria

I am trying to delete certain lines from my web server logs based on certain criteria.

Here is a sample line from the logs:

2005-05-17 03:59:59 GET /applications/pwrdesk/templates/images/buttons/billing.gif - 443 - 68.163.145.40 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.5;+Windows+95) - https://avc01.onceanddone.com/applications/pwrdesk/pwrdesk_socket.pl?policy_number=1356310&policy_year=2004&page=iapw_p1a.html avc01.onceanddone.com 200 851 480 0

What I need to do is delete all lines that contain .gif or .jpg, unless they contain a status code of 40x (400, 401, 402, etc.) or 50x (500, 501, 502, etc.). The status code is the 4th token from the end of the line (in the example, the status code is 200).

The token that will contain the .gif or .jpg is the 14th (dashes in the above example count as tokens).

Is there a way to do this?

Asked by: boucherc
 
boucherc (Author) commented:
Actually, the token that contains the .gif or .jpg is the 4th. The token that contains the status code is the 14th.
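The corrected positions can be checked with a short split. This is only a sketch: the log line below is a shortened stand-in for the sample (the path, user agent, and referrer are made up, but the 17-token layout is the same), and the variable names are illustrative.

```perl
use strict;
use warnings;

# Shortened stand-in for the sample log line (same 17-token layout).
my $line = '2005-05-17 03:59:59 GET /images/billing.gif - 443 - 68.163.145.40 '
         . 'HTTP/1.1 Mozilla/4.0 - https://example.invalid/page avc01.onceanddone.com 200 851 480 0';

my @tokens = split ' ', $line;

# Perl arrays are zero-based: the 4th token is index 3, the 14th is index 13.
my ($path, $status) = @tokens[3, 13];

print "tokens: ", scalar(@tokens), "\n";
print "path:   $path\n";
print "status: $status\n";

# With 17 tokens, the 14th token and the 4th from the end are the same element.
print "14th token is also 4th from last\n" if $tokens[13] eq $tokens[-4];
```

With 17 tokens per line, "4th from last" and "14th" refer to the same field, so either index (13 or -4) would reach the status code.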
 
FishMonger commented:
This is an easy task if you want to use a Perl script.

I can write a quick and dirty script that has minimal error handling, or, with a little more effort, I can add proper file locking and error handling.  However, it’s not clear whether the 2 tokens that you’re interested in will always be the 4th and 14th.
 
boucherc (Author) commented:
I'd go for the quick and dirty. And the 2 tokens would always be the 4th and 14th. If a token is blank on a given line, it's replaced with a dash ("-"), as token #5 is in the above example.
 
FishMonger commented:
>> If a line has one specific token blank, it's substituted with a dash.
What delimiter separates the tokens?  From the example line I’d say it’s a space, so would more than a single space constitute an empty token?
 
boucherc (Author) commented:
No, the tokens are space-delimited. If all tokens were blank, the line would look like:
- - - - - - - - - - - - - - - - -

So, if the even-numbered tokens were blank, it would look like:

1 - 3 - 5 - 7 - 9 - 11 - 13 - 15 - 17
 
boucherc (Author) commented:
There actually shouldn't be more than a single space separating tokens.
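If a doubled space ever did slip into a line, the choice of split pattern decides whether it shows up as an empty token. A small sketch, using a made-up five-field string rather than a real log line:

```perl
use strict;
use warnings;

my $line = 'a  b - c';   # two spaces between "a" and "b"

# A single-whitespace regex yields an empty field for the doubled space.
my @by_regex = split /\s/, $line;   # ('a', '', 'b', '-', 'c')

# The special pattern ' ' splits on runs of whitespace instead.
my @by_space = split ' ',  $line;   # ('a', 'b', '-', 'c')

print scalar(@by_regex), " vs ", scalar(@by_space), "\n";
```

With `split /\s/`, an extra space would shift every later token index by one; `split ' '` keeps the positions stable, which may be the safer choice when the fields are addressed by number.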
 
FishMonger commented:
#!perl -w

use strict;
use Tie::File;

my $log_file = 'C:/Program Files/Apache Group/logs/access.log';

# This next line will tie (link) a Perl array to the log file, which means
# that when you modify the array, you are actually modifying the file.
tie my @log_file, 'Tie::File', $log_file or die "Could not tie the array to the log file $log_file: $!";

for my $i (0..$#log_file) {
   # Tokens are space-delimited; index 3 is the 4th token, index 13 the 14th.
   my @tokens = split /\s/, $log_file[$i];
   if (defined $tokens[$i] && $tokens[3] =~ /\.(gif|jpg)$/i && $tokens[13] =~ /^[45]0\d$/) {
      splice(@log_file, $i, 1);
   }
}
untie @log_file;
 
FishMonger commented:
I should have explained that
   splice(@log_file, $i, 1);
is the line that removes the array element, which in turn removes that line from the file.

Keep in mind that when you modify files that other programs have write access to, you have a race condition and may end up losing some entries or getting a corrupted file.  That's why you would normally get a write lock on the file before making any updates.  Since this is the quick and dirty script (actually it's halfway in between), it doesn't lock the log file.

Here's the documentation on the Tie::File module which gives info on how to lock the file.
http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm

Additional modules and info for locking the file
http://search.cpan.org/~nwclark/perl-5.8.6/ext/Fcntl/Fcntl.pm
http://search.cpan.org/~muir/File-Flock-104.111901/lib/File/Flock.pm
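As a rough sketch of what the locking could look like, Tie::File objects provide a `flock` method that takes the same mode constants as Perl's built-in `flock`. The helper name `with_locked_file` and its callback interface are hypothetical, not something from the module or this thread, and the lock is advisory: it only helps if the other writers also use `flock`.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Tie::File;

# Hypothetical helper: run $edit->(\@lines) while holding an exclusive lock.
sub with_locked_file {
    my ($path, $edit) = @_;

    # tie returns the underlying Tie::File object, which has the flock method.
    my $obj = tie my @lines, 'Tie::File', $path
        or die "Could not tie $path: $!";
    $obj->flock(LOCK_EX) or die "Could not lock $path: $!";

    $edit->(\@lines);

    undef $obj;     # drop the extra reference before untie
    untie @lines;   # closes the file, which releases the lock
    return;
}
```

A call such as `with_locked_file($log_file, sub { my $lines = shift; ... })` would then keep the edits inside the locked region. A web server that does not take the same lock can still write to the file at any time, so running this against a rotated or copied log may be the safer option.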
 
FishMonger commented:
I just noticed a slight goof on my part.
>> unless they contain the status code of 40x (400, 401, 402, etc...)

I missed 'unless' and instead thought 'and'

change
   $tokens[13] =~
to
   $tokens[13] !~
 
FishMonger commented:
I was just going back through some of the questions I've posted in, and I see I made another goof.

   defined $tokens[$i]
should be
   defined $log_file[$i]
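Putting the two corrections together, the script might end up looking something like the sketch below. It makes a few changes that were not discussed in the thread, so treat them as assumptions: the logic is wrapped in a sub with the hypothetical name `prune_image_hits`, the path comes from the command line, lines with fewer than 14 tokens are skipped, and the loop walks the array backwards. The backwards walk is there because removing element `$i` in a forward loop shifts the next line into position `$i`, where it would never be examined.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper name; the thread's script does this inline.
sub prune_image_hits {
    my ($path) = @_;
    tie my @lines, 'Tie::File', $path
        or die "Could not tie the array to the log file $path: $!";

    # Walk backwards so a splice never shifts lines not yet examined.
    for my $i (reverse 0 .. $#lines) {
        my @tokens = split ' ', $lines[$i];
        next unless @tokens >= 14;               # skip short or blank lines
        if ($tokens[3] =~ /\.(gif|jpg)$/i        # 4th token: requested path
            && $tokens[13] !~ /^[45]0\d$/) {     # 14th token: not 40x/50x
            splice @lines, $i, 1;                # removes the line from the file
        }
    }
    untie @lines;
    return;
}

# Only runs when a log path is given on the command line.
prune_image_hits($ARGV[0]) if @ARGV;
```

As with the original, this version takes no file lock, so it is best run against a log the server is not currently writing to.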
Question has a verified solution.
