Solved

Parse and analyse log file with PERL

Posted on 2010-08-17
Last Modified: 2012-05-10
Hi

I need to read a log file in this format (from a Squid proxy):

1280628767.448 155251 192.168.100.225 TCP_MISS/503 902 GET http://googletb.skype.com/skypelist-en.xml renee DIRECT/googletb.skype.com text/html

and compile a list, written to a file, of everyone who has gone over their quota - for example 1GB. In the format above the 902 is the usage in bytes and "renee" is the user in that line.

The log files tend to be a bit weighty - 500MB+ - so there are quite a few lines to work through.
thank you
Question by:QuintusSmit
 
LVL 10

Accepted Solution

by:
jeromee earned 500 total points
ID: 33457054
Try this one-liner:


perl -ane'$s{$F[7]}+=$F[4]; END{print map{"$_ $s{$_} ". ($s{$_}>1_000_000_000 ? "OVER\n" : "\n")} sort keys %s}' /proxy/path

 
LVL 1

Author Comment

by:QuintusSmit
ID: 33457306
Hi jeromee - is the /proxy/path the path to the log file?
 
LVL 10

Expert Comment

by:jeromee
ID: 33457380
correct.
 
LVL 1

Author Comment

by:QuintusSmit
ID: 33457431
I get an error: Can't find string terminator "'" anywhere before EOF at -e line 1.
 
LVL 10

Expert Comment

by:jeromee
ID: 33458406
Are you sure that you copied the line verbatim?
Which version of Perl do you have? (perl -v )
 
LVL 1

Author Comment

by:QuintusSmit
ID: 33458717
thanx for the help so far.

this is the code as I use it: (copied and pasted)

perl -ane'$s{$F[7]}+=$F[4]; END{print map{"$_ $s{$_} ". ($s{$_}>1_000_000_000 ? "OVER\n" : "\n")} sort keys %s}'  c:/access.log

the perl version is 5.10.1

I am working on a 64-bit system, if that makes a difference. I should also mention that I'm running this on a Windows version of Perl. I will try it now on Linux and see if it works.

 
LVL 1

Author Comment

by:QuintusSmit
ID: 33458866
that was the problem - it works great on the Linux server.

if you keep going like this minstrels will have to sing your praises soon.

Could you maybe give a quick overview of what it is I am actually doing with that line?

tx
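
A note on the Windows error above: cmd.exe does not treat single quotes as quoting characters, so perl receives the program cut off at the first space and starting with an unmatched ', which is what the "Can't find string terminator" message is complaining about. One way to sidestep shell quoting altogether is to put the program in a file. The sketch below is one possible expansion of the one-liner, not jeromee's code: the sub names (tally_usage, over_quota), the $QUOTA variable and the two "alice" log lines are made up for illustration, and it lists only the users over the limit, which is what the question asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $QUOTA = 1_000_000_000;    # 1 GB (10**9 bytes), same threshold as the one-liner

# Sum field 4 (bytes) per field 7 (user) for every line read from the handle.
sub tally_usage {
    my ($fh) = @_;
    my %bytes_for;
    while (my $line = <$fh>) {
        my @F = split ' ', $line;     # same split that -a performs
        next unless @F > 7;           # skip short or malformed lines
        $bytes_for{ $F[7] } += $F[4];
    }
    return \%bytes_for;
}

# Names of the users whose total exceeds the limit, sorted alphabetically.
sub over_quota {
    my ($bytes_for, $limit) = @_;
    return grep { $bytes_for->{$_} > $limit } sort keys %$bytes_for;
}

# Demo on an in-memory handle. The first line is from the question;
# the two "alice" lines are hypothetical.
my $sample = <<'END_LOG';
1280628767.448 155251 192.168.100.225 TCP_MISS/503 902 GET http://googletb.skype.com/skypelist-en.xml renee DIRECT/googletb.skype.com text/html
1280628768.100 120 192.168.100.12 TCP_MISS/200 700000000 GET http://example.com/a.iso alice DIRECT/example.com application/octet-stream
1280628769.200 130 192.168.100.12 TCP_MISS/200 600000000 GET http://example.com/b.iso alice DIRECT/example.com application/octet-stream
END_LOG

open my $fh, '<', \$sample or die "cannot open sample: $!";
my $totals = tally_usage($fh);
close $fh;

print "$_ $totals->{$_}\n" for over_quota($totals, $QUOTA);   # alice 1300000000
```

To run it on a real log, replace the in-memory handle with open my $fh, '<', 'c:/access.log', save the file under any name (quota.pl, say) and run perl quota.pl > over_quota.txt. Like the one-liner, it reads one line at a time, so a 500MB log only costs memory for one running total per user.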
 
LVL 10

Expert Comment

by:jeromee
ID: 33459268
perl -ane'$s{$F[7]}+=$F[4]; END{print map{"$_ $s{$_} ". ($s{$_}>1_000_000_000 ? "OVER\n" : "\n")} sort keys %s}'

Perl -ane: see perl -h:
% perl -h

Usage: /home/Perl/bin/perl [switches] [--] [programfile] [arguments]
  -0[octal]       specify record separator (\0, if no argument)
  -a              autosplit mode with -n or -p (splits $_ into @F)
  -C              enable native wide character system interfaces
  -c              check syntax only (runs BEGIN and CHECK blocks)
  -d[:debugger]   run program under debugger
  -D[number/list] set debugging flags (argument is a bit mask or alphabets)
  -e 'command'    one line of program (several -e's allowed, omit programfile)
  -F/pattern/     split() pattern for -a switch (//'s are optional)
  -i[extension]   edit <> files in place (makes backup if extension supplied)
  -Idirectory     specify @INC/#include directory (several -I's allowed)
  -l[octal]       enable line ending processing, specifies line terminator
  -[mM][-]module  execute `use/no module...' before executing program
  -n              assume 'while (<>) { ... }' loop around program
  -p              assume loop like -n but print line also, like sed
  -P              run program through C preprocessor before compilation
  -s              enable rudimentary parsing for switches after programfile
  -S              look for programfile using PATH environment variable
  -T              enable tainting checks
  -u              dump core after parsing program
  -U              allow unsafe operations
  -v              print version, subversion (includes VERY IMPORTANT perl info)
  -V[:variable]   print configuration summary (or a single Config.pm variable)
  -w              enable many useful warnings (RECOMMENDED)
  -W              enable all warnings
  -X              disable all warnings
  -x[directory]   strip off text before #!perl line and perhaps cd to directory

For the rest:
$s{$F[7]}+=$F[4]; # @F is an array that's automatically filled for each line when using -a (autosplit)
 Indexes start at 0, so $F[7] (the eighth field) is the username and $F[4] (the fifth field) is the number of bytes used.
 %s is a hash table keyed by username; += adds each line's bytes to that user's running total.
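
To see what -n and -a are doing, here is roughly what perl runs for each line, written out by hand for the sample line from the question. perlrun describes -a as an implicit split(' ') into @F inside the -n loop; the variable $line exists only for this illustration.

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = '1280628767.448 155251 192.168.100.225 TCP_MISS/503 902 GET '
         . 'http://googletb.skype.com/skypelist-en.xml renee '
         . 'DIRECT/googletb.skype.com text/html';

# -a performs this split on every input line; ' ' splits on runs of whitespace.
my @F = split ' ', $line;

# Indexes are zero-based: $F[4] is the fifth field, $F[7] the eighth.
my %s;
$s{ $F[7] } += $F[4];

print "bytes: $F[4]\n";               # bytes: 902
print "user: $F[7]\n";                # user: renee
print "total so far: $s{renee}\n";    # total so far: 902
```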

 END{print map{"$_ $s{$_} ". ($s{$_}>1_000_000_000 ? "OVER\n" : "\n")} sort keys %s}
After going through all the lines of the file (END{...}),
we want to print all the users and their associated bytes used.
sort keys %s provides the sorted list of all users and $s{$_} is the associated total of bytes used.
 1_000_000_000 is just 1000000000 (the underscores only make it readable), and
 $s{$_}>1_000_000_000 ? "OVER\n" : "\n" is equivalent to:
    if( $s{$_} > 1_000_000_000 ) {
       add "OVER\n" to the line
    } else {
       add "\n" to the line
    }
and the map statement is like a compact "foreach loop",
like
    foreach $_ (sort keys %s) {
     print "$_ $s{$_}"....
    }
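
The map form and the foreach form can be checked against each other. A small sketch, assuming two made-up totals ("alice" and her 1.5 GB are hypothetical; in the one-liner %s is filled from the log): both versions should build exactly the same lines.

```perl
use strict;
use warnings;

# Hypothetical per-user totals in bytes.
my %s = (alice => 1_500_000_000, renee => 902);

# Form used in the one-liner: map builds one output line per user.
my @via_map = map {
    "$_ $s{$_} " . ($s{$_} > 1_000_000_000 ? "OVER\n" : "\n")
} sort keys %s;

# The same thing as an explicit loop with if/else.
my @via_loop;
foreach my $user (sort keys %s) {
    my $out = "$user $s{$user} ";
    if ($s{$user} > 1_000_000_000) {
        $out .= "OVER\n";
    } else {
        $out .= "\n";
    }
    push @via_loop, $out;
}

print @via_map;
# alice 1500000000 OVER
# renee 902
print "same output\n" if "@via_map" eq "@via_loop";
```

Note that a user under the limit still gets a trailing space before the newline, because the space is part of the string built before the ?: test.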

I hope that's slightly clearer.

Happy Perling!
   
 
LVL 1

Author Comment

by:QuintusSmit
ID: 33459307
uhuh - of course..yes..now it all makes sense :)
I guess after all that typing you really want your points.

thank you for the help
 
LVL 10

Expert Comment

by:jeromee
ID: 33459342
Sorry, I assumed that you had some knowledge of Perl and all you needed was for me to shed some light on the terseness of the one-liner.

In any case, I hope that I was at least able to demonstrate how powerful Perl can be.

Happy Perling!

 
 
LVL 1

Author Comment

by:QuintusSmit
ID: 33478447
hey - nah, I was joking there - it actually made sense after the explanation... I just hate that you make it look so easy. I have a very basic coding background and only recently started with Perl. I didn't even know about one-liners, so this is a whole new world to me.

Thank you for the help
 
LVL 10

Expert Comment

by:jeromee
ID: 33478827
One-liners can be very powerful and I suggest that you start collecting them like recipes in your own cookbook... then you can reuse them, combine them and adapt them to future uses.

Good luck!