beerbar


Pattern Match

Hi, I have a log file that I need to parse. I found this script and tweaked it a little, but I am a Perl newbie and am really stuck.

Here is my log format:

user1.domain.local - - [17/Mar/2003:08:21:16 -0500] "GET http://wisapidata.weatherbug.com/WxDataISAPI/WxDataISAPI.cgi?GetCData&Magic=10991&RegNum=3098527&ZipCode=07054&StationID=PRSPP&Units=0&Version=3.5&Fore=1&t=1047907707&lv=0 HTTP/1.1" - - "-" "Mozilla/3.0 (compatible; MSIE 4.0; Win32)"

user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

The Script:

my $data = { };
while (<LOG>) {
        /^([\w\.]+) .*(GET|POST) (.*?) HTTP\//;
        my ($user, $url) = ($1, $3);
        $data->{$user}->{$url}++;
}

foreach my $user (keys %$data){
        print "User: $user\n\n";
        my $uref = $data->{$user};
        foreach my $url (keys %$uref) {
                print " $url (".$uref->{$url}." hits)\n";
                print "\n"; }
}

Yield Sample:

User: user1.domain.local

 http://us.greet1.yimg.com/img.greetings.yahoo.com/g/img/rubber/trs_pat_ya02.gif (1 hits)
 
 http://us.i1.yimg.com/us.yimg.com/i/i16/mov_popc.gif (1 hits)
 
 http://www.ibc-uk.com/img/ILM/website.gif (1 hits)


How do I fix this to drop everything after the domain name in the report?

Thank you for your help.
bebonham

Hi, can you try this?
It keeps all the data but prints only the part you need.

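my $data = { };
while (<LOG>) {
       # Skip lines that do not match, so stale $1/$3 values are never counted
       next unless /^([\w\.]+) .*(GET|POST) (.*?) HTTP\//;
       my ($user, $url) = ($1, $3);
       $data->{$user}->{$url}++;
}

foreach my $user (keys %$data){
       print "User: $user\n\n";
       my $uref = $data->{$user};
       foreach my $url (keys %$uref) {
               # Find the first "/" after the "//" of the scheme; -1 means the URL has no path
               my $end = index($url, "/", index($url, "//") + 2);
               my $site = $end < 0 ? $url : substr($url, 0, $end);
               print " $site (".$uref->{$url}." hits)\n";
               print "\n"; }
}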
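
Because the loop above still keys the counts by full URL, the same site can be printed once per distinct URL. A minimal sketch of an alternative, assuming it is acceptable to count hits per site rather than per URL, is to capture only the scheme and host in the regex itself. The `site_hits` helper name and the in-memory `@sample` array (holding the two log lines from the question) are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: takes log lines, returns { user => { site => hits } }.
sub site_hits {
    my @lines = @_;
    my $data = {};
    for (@lines) {
        # $1 = client host, $3 = "scheme://host" only (capture stops at the next "/")
        next unless m{^([\w.]+) .*(GET|POST) ([^/]*//[^/\s]*)\S* HTTP/};
        my ($user, $site) = ($1, $3);
        $data->{$user}{$site}++;
    }
    return $data;
}

# The two sample lines from the question, used here in place of the LOG filehandle.
my @sample = (
    q{user1.domain.local - - [17/Mar/2003:08:21:16 -0500] "GET http://wisapidata.weatherbug.com/WxDataISAPI/WxDataISAPI.cgi?GetCData&Magic=10991&RegNum=3098527&ZipCode=07054&StationID=PRSPP&Units=0&Version=3.5&Fore=1&t=1047907707&lv=0 HTTP/1.1" - - "-" "Mozilla/3.0 (compatible; MSIE 4.0; Win32)"},
    q{user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"},
);

my $hits = site_hits(@sample);
for my $user (sort keys %$hits) {
    print "User: $user\n\n";
    for my $site (sort keys %{ $hits->{$user} }) {
        print " $site ($hits->{$user}{$site} hits)\n";
    }
    print "\n";
}
```

Keying the hash by site means repeat visits to different pages on one host add up to a single line in the report, which may or may not be what you want.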
ASKER CERTIFIED SOLUTION
ozo
beerbar

ASKER

Worked like a charm, thanks! Is there a way to drop all domains that are not ours? Our log file contains inbound as well as outbound HTTP requests; we only need the outbound (domain.local) entries in the report.

Thank You again...
next unless /^([\w\.]+) .*(GET|POST) ([^\/]*\/\/[^\/]*yimg\.com).* HTTP\//;
beerbar

ASKER

Sorry about the confusion. I posted more info before, but it must have gone to the bit bucket.

What I meant to say was users from our local domain.

Yield Sample:

User: user1.domain.local
    web site
    web site

In the output I see outside users as well as inside users. The actual URLs listed are perfect. Below is user2 making an outbound HTTP request to yahoo.com, but the server logs inbound requests as well, so the data also contains user names such as googlebot.google.com, because their bot came in to our web server. I would like to show only *.domain.local hosts or 192.168.1.* IP addresses as users, if possible.

user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT

Thanks again and again. Sorry about not being clear!
next unless /^([\w.]+domain\.local) .*(GET|POST) ([^\/]*\/\/[^\/]*).* HTTP\//;
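
The pattern above keeps only clients whose name ends in `domain.local`. A hedged sketch of one way to also accept the `192.168.1.*` addresses mentioned in the question, assuming the client is always the first field on the line: `is_local_user` and `local_hits` are hypothetical helper names, and the googlebot log line is invented purely for illustration (the thread names that host but shows no such line).

```perl
use strict;
use warnings;

# Hypothetical helper: true for *.domain.local hosts or 192.168.1.* addresses.
sub is_local_user {
    my ($user) = @_;
    return 1 if $user =~ /\.domain\.local\z/i;
    return 1 if $user =~ /\A192\.168\.1\.\d{1,3}\z/;
    return 0;
}

# Hypothetical helper: counts hits per site, but only for local users.
sub local_hits {
    my @lines = @_;
    my $data = {};
    for my $line (@lines) {
        # $1 = client field, $3 = "scheme://host" of the requested URL
        next unless $line =~ m{^(\S+) .*(GET|POST) ([^/]*//[^/\s]*)\S* HTTP/};
        my ($user, $site) = ($1, $3);
        next unless is_local_user($user);
        $data->{$user}{$site}++;
    }
    return $data;
}

my @log = (
    # Sample line from the question
    q{user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"},
    # Invented inbound line, for illustration only
    q{googlebot.google.com - - [17/Mar/2003:08:21:20 -0500] "GET http://www.domain.local/index.html HTTP/1.0" - - "-" "Googlebot/2.1"},
);

my $data = local_hits(@log);
for my $user (sort keys %$data) {
    print "User: $user\n";
    print "    $_ ($data->{$user}{$_} hits)\n" for sort keys %{ $data->{$user} };
}
```

Keeping the user test in its own function, rather than folding it into the main regex, makes it easier to add further internal ranges later without touching the URL pattern.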