beerbar
asked on
Pattern Match
Hi, I have a log file that I need to parse. I found this script and tweaked it a little, but I am a Perl newbie and am really stuck.
Here is my log format:
user1.domain.local - - [17/Mar/2003:08:21:16 -0500] "GET http://wisapidata.weatherbug.com/WxDataISAPI/WxDataISAPI.cgi?GetCData&Magic=10991&RegNum=3098527&ZipCode=07054&StationID=PRSPP&Units=0&Version=3.5&Fore=1&t=1047907707&lv=0 HTTP/1.1" - - "-" "Mozilla/3.0 (compatible; MSIE 4.0; Win32)"
user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
The Script:
my $data = { };
while (<LOG>) {
    /^([\w\.]+) .*(GET|POST) (.*?) HTTP\//;
    my ($user, $url) = ($1, $3);
    $data->{$user}->{$url}++;
}
foreach my $user (keys %$data) {
    print "User: $user\n\n";
    my $uref = $data->{$user};
    foreach my $url (keys %$uref) {
        print " $url (".$uref->{$url}." hits)\n";
        print "\n";
    }
}
Yield Sample:
User: user1.domain.local
http://us.greet1.yimg.com/img.greetings.yahoo.com/g/img/rubber/trs_pat_ya02.gif (1 hits)
http://us.i1.yimg.com/us.yimg.com/i/i16/mov_popc.gif (1 hits)
http://www.ibc-uk.com/img/ILM/website.gif (1 hits)
How do I fix this to drop everything after the domain name in the report?
Thank you for your help.
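One possible approach, as a minimal sketch rather than a definitive fix: reduce each URL to its scheme and host before using it as a hash key, so that all hits on the same site are counted together. The helper name `site_of` is hypothetical, and the two sample lines from the question are held in an array here instead of being read from the `LOG` filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only "scheme://host" from a URL.
# Returns the URL unchanged if it does not start with scheme://host.
sub site_of {
    my ($url) = @_;
    return $url =~ m{^(\w+://[^/]+)} ? $1 : $url;
}

# The two sample lines from the question (assumption: real input comes from LOG).
my @lines = (
    'user1.domain.local - - [17/Mar/2003:08:21:16 -0500] "GET http://wisapidata.weatherbug.com/WxDataISAPI/WxDataISAPI.cgi?GetCData&Magic=10991&RegNum=3098527&ZipCode=07054&StationID=PRSPP&Units=0&Version=3.5&Fore=1&t=1047907707&lv=0 HTTP/1.1" - - "-" "Mozilla/3.0 (compatible; MSIE 4.0; Win32)"',
    'user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"',
);

my %hits;    # $hits{$user}{$site} = number of requests
for my $line (@lines) {
    # Skip lines that do not match, so captures from an earlier line are never reused.
    next unless $line =~ m{^([\w.]+) .*(?:GET|POST) (.*?) HTTP/};
    my ($user, $url) = ($1, $2);
    $hits{$user}{ site_of($url) }++;
}

for my $user (sort keys %hits) {
    print "User: $user\n\n";
    for my $site (sort keys %{ $hits{$user} }) {
        print " $site ($hits{$user}{$site} hits)\n";
    }
    print "\n";
}
```

Trimming before counting means the hit totals are per site rather than per full URL, which seems to be what the report is after.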
ASKER CERTIFIED SOLUTION
ASKER
Worked like a charm, thanks! Is there a way to drop all domains that are not ours? I see that our log file contains inbound as well as outbound HTTP requests; we just need the outbound or domain.local stuff in the report.
Thank You again...
next unless /^([\w\.]+) .*(GET|POST) ([^\/]*\/\/[^\/]*yimg\.com).* HTTP\//;
ASKER
Sorry about the confusion. I posted more info before, but it must have gone to the bit bucket.
What I meant to say was users from our local domain.
Yield Sample:
User: user1.domain.local
web site
web site
In the output I see outside users as well as inside users. The actual URLs listed are perfect. Below is user2 making an outbound HTTP request to yahoo.com, but the server logs inbound requests as well, so the data also contains user names such as googlebot.google.com, because their bot came in to our web server. If possible, I would like to show only *.domain.local or IP addresses in 192.168.1.* as users.
user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT
Thanks again and again. Sorry about not being clear!
next unless /^([\w.]+domain\.local) .*(GET|POST) ([^\/]*\/\/[^\/]*).* HTTP\//;
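The line above covers the domain.local names; the 192.168.1.* addresses could be handled with a separate check on the captured user field. A rough sketch, assuming the user field has already been captured by the match; the helper name `is_local_user` and the sample address 192.168.1.25 are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if the client field is one of "our" users,
# i.e. a host under domain.local or an address in 192.168.1.*.
sub is_local_user {
    my ($user) = @_;
    return ( $user =~ /(?:^|\.)domain\.local$/i
          || $user =~ /^192\.168\.1\.\d{1,3}$/ ) ? 1 : 0;
}

# Two names mentioned in the thread plus one illustrative address.
for my $user ('user2.domain.local', 'googlebot.google.com', '192.168.1.25') {
    printf "%-22s %s\n", $user, is_local_user($user) ? 'keep' : 'skip';
}
```

Inside the `while (<LOG>)` loop this would be used as `next unless is_local_user($user);` right after the match, so outside clients never reach the hash.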
Or we could keep all the data and print only what you need:
my $data = { };
while (<LOG>) {
    /^([\w\.]+) .*(GET|POST) (.*?) HTTP\//;
    my ($user, $url) = ($1, $3);
    $data->{$user}->{$url}++;
}
foreach my $user (keys %$data) {
    print "User: $user\n\n";
    my $uref = $data->{$user};
    foreach my $url (keys %$uref) {
        print substr($url,0,index($url,"
        print "\n";
    }
}
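The `substr`/`index` call in the script above is cut off in the thread, so the following is not a reconstruction of it, only a sketch of one way to trim a URL to its host at print time with the same two functions. The helper name `trim_to_host` is hypothetical; the sample URLs are from the yield sample in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a URL at the first "/" that follows "//".
sub trim_to_host {
    my ($url) = @_;
    my $start = index($url, '//');
    return $url if $start < 0;                 # no "//", leave untouched
    my $end = index($url, '/', $start + 2);    # first slash after the host
    return $end < 0 ? $url : substr($url, 0, $end);
}

# Full URLs stay as hash keys; only the printed form is shortened.
my %hits_for = (
    'http://us.i1.yimg.com/us.yimg.com/i/i16/mov_popc.gif' => 1,
    'http://www.ibc-uk.com/img/ILM/website.gif'            => 1,
);
for my $url (sort keys %hits_for) {
    print ' ', trim_to_host($url), " ($hits_for{$url} hits)\n";
}
```

One thing to keep in mind with this variant: because the counts are still stored per full URL, the same host can appear on several lines of the report, each with its own hit count.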