Solved

Pattern Match

Posted on 2003-03-20
Last Modified: 2010-03-05
Hi, I have a log file that I need to parse. I found this script and tweaked it a little, but I am a Perl newbie and am really stuck.

Here is my log format:

user1.domain.local - - [17/Mar/2003:08:21:16 -0500] "GET http://wisapidata.weatherbug.com/WxDataISAPI/WxDataISAPI.cgi?GetCData&Magic=10991&RegNum=3098527&ZipCode=07054&StationID=PRSPP&Units=0&Version=3.5&Fore=1&t=1047907707&lv=0 HTTP/1.1" - - "-" "Mozilla/3.0 (compatible; MSIE 4.0; Win32)"

user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

The Script:

my $data = { };
while (<LOG>) {
        /^([\w\.]+) .*(GET|POST) (.*?) HTTP\//;
        my ($user, $url) = ($1, $3);
        $data->{$user}->{$url}++;
}

foreach my $user (keys %$data){
        print "User: $user\n\n";
        my $uref = $data->{$user};
        foreach my $url (keys %$uref) {
                print " $url (".$uref->{$url}." hits)\n";
                print "\n"; }
}

Sample output:

User: user1.domain.local

 http://us.greet1.yimg.com/img.greetings.yahoo.com/g/img/rubber/trs_pat_ya02.gif (1 hits)
 
 http://us.i1.yimg.com/us.yimg.com/i/i16/mov_popc.gif (1 hits)
 
 http://www.ibc-uk.com/img/ILM/website.gif (1 hits)


How do I fix this to drop everything after the domain name in the report?

Thank you for your help.
Question by:beerbar
6 Comments
 
LVL 8

Expert Comment

by:bebonham
ID: 8178648
Hi, can you try this?
We'll keep all the data but print only what you need.

my $data = { };
while (<LOG>) {
       /^([\w\.]+) .*(GET|POST) (.*?) HTTP\//;
       my ($user, $url) = ($1, $3);
       $data->{$user}->{$url}++;
}

foreach my $user (keys %$data){
       print "User: $user\n\n";
       my $uref = $data->{$user};
       foreach my $url (keys %$uref) {
               print substr($url,0,index($url,"/",9)) . "(".$uref->{$url}." hits)\n";
               print "\n"; }
}
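
A note on the substr/index approach above: index returns -1 when there is no slash after offset 9 (for example, a URL with no path), and substr with a length of -1 then drops the last character of the URL. Hits are also still counted per full URL, so the same site can print more than once. Below is a minimal sketch of a guarded trim; the helper name site_of is an assumption for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: trims a URL down to "scheme://host".
# The URL is returned unchanged when it has no "//" or no slash after the host.
sub site_of {
    my ($url) = @_;
    my $scheme_end = index($url, '//');
    return $url if $scheme_end < 0;
    # Look for the first slash after the "//" rather than at a fixed offset.
    my $slash = index($url, '/', $scheme_end + 2);
    return $slash < 0 ? $url : substr($url, 0, $slash);
}

print site_of('http://news.yahoo.com/news?tmpl=index2&cid=757'), "\n";
print site_of('http://news.yahoo.com'), "\n";
```

Both calls should print http://news.yahoo.com. Using site_of($url) as the hash key inside the while loop, instead of trimming at print time, would also merge the hit counts per site.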
 
LVL 84

Accepted Solution

by:
ozo earned 200 total points
ID: 8178988
while (<LOG>) {
       next unless /^([\w\.]+) .*(GET|POST) ([^\/]*\/\/[^\/]*).* HTTP\//;
       my ($user, $url) = ($1, $3);
       $data->{$user}->{$url}++;
}
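
For anyone reading along, here is a self-contained sketch of the pattern above in action. The third group, [^\/]*\/\/[^\/]*, captures the scheme, the //, and everything up to the next slash, so the path and query string fall outside the capture; "next unless" also skips lines that do not match instead of reusing stale $1 and $3. The sub names parse_line and count_hits are assumptions for illustration, and the input is the second sample line from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (user, "scheme://host") for one log line,
# or an empty list when the line does not look like a GET/POST request.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{^([\w.]+) .*(?:GET|POST) ([^/]*//[^/]*).* HTTP/};
    return ($1, $2);
}

# Hypothetical helper: tallies hits per user and site, same shape as $data.
sub count_hits {
    my (@lines) = @_;
    my %data;
    for my $line (@lines) {
        my ($user, $site) = parse_line($line);
        next unless defined $user;    # skip lines that did not match
        $data{$user}{$site}++;
    }
    return \%data;
}

# The sample line from the question, passed twice to show the counter adding up.
my $line = 'user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"';
my $data = count_hits($line, $line);

for my $user (sort keys %$data) {
    print "User: $user\n\n";
    for my $site (sort keys %{ $data->{$user} }) {
        print " $site ($data->{$user}{$site} hits)\n";
    }
}
```

This should report a single site, http://news.yahoo.com, with 2 hits for user2.domain.local.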
 
LVL 1

Author Comment

by:beerbar
ID: 8181268
Worked like a charm, thanks! Is there a way to drop all domains that are not ours? I see that our log file contains inbound as well as outbound HTTP requests; we just need the outbound, or domain.local, stuff in the report.

Thank You again...
 
LVL 84

Expert Comment

by:ozo
ID: 8184386
next unless /^([\w\.]+) .*(GET|POST) ([^\/]*\/\/[^\/]*yimg\.com).* HTTP\//;
 
LVL 1

Author Comment

by:beerbar
ID: 8184995
Sorry about the confusion. I posted more info before, but it must have gone to the bit bucket.

What I meant to say was users from our local domain.

Sample output:

User: user1.domain.local
    web site
    web site

In the output I see outside users as well as inside users. The actual URLs listed are perfect. Below is user2 making an outbound HTTP request to yahoo.com, but the server logs inbound requests as well, so the data also includes user names such as googlebot.google.com, because their bot came in to our web server. I would like to show only *.domain.local hosts or 192.168.1.* IP addresses as users, if possible.

user2.domain.local - - [17/Mar/2003:08:21:18 -0500] "GET http://news.yahoo.com/news?tmpl=index2&cid=757 HTTP/1.1" - - "http://news.yahoo.com/" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT

Thanks again and again. Sorry about not being clear!
 
LVL 84

Expert Comment

by:ozo
ID: 8185390
next unless /^([\w.]+domain\.local) .*(GET|POST) ([^\/]*\/\/[^\/]*).* HTTP\//;
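
The pattern above keeps only client names ending in domain.local. It does not cover clients logged as 192.168.1.* addresses, which the question also mentions, and because [\w.]+ can end in any word character it would also accept a name such as otherdomain.local. Here is a hedged sketch of one way to test the client field for both cases; is_local_user is a hypothetical helper name, and the domain and subnet are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the client field is a *.domain.local host
# or an address in the range 192.168.1.0 to 192.168.1.255.
sub is_local_user {
    my ($user) = @_;
    # Require a literal dot before "domain.local" so look-alike names fail.
    return 1 if $user =~ /\.domain\.local\z/i;
    # Accept the 192.168.1.* subnet, rejecting last octets above 255.
    return 1 if $user =~ /^192\.168\.1\.(\d{1,3})\z/ && $1 <= 255;
    return 0;
}

for my $user ('user2.domain.local', 'googlebot.google.com', '192.168.1.17') {
    printf "%-22s %s\n", $user, is_local_user($user) ? 'keep' : 'skip';
}
```

Inside the while (<LOG>) loop this could be used as "next unless is_local_user($user);" right after $user is captured, leaving the URL part of the pattern unchanged.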

Question has a verified solution.
