Link to home
Start Free TrialLog in
Avatar of jb428j
jb428j

asked on

Log File Parsing with Script

Hello experts,

I am trying to figure how to parse this log file to a point I can import into a database table. Export to CSV would be helpful.

I can use Powershell, VBScript, or MS-DOS Batch.

Example Data:


111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "PROPFIND /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 207 255
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 115

I need to be able to remove the " - - ", " - ", and the last sets of numbers at the end.

In essence I just need the IPs, usernames, date, and paths (all info between the double quotes).
Avatar of becraig
becraig
Flag of United States of America image

This might be high overhead but does what you want:
(gc file.txt) |%{$_.split('"')[1]} | out-file newfile.txt -append
Avatar of jb428j
jb428j

ASKER

Yes, that does get the path information. Sorry if my question wasn't worded correctly. I also need the IP, username, the path which is the information between the double quotes. And need to remove the HTTP/1.1 within the double quotes. Finally, drop any line with " - - ".

Which is why a CSV export will help because of import to a database.

Overhead really isn't an issue honestly. It is a 4+ GB file and better than hand jamming. lol
Ok got you one minute.
(gc file.txt) -replace 'HTTP/1.1', '' |%{$_.split('"')[0,1] -join " "} | out-file newfile.txt -append


We can then clean it up

(gc newfile.txt | select-string -notmatch "--")  | out-file cleanfile.txt -append
Avatar of jb428j

ASKER

Working great so far. How can I go about keeping the IPs, usernames, and date at the beginning of the file.

Date being in a better format would be helpful, removing both [ ] brackets and the -500. Drop the first : with a space.

11/Dec/2013 07:24:09
ASKER CERTIFIED SOLUTION
Avatar of becraig
becraig
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If you're gonna use this, delete the old output file created earlier so you do not have a dirty file, or rename the output file.
Avatar of jb428j

ASKER

It is getting there. Now there is a tough change that I need to do with the date.

111.111.111.11  user.name 11/Dec/2013:07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
But I need:

111.111.111.11  user.name 11/Dec/2013 07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
I can just had another column for time and the activity "Report" in the database to deal with column separation.

Maybe some sort of index, replace first : with space?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of jb428j

ASKER

Works great!
Avatar of jb428j

ASKER

Instead of dumping to a file and modding the newfile1, I did the following:

$Var = (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1.1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1] -join " "} 
$Var | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt 

Open in new window

Great I only added the  second run since I figured you might have been running the first already.   Good stuff