jb428j
asked on
Log File Parsing with Script
Hello experts,
I am trying to figure how to parse this log file to a point I can import into a database table. Export to CSV would be helpful.
I can use Powershell, VBScript, or MS-DOS Batch.
Example Data:
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "PROPFIND /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 207 255
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "REPORT /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk HTTP/1.1" 200 115
I need to be able to remove the " - - ", " - ", and the last sets of numbers at the end.
In essence I just need the IPs, usernames, date, and paths (all info between the double quotes).
I am trying to figure how to parse this log file to a point I can import into a database table. Export to CSV would be helpful.
I can use Powershell, VBScript, or MS-DOS Batch.
Example Data:
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "PROPFIND /repo/Goodrich-SISVT/767-2
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "REPORT /repo/Goodrich-SISVT/767-2
I need to be able to remove the " - - ", " - ", and the last sets of numbers at the end.
In essence I just need the IPs, usernames, date, and paths (all info between the double quotes).
ASKER
Yes, that does get the path information. Sorry if my question wasn't worded correctly. I also need the IP, username, the path which is the information between the double quotes. And need to remove the HTTP/1.1 within the double quotes. Finally, drop any line with " - - ".
Which is why a CSV export will help because of import to a database.
Overhead really isn't an issue honestly. It is a 4+ GB file and better than hand jamming. lol
Which is why a CSV export will help because of import to a database.
Overhead really isn't an issue honestly. It is a 4+ GB file and better than hand jamming. lol
Ok got you one minute.
(gc file.txt) -replace 'HTTP/1.1', '' |%{$_.split('"')[0,1] -join " "} | out-file newfile.txt -append
We can then clean it up
(gc newfile.txt | select-string -notmatch "--") | out-file cleanfile.txt -append
We can then clean it up
(gc newfile.txt | select-string -notmatch "--") | out-file cleanfile.txt -append
ASKER
Working great so far. How can I go about keeping the IPs, usernames, and date at the beginning of the file.
Date being in a better format would be helpful, removing both [ ] brackets and the -500. Drop the first : with a space.
11/Dec/2013 07:24:09
Date being in a better format would be helpful, removing both [ ] brackets and the -500. Drop the first : with a space.
11/Dec/2013 07:24:09
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If you're gonna use this, delete the old output file created earlier so you do not have a dirty file, or rename the output file.
ASKER
It is getting there. Now there is a tough change that I need to do with the date.
111.111.111.11 user.name 11/Dec/2013:07:24:09 REPORT /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk
But I need:
111.111.111.11 user.name 11/Dec/2013 07:24:09 REPORT /repo/Goodrich-SISVT/767-2 C_FQIS/S01 590-0101%2 0767-2C%20 BLSP/data/ trunk
I can just had another column for time and the activity "Report" in the database to deal with column separation.
Maybe some sort of index, replace first : with space?
111.111.111.11 user.name 11/Dec/2013:07:24:09 REPORT /repo/Goodrich-SISVT/767-2
111.111.111.11 user.name 11/Dec/2013 07:24:09 REPORT /repo/Goodrich-SISVT/767-2
Maybe some sort of index, replace first : with space?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Works great!
ASKER
Instead of dumping to a file and modding the newfile1, I did the following:
$Var = (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1.1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1] -join " "}
$Var | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt
Great I only added the second run since I figured you might have been running the first already. Good stuff
(gc file.txt) |%{$_.split('"')[1]} | out-file newfile.txt -append