Solved

Log File Parsing with Script

Posted on 2014-01-26
12
585 Views
Last Modified: 2014-01-26
Hello experts,

I am trying to figure how to parse this log file to a point I can import into a database table. Export to CSV would be helpful.

I can use Powershell, VBScript, or MS-DOS Batch.

Example Data:


111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "PROPFIND /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 207 255
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 115

I need to be able to remove the " - - ", " - ", and the last sets of numbers at the end.

In essence I just need the IPs, usernames, date, and paths (all info between the double quotes).
0
Comment
Question by:jb428j
  • 7
  • 5
12 Comments
 
LVL 29

Expert Comment

by:becraig
ID: 39810973
This might be high overhead but does what you want:
(gc file.txt) |%{$_.split('"')[1]} | out-file newfile.txt -append
0
 
LVL 1

Author Comment

by:jb428j
ID: 39810987
Yes, that does get the path information. Sorry if my question wasn't worded correctly. I also need the IP, username, the path which is the information between the double quotes. And need to remove the HTTP/1.1 within the double quotes. Finally, drop any line with " - - ".

Which is why a CSV export will help because of import to a database.

Overhead really isn't an issue honestly. It is a 4+ GB file and better than hand jamming. lol
0
 
LVL 29

Expert Comment

by:becraig
ID: 39810992
Ok got you one minute.
0
 
LVL 29

Expert Comment

by:becraig
ID: 39811024
(gc file.txt) -replace 'HTTP/1.1', '' |%{$_.split('"')[0,1] -join " "} | out-file newfile.txt -append


We can then clean it up

(gc newfile.txt | select-string -notmatch "--")  | out-file cleanfile.txt -append
0
 
LVL 1

Author Comment

by:jb428j
ID: 39811033
Working great so far. How can I go about keeping the IPs, usernames, and date at the beginning of the file.

Date being in a better format would be helpful, removing both [ ] brackets and the -500. Drop the first : with a space.

11/Dec/2013 07:24:09
0
 
LVL 29

Accepted Solution

by:
becraig earned 500 total points
ID: 39811046
 (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1]
 -join " "} | out-file newfile.txt -append

Open in new window


That replaces all the characters you need gone, you can change what is used for the join (if you want to), currently I have a space but you can replace with a comma or tab if that helps
0
Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

 
LVL 29

Expert Comment

by:becraig
ID: 39811048
If you're gonna use this, delete the old output file created earlier so you do not have a dirty file, or rename the output file.
0
 
LVL 1

Author Comment

by:jb428j
ID: 39811062
It is getting there. Now there is a tough change that I need to do with the date.

111.111.111.11  user.name 11/Dec/2013:07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
But I need:

111.111.111.11  user.name 11/Dec/2013 07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
I can just had another column for time and the activity "Report" in the database to deal with column separation.

Maybe some sort of index, replace first : with space?
0
 
LVL 29

Assisted Solution

by:becraig
becraig earned 500 total points
ID: 39811068
Simply run this on the new file created:
(gc newfile.txt) | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt

Open in new window

0
 
LVL 1

Author Comment

by:jb428j
ID: 39811095
Works great!
0
 
LVL 1

Author Closing Comment

by:jb428j
ID: 39811099
Instead of dumping to a file and modding the newfile1, I did the following:

$Var = (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1.1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1] -join " "} 
$Var | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt 

Open in new window

0
 
LVL 29

Expert Comment

by:becraig
ID: 39811109
Great I only added the  second run since I figured you might have been running the first already.   Good stuff
0

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This script checks a path to see if a folder exists. If the folder does exist you will get output "The folder has previously been created. No action taken" If not it will create the folder. Then adds one user modify permission to the folder. It …
Create and license users in Office 365 in bulk based on a CSV file. A step-by-step guide with PowerShell script examples.
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…
This is a video that shows how the OnPage alerts system integrates into ConnectWise, how a trigger is set, how a page is sent via the trigger, and how the SENT, DELIVERED, READ & REPLIED receipts get entered into the internal tab of the ConnectWise …

948 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now