Solved

Log File Parsing with Script

Posted on 2014-01-26
12
589 Views
Last Modified: 2014-01-26
Hello experts,

I am trying to figure how to parse this log file to a point I can import into a database table. Export to CSV would be helpful.

I can use Powershell, VBScript, or MS-DOS Batch.

Example Data:


111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "PROPFIND /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 207 255
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 115

I need to be able to remove the " - - ", " - ", and the last sets of numbers at the end.

In essence I just need the IPs, usernames, date, and paths (all info between the double quotes).
0
Comment
Question by:jb428j
  • 7
  • 5
12 Comments
 
LVL 29

Expert Comment

by:becraig
ID: 39810973
This might be high overhead but does what you want:
(gc file.txt) |%{$_.split('"')[1]} | out-file newfile.txt -append
0
 
LVL 1

Author Comment

by:jb428j
ID: 39810987
Yes, that does get the path information. Sorry if my question wasn't worded correctly. I also need the IP, username, the path which is the information between the double quotes. And need to remove the HTTP/1.1 within the double quotes. Finally, drop any line with " - - ".

Which is why a CSV export will help because of import to a database.

Overhead really isn't an issue honestly. It is a 4+ GB file and better than hand jamming. lol
0
 
LVL 29

Expert Comment

by:becraig
ID: 39810992
Ok got you one minute.
0
Does Powershell have you tied up in knots?

Managing Active Directory does not always have to be complicated.  If you are spending more time trying instead of doing, then it's time to look at something else. For nearly 20 years, AD admins around the world have used one tool for day-to-day AD management: Hyena. Discover why

 
LVL 29

Expert Comment

by:becraig
ID: 39811024
(gc file.txt) -replace 'HTTP/1.1', '' |%{$_.split('"')[0,1] -join " "} | out-file newfile.txt -append


We can then clean it up

(gc newfile.txt | select-string -notmatch "--")  | out-file cleanfile.txt -append
0
 
LVL 1

Author Comment

by:jb428j
ID: 39811033
Working great so far. How can I go about keeping the IPs, usernames, and date at the beginning of the file.

Date being in a better format would be helpful, removing both [ ] brackets and the -500. Drop the first : with a space.

11/Dec/2013 07:24:09
0
 
LVL 29

Accepted Solution

by:
becraig earned 500 total points
ID: 39811046
 (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1]
 -join " "} | out-file newfile.txt -append

Open in new window


That replaces all the characters you need gone, you can change what is used for the join (if you want to), currently I have a space but you can replace with a comma or tab if that helps
0
 
LVL 29

Expert Comment

by:becraig
ID: 39811048
If you're gonna use this, delete the old output file created earlier so you do not have a dirty file, or rename the output file.
0
 
LVL 1

Author Comment

by:jb428j
ID: 39811062
It is getting there. Now there is a tough change that I need to do with the date.

111.111.111.11  user.name 11/Dec/2013:07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
But I need:

111.111.111.11  user.name 11/Dec/2013 07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
I can just had another column for time and the activity "Report" in the database to deal with column separation.

Maybe some sort of index, replace first : with space?
0
 
LVL 29

Assisted Solution

by:becraig
becraig earned 500 total points
ID: 39811068
Simply run this on the new file created:
(gc newfile.txt) | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt

Open in new window

0
 
LVL 1

Author Comment

by:jb428j
ID: 39811095
Works great!
0
 
LVL 1

Author Closing Comment

by:jb428j
ID: 39811099
Instead of dumping to a file and modding the newfile1, I did the following:

$Var = (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1.1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1] -join " "} 
$Var | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt 

Open in new window

0
 
LVL 29

Expert Comment

by:becraig
ID: 39811109
Great I only added the  second run since I figured you might have been running the first already.   Good stuff
0

Featured Post

Webinar: Aligning, Automating, Winning

Join Dan Russo, Senior Manager of Operations Intelligence, for an in-depth discussion on how Dealertrack, leading provider of integrated digital solutions for the automotive industry, transformed their DevOps processes to increase collaboration and move with greater velocity.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Create and license users in Office 365 in bulk based on a CSV file. A step-by-step guide with PowerShell script examples.
Previously, on our Nano Server Deployment series, we've created a new nano server image and deployed it on a physical server in part 2. Now we will go through configuration.
In an interesting question (https://www.experts-exchange.com/questions/29008360/) here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner…
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…

685 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question