Solved

Log File Parsing with Script

Posted on 2014-01-26
12
584 Views
Last Modified: 2014-01-26
Hello experts,

I am trying to figure how to parse this log file to a point I can import into a database table. Export to CSV would be helpful.

I can use Powershell, VBScript, or MS-DOS Batch.

Example Data:


111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "PROPFIND /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 207 255
111.111.111.11 - - [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 401 401
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 143
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "OPTIONS /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 96
111.111.111.11 - user.name [11/Dec/2013:07:24:09 -0500] "REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk HTTP/1.1" 200 115

I need to be able to remove the " - - ", " - ", and the last sets of numbers at the end.

In essence I just need the IPs, usernames, date, and paths (all info between the double quotes).
0
Comment
Question by:jb428j
  • 7
  • 5
12 Comments
 
LVL 28

Expert Comment

by:becraig
Comment Utility
This might be high overhead but does what you want:
(gc file.txt) |%{$_.split('"')[1]} | out-file newfile.txt -append
0
 
LVL 1

Author Comment

by:jb428j
Comment Utility
Yes, that does get the path information. Sorry if my question wasn't worded correctly. I also need the IP, username, the path which is the information between the double quotes. And need to remove the HTTP/1.1 within the double quotes. Finally, drop any line with " - - ".

Which is why a CSV export will help because of import to a database.

Overhead really isn't an issue honestly. It is a 4+ GB file and better than hand jamming. lol
0
 
LVL 28

Expert Comment

by:becraig
Comment Utility
Ok got you one minute.
0
 
LVL 28

Expert Comment

by:becraig
Comment Utility
(gc file.txt) -replace 'HTTP/1.1', '' |%{$_.split('"')[0,1] -join " "} | out-file newfile.txt -append


We can then clean it up

(gc newfile.txt | select-string -notmatch "--")  | out-file cleanfile.txt -append
0
 
LVL 1

Author Comment

by:jb428j
Comment Utility
Working great so far. How can I go about keeping the IPs, usernames, and date at the beginning of the file.

Date being in a better format would be helpful, removing both [ ] brackets and the -500. Drop the first : with a space.

11/Dec/2013 07:24:09
0
 
LVL 28

Accepted Solution

by:
becraig earned 500 total points
Comment Utility
 (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1]
 -join " "} | out-file newfile.txt -append

Open in new window


That replaces all the characters you need gone, you can change what is used for the join (if you want to), currently I have a space but you can replace with a comma or tab if that helps
0
What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

 
LVL 28

Expert Comment

by:becraig
Comment Utility
If you're gonna use this, delete the old output file created earlier so you do not have a dirty file, or rename the output file.
0
 
LVL 1

Author Comment

by:jb428j
Comment Utility
It is getting there. Now there is a tough change that I need to do with the date.

111.111.111.11  user.name 11/Dec/2013:07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
But I need:

111.111.111.11  user.name 11/Dec/2013 07:24:09  REPORT /repo/Goodrich-SISVT/767-2C_FQIS/S01590-0101%20767-2C%20BLSP/data/trunk
I can just had another column for time and the activity "Report" in the database to deal with column separation.

Maybe some sort of index, replace first : with space?
0
 
LVL 28

Assisted Solution

by:becraig
becraig earned 500 total points
Comment Utility
Simply run this on the new file created:
(gc newfile.txt) | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt

Open in new window

0
 
LVL 1

Author Comment

by:jb428j
Comment Utility
Works great!
0
 
LVL 1

Author Closing Comment

by:jb428j
Comment Utility
Instead of dumping to a file and modding the newfile1, I did the following:

$Var = (gc file.txt | select-string -notmatch "- -") -replace ' HTTP/1.1', '' -replace '\[', '' -replace '\]', '' -replace '\[', '' -replace '-0500', ''| %{$_.split('"')[0,1] -join " "} 
$Var | % {$_.split(':',2)[0,1] -join " "} | out-file newfile.txt 

Open in new window

0
 
LVL 28

Expert Comment

by:becraig
Comment Utility
Great I only added the  second run since I figured you might have been running the first already.   Good stuff
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Active Directory replication delay is the cause to many problems.  Here is a super easy script to force Active Directory replication to all sites with by using an elevated PowerShell command prompt, and a tool to verify your changes.
A procedure for exporting installed hotfix details of remote computers using powershell
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now