Solved

Syslog text file into database or into .CSV

Posted on 2014-10-26
19
92 Views
Last Modified: 2016-06-07
I have a large Syslogchatchall text file and I want to convert it into database, I tried to open the file using Excel but all the information in a one log becomes one column. Is someone help me how I change that file into column based file. Thanks.
syslog.txt
0
Comment
Question by:Mohamed Mohamud
  • 8
  • 5
  • 3
  • +2
19 Comments
 
LVL 15

Expert Comment

by:Haris Djulic
ID: 40404909
Hello,

you need to use the Excel import option from text and then select fixed widht and there you using the separators set the length of the column and after you finish process you will have the data in multiple columns..

I attached screenshots of the process and result.
Capture1.PNG
Capture.PNG
0
 
LVL 5

Expert Comment

by:alicain
ID: 40404916
Hi,

This looks to be a TAB seperated file, there are two methods that you could use :

1. Open a bland spreadsheet and then import the data.  From the Data ribbon -> From File.... browse to the file ... Delimited -> select "Tab".

2. With the data opened in a spreadsheet an in one column, use the "Text to column" feature in Excel.

Regards,
Alastair.
0
 

Author Comment

by:Mohamed Mohamud
ID: 40405156
Haris Djulic and alicain thanks for your help, but the problem is like  proto=17, policy_id=52 and others are in the same column is there a way I can separate or write into database? Thanks again.
0
 
LVL 15

Expert Comment

by:Haris Djulic
ID: 40405209
Can you post sample file and desired output?
0
 
LVL 5

Expert Comment

by:alicain
ID: 40405248
If you specifyig both TAB and SPACE a delimiters (a tick against each on the import dialog) then Excel also places proto=17 and  policy_id=52 into seperate columns.

Have you come across LogParser?  It might be useful for what you are looking at doing.  There's an overhead in getting up to speed, but if you're familiar with SQL commands, that will help :
   http://technet.microsoft.com/en-gb/scriptcenter/dd919274.aspx


Regards,
Alastair.
0
 

Author Comment

by:Mohamed Mohamud
ID: 40415510
Dear Experts, my problem still not solved, my file is very big and excel cant handle the file it exceeds the limit and I want to import it to SQL DB so I can filter and run some quires on it, but the first 4 columns are separated with tab and  there is a long message witch also some times different depending whether it is notice or warring as the attached sample shows and I can't handle that part.
Is there also other way I can run quires as file remains text file with out importing into DB?
0
 
LVL 15

Expert Comment

by:Haris Djulic
ID: 40415562
Hello,

you can use SQL Server import wizard where you can choose to import from Text file and select that it is tab delimited and you can write query to narrow the number of imported rows to database..
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 40415675
Your comment on the "long message" indicates to me that we probably can't simply separate the fields based solely on the whitespace (tabs and spaces).

Please attach a new sample file which shows how/where you want the field separation.

You've tagged several different languages so I'm assuming that you know each of them.  In my case I'd use Perl to parse the log and insert the data.
0
 

Author Comment

by:Mohamed Mohamud
ID: 40416478
FishMonger I attached the file, but I appreciate SQL if possible.
Sample-file.rtf
sample.txt
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 28

Expert Comment

by:FishMonger
ID: 40416513
Your log file is not in the proper format to only use an sql statement to import it into the database fields you wish.  It will need to be parsed and either converted to csv for the sql import statement or the parsing script can load it without the need of a csv file.

How big is the file in both the number of lines and bytes (MB or GB)?
0
 

Author Comment

by:Mohamed Mohamud
ID: 40416531
It is 500MB, I can't open it but it is more than 100000 lines
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 40416570
Here's a sample perl script that will parse your log and generate a csv file which can be imported into the db via an sql statement.

#!/usr/bin/perl

use strict;
use warnings;

open my $in_fh, '<', 'syslog.txt' or die "failed to open 'syslog.txt' <$!>";
open my $out_fh, '>', 'syslog.csv' or die "failed to open 'syslog.csv' <$!>";

while (my $line = <$in_fh>) {
    chomp $line;
    if ($line =~ /^(\S+) \s+
                   (\S+) \s+
                   (\S+) \s+
                   (\S+) \s+
                   (\w+:\s\S+) \s+
                   (.+?) \s+
                   (.+) \s+
                   (reason.+)$
                 /xi) {
        print {$out_fh} join(',', $1,$2,$3,$4,$5,$6,$7,$8) . "\n";
    }
}
close $in_fh;
close $out_fh;

Open in new window


If some of the lines already contain a comma within any of the fields, then we would need to make a small adjustment to make sure they are escaped.

If you want, adding the db code to the script would be very easy.
0
 

Author Comment

by:Mohamed Mohamud
ID: 40417017
FishMonger;
Almost done, there is only one thing some fields and their content may become one column like these

 fuhfw: NetScreen,  device_id=fuhfw, reason=Creation<000>

fuhfw, device_id, reason are column names

also
SQL is not accepting this as column content:
,[Root]system-notification-00257(traffic): start_time="2014-10-21 00:10:26" duration=0 policy_id=52 service=dns proto=17 src zone=Trust dst zone=Untrust action=Permit sent=0 rcvd=0 src=10.8.114.161 dst=10.1.1.8 src_port=65128 dst_port=53 src-xlated ip=10.8.114.161 port=65128 dst-xlated ip=10.1.1.8 port=53 session_id=234313,
result.rtf
0
 
LVL 28

Accepted Solution

by:
FishMonger earned 500 total points
ID: 40417301
I suspected that the db import would fail due to the "loose" csv formatting.

I've added the use of the Text::CSV_XS module to fix the quoting of the fields and escaping of those quote marks within the field.

I also adjusted the regex to strip out those field names you mentiond.

#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({
                              binary         => 1,
                              eol            => $/,
                              always_quote   => 1,
                              escape_char    => "\\",
                              auto_diag      => 1
                            });

open my $in_fh, '<', 'syslog.txt' or die "failed to open 'syslog.txt' <$!>";
open my $out_fh, '>', 'syslog.csv' or die "failed to open 'syslog.csv' <$!>";

while (my $line = <$in_fh>) {

    if ($line =~ /^(\S+) \s+
                   (\S+) \s+
                   (\S+) \s+
                   (\S+) \s+
                   fuhfw:\s (\S+) \s+
                   device_id= (.+?) \s+
                   (.+) \s+
                   reason[:=] \s* (.+)
                 $/xi) {
        $csv->print( $out_fh, [ $1,$2,$3,$4,$5,$6,$7,$8 ] );
    }
}

close $in_fh;
close $out_fh;

Open in new window


If you need to post more samples, please use plain text files instead of the rtf file.  The rtf file makes it less clear on what the rows look like.
0
 

Author Comment

by:Mohamed Mohamud
ID: 40419251
Dear FishMonger;
One last thing, is it possible to create different column within different lines I as showed in the sample, I tried but it escapes the line that has less columns. Because I didn't get the information I need I want to split one line into more columns.
#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV_XS;



open my $in_fh, '<', 'c:\tijabo\am.txt' or die "failed to open 'c:\tijabo\am.txt' <$!>";
open my $out_fh, '>', 'c:\tijabo\am.csv.' or die "failed to open 'c:\tijabo\am.csv' <$!>";

while (my $line = <$in_fh>) {

    if ($line =~ /^(\S+) \s+
                   (\S+) \s+
                   (\S+) \s+
                   (\S+) \s+
                   fuhfw:\s (\S+) \s+
                   device_id= (.+?) \s+
                   duration= (.+?) \s+
                   policy_id= (.+?) \s+
                   service= (.+?) \s+
                   proto= (.+?) \s+                 
                   action= (.+?) \s+
                   sent= (.+?) \s+
                   rcvd= (.+?) \s+
                   src= (.+?) \s+
                   dst= (.+?) \s+
                   src_port= (.+?) \s+
                   dst_port= (.+?) \s+
                  
                   port= (.+?) \s+
                 
                   port= (.+?) \s+
                   session_id= (.+?) \s+
                  
                   reason[:=] \s* (.+)
                 $/xi) {
       print {$out_fh} join(',', $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) . "\n";
   }
 
}

close $in_fh;
close $out_fh;

Open in new window

also I got a problem with these  {src zone=, dst zone=} due to space
sample.txt
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 40419692
Creating a csv file with varying numbers of fields is a very very bad idea but you can do it with multiple regexs.  Be aware that you won't be able to do a direct import of that csv file into the db.

also I got a problem with these  {src zone=, dst zone=} due to space
What do you mean by that?
0
 

Author Comment

by:Mohamed Mohamud
ID: 40419770
I am aware of that thanks, could you please give me an example how to put  multiple regexs ?
I meant  this {src zone=, dst zone=} I couldn't put like this     action= (.+?) \s+     in the code to truncate them.
0
 

Author Comment

by:Mohamed Mohamud
ID: 40424412
In the attached sample there is three line which contains different fields I run this code
use strict;
use warnings;

open my $in_fh, '<', 'c:\tijabo\test.txt' 
or die "failed to open 'c:\tijabo\test.txt' <$!>";
open my $out_fh, '>', 'c:\tijabo\test.csv.' 
or die "failed to open 'c:\tijabo\test.csv' <$!>";

while (my $line = <$in_fh>) {
    if ($line =~ /^(\S+) \s+
                   (\S+) \s+                                                                       
                   service= (.+?) \s+
                   proto= (.+?) \s+     
                   port= (.+?) \s+                
                   session_id= (.+?) \s+
                   reason[:=] \s* (.+)
                 $/xi) {
       print {$out_fh} join(',', $1,$2,$3,$4,$5,$6,$7) . "\n"; 
       } 
      }

Open in new window

but it truncates the shorter line and displays this:

         21-10-2014,00:00:00,dns,17,65128,234313,Creation<000>
         21-10-2014,00:00:00,http,6,4334,249770,Close - TCP FIN<000>

But I want to display this:

       21-10-2014,00:00:00,dns,17,65128,234313,Creation<000>
       21-10-2014,00:00:00,REASON: BY_OTHER, , , ,
       21-10-2014,00:00:00,http,6,4334,249770,Close - TCP FIN<000>

Could you please help?
test.txt
0
 
LVL 9

Expert Comment

by:Suhas .
ID: 41640952
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Accept: FishMonger (http:#a40417301)

If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

suhasbharadwaj
Experts-Exchange Cleanup Volunteer
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

Creating an analog clock UserControl seems fairly straight forward.  It is, after all, essentially just a circle with several lines in it!  Two common approaches for rendering an analog clock typically involve either manually calculating points with…
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now