Solved

Help with SQL Statement

Posted on 2009-04-06
16
456 Views
Last Modified: 2012-05-06
Hello, I need help with a particular SQL statement.  I'm trying to search through my log files using Logparser and I'm not too sure about the syntax for what I need.  I need to parse through my log file, check each user and their IP address, and if there is a variation in a user's IP, note that in my output.

This is what I have so far for my SQL statement.  Forget the fact that I'm using Logparser; I don't want to complicate the question any further.  Just concentrate on the SQL statement.

"SELECT * FROM \\mydc\logs\myfile.csv WHERE EVTID = '540' AND DOMAIN = 'TESTDC' AND LOGONTYPE = '3'"

   Any help with this statement would be greatly appreciated.  Thanks.


0
Comment
Question by:itsmevic
16 Comments
 
LVL 28

Expert Comment

by:TextReport
ID: 24085893
I suspect, from reading around, that you can't just select from the text file; you need to open the file and read each line. Check out the code in http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/ASP/Q_21385446.html
Cheers, Andrew
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24087629
Well, in MS SQL you cannot do that straight away; you need some method that SQL can use to actually read that file...

The single biggest difficulty with CSV files is the tool used to create them. Not all tools do the same job, and in fact, the same tool might not even do the same thing each time, or even row by row...

In terms of predictable outputs, possibly one of the biggest offenders is Excel... but, it is not alone...

For example, if a string does not contain any special characters like a quote or a comma, then it will be a comma delimited field in the output, no dramas. That same column, on the very next row, could in fact look quite different to an import facility such as bulk insert.

For example, a formatted number column like currency may have $1.32 on one line and $1,320.00 on another. Exported as CSV, it will most likely turn out to be $1.32 (no problem, that can be inserted as money) and "$1,320.00", which is a problem for two reasons: first those quotes, and second the embedded comma (hence the quotes) - clearly not a money field, and worse, it will be treated as two separate columns, the first being "$1 and the second 320.00" (assuming we take the comma as the delimiter). And why isn't it exported as a plain number anyway?

If every row had every column formatted the same way, then we could readily take care of it, but when individual rows vary from other rows, it becomes a nightmare. You can use a linked server or data provider to access the file, which is a lot more tolerant of column formatting, but then you MUST have column names as the very first row if you plan to use that method; in some cases the actual column headings might be missing, or on row two, and in those cases the data provider / linked server does not work for you.

So, the difficulty is not only recognising the encapsulated field (ie the Quotes), but also the reason why it became encapsulated (ie the comma), and the alignment of columns as a result. You cannot treat one without knowing about the other.

In an ideal world, all string variants would be encapsulated - then we can simply remove the first and last character, however the ideal world and the real world are in fact worlds apart.

For example, a street address like Level 14, 15 Main Street would export as "Level 14, 15 Main Street" - no problem, remove the first and last quote. But an address like Level 14, "Castle Towers" would export as "Level 14, ""Castle Towers""" - leaving us with two sets of doubled quotes; now there really are erroneous quotes to deal with.

In order for SQL to do anything with an import file, it needs to be able to read the data first, and here's the catch: it cannot read the data unless it knows what to do with the columns, and in some situations must also know the column names. For that reason, it is normally a very good idea to use a staging table. Make each column quite "open" (ie varchar) and perform validation / manipulation on the staging table before attacking the database...

This is where we begin to diversify the possible solutions... and the possible tools to use...

1) if origins are excel, then leave as excel and open it directly - avoid the whole csv problem
2) if csv and every column is presented the same way, with quote encapsulation and column heading as the first line then can open via linked server
3) if csv and every column is presented the same way, with quote encapsulation, then use a format file
4) if possible, change the source data so it will comply - though this is next to impossible if mere mortals (ie users) are in charge of building the data.
5) manually unpack the record using some form of "unstringing"

Now some of these solutions may well depend on the size of data - mainly in terms of the number of columns more so than rows...

1) bulk insert - is just that, and assumes a lot about incoming data - not least of which is that it already "complies". Unless a field contravenes the type cast it will be imported, and it does not care if there are more columns than expected (such as from an embedded quote); it will simply import, delimiting by the field terminator. Good for very well structured and consistently formatted data - typically machine generated.
2) openrowset using a format file - good for column by column and row by row operations; there is still a small problem with delimiters, but the format file can help as long as there is consistency throughout each column, row by row
3) linked server and opendatasource - linking directly to the data source via a data provider. Best way for excel - bypass all those csv type problems...
4) using one of the bulk insert or openrowsets or bcp using a format file to import an entire line as a column, then unpack that column - most flexible, but most effort and performance impacts depending on number of columns needing to be unpacked... The one method most likely to succeed no matter what when end-user generated data is involved.

So, some examples... (sql2005)

For the purposes of demonstrating the last two methods above, there are a few attachments to save in the c:\ root drive (or wherever - just change the path accordingly)... Also be aware that attached files often have the underscore replaced with a hyphen - you need to save them with any corrections - especially the format file - it has to be attached as .txt and needs to be saved as .fmt.

And for the purposes of example, not actually updating any real tables - but, will display the individual columns where / as appropriate - and if we can display them, we can update or insert them...


 

-- first the data provider method - use a linked server directly to the spreadsheet - by far the preferred method... and in Office 2007 you might need the ACE driver rather than JET.
 

EXEC sp_addlinkedserver MyExcel, 'Jet 4.0', 'Microsoft.Jet.OLEDB.4.0', 'c:\order_worksheet.xls', NULL, 'Excel 5.0;'

GO
 

--Set up login mappings (just ADMIN - jet wants something).

EXEC sp_addlinkedsrvlogin MyExcel, FALSE, NULL, Admin, NULL

GO
 

--List the tables in the linked server (these are the worksheet names).

EXEC sp_tables_ex MyExcel

GO
 

--and now look at the "table" - pretty much the same as any table...

select * from myexcel...Order_Worksheet$

GO
 

-- now do whatever you want with the data using the linked server just like any table

-- oh, and by the way, you can update or insert into excel this way as well...
 

-- remove server

sp_dropserver 'MyExcel', 'droplogins';

GO
 
 
 

-- and using providers directly as a "dynamic" linked server - opens Excel directly, then there are a couple of choices...
 

select * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0','Excel 8.0;database=c:\order_worksheet.xls;hdr=yes',' select * from [order_worksheet$]') as a
 

Select * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0;Database=c:\order_worksheet.xlsx;HDR=Yes', 'SELECT * FROM [order_worksheet$]') as a         -- just to show office 2007 syntax
 

SELECT * FROM OPENDATASOURCE( 'Microsoft.Jet.OLEDB.4.0', 'Data Source="c:\order_worksheet.xls"; Extended properties=Excel 8.0')...order_worksheet$
 
 
 
 

-- or, use a linked server to the CSV file itself... But it MUST have column headings, and be a very, very, clean format using this method
 

EXEC sp_addlinkedserver MyCSV, 'Jet 4.0', 'Microsoft.Jet.OLEDB.4.0','c:\', NULL,'Text'

GO

EXEC sp_addlinkedsrvlogin MyCSV, FALSE, NULL,NULL,NULL

GO

EXEC sp_tables_ex MyCSV    -- note how it picks up any / all txt and csv files in that folder

GO

select * from MyCSV...Order_Worksheet#csv    -- four part identifier using table name from above

GO

select * from openquery(MyCSV,'select * from order_worksheet.csv')

GO

sp_dropserver 'MyCSV', 'droplogins';

GO
 
 
 
 

-- or use the 'dynamic' linked server by using the provider for the CSV file with the open rowset - it does a good job in translating a CSV, BUT you must have column headings..., clean format etc...
 

SELECT * FROM OPENROWSET('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\','SELECT * FROM order_worksheet.csv')

GO
 
 
 
 
 
 

-- now if your CSV file is "nice and clean with column headers" the above will work fine. 

-- if not, then you have a few choices - you could use a format file to enforce those column headers

-- you have a choice of format files - either the traditional column definitions, or xml.
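-- purely as a hedged illustration (not tested against your file): an xml format file
-- equivalent to the single-column "whole line" layout shown further down - the column
-- name "line" matches the staging table example below; adjust MAX_LENGTH / terminator to suit
/*
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="8000"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="line" xsi:type="SQLVARYCHAR"/>
  </ROW>
</BCPFORMAT>
*/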
 
 
 
 
 
 

-- or, it really is best to use some kind of temp work area such as a staging table...

-- now try a bulk insert of the entire row into a staging area ready for manual unpacking

-- first kill any pre-existing staging tables

if exists (select null from information_schema.tables where table_name = 'tmp_staging_table') drop table tmp_staging_table

go

-- create afresh

CREATE TABLE tmp_staging_table (line varchar(max))

go
 

-- load it up

BULK insert tmp_staging_table from 'c:\order_worksheet.csv' with (fieldterminator='\r\n', firstrow=2)

go
 

-- have a look

select * from tmp_staging_table

go
 

-- then run through a procedure to unpack it - not included just yet - there is one down below...
 
 
 
 
 

-- or try an openrowset of the entire row without having to go into a staging area first, but ready for manual unpacking into a staging table

-- first create a format file depicting a single line... (the three lines between the asterisks)

/*

8.0

1

1       SQLCHAR       0      8000     "\r\n"  6     line             ""

*/

-- other legitimate field terminators are "," for a comma, ",\"" for a field preceding a quote-encapsulated field - i.e. it swallows the opening quote of the next column - and "\"," for that following field, "\t" for tab delimited, etc...
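-- a hedged sketch only (the column names and widths below are made up): a three-column
-- format file where column 2 is always quote-encapsulated - the ",\"" and "\"," terminators
-- strip the quotes as part of the delimiting, which only works if every row encapsulates
-- that column the same way
/*
8.0
3
1       SQLCHAR       0      1000     ",\""    1     col1             ""
2       SQLCHAR       0      1000     "\","    2     col2_quoted      ""
3       SQLCHAR       0      1000     "\r\n"   3     col3             ""
*/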
 

-- now select from the csv file...

select * FROM OPENROWSET(BULK 'c:\order_worksheet.csv' , FORMATFILE = 'c:\single_line.fmt',FIRSTROW = 1, MAXERRORS=10) AS A

go
 

-- notice we can use where clauses and suchlike to select from the csv file...

select * FROM OPENROWSET(BULK 'c:\order_worksheet.csv' , FORMATFILE = 'c:\single_line.fmt',FIRSTROW = 1, MAXERRORS=10) AS A 

where a.line like 'Product%'
 

-- so here we gain an immediate benefit by being able to select which rows are important...
 

-- now let's create an unpack type function. The first one is field by field - slow, but highly flexible, and not suited to lots of columns...

create function [dbo].[uStr](@colnum int, @incoming varchar(max))

returns varchar(1000)

as

Begin
 

   declare @int int

   declare @str varchar(1000)
 

   set @int = 0

   set @incoming = @incoming + ',  '

   while len(@incoming) > 1 and @int < 2000 --failsafe of 2000 columns

   begin
 

      set @int = @int + 1

      if substring(@incoming,1,1) = '"'

         begin

            set @str = substring(@incoming,2,charindex('",',substring(@incoming,2,len(@incoming)))-1)

            set @incoming = substring(@incoming,charindex('",',@incoming)+2,len(@incoming))

         end

      else  

         begin 

            set @str = substring(@incoming,1,charindex(',',substring(@incoming,1,len(@incoming)))-1)

            set @incoming = substring(@incoming,charindex(',',@incoming)+1,len(@incoming))

         end
 

      if @int = @colnum return replace(@str,'""','"')

   end 
 

return ''

end

go
 

-- let us now use that function, by calling it dynamically field by field....
 

select dbo.ustr(1,line),dbo.ustr(2,line),dbo.ustr(3,line),dbo.ustr(4,line),dbo.ustr(5,line),

       dbo.ustr(6,line),dbo.ustr(7,line),dbo.ustr(8,line),dbo.ustr(9,line),dbo.ustr(10,line)

FROM OPENROWSET(BULK 'c:\order_worksheet.csv' , FORMATFILE = 'c:\single_line.fmt',FIRSTROW = 1, MAXERRORS=10) AS A 

where a.line like 'Product%'
 

-- now let us take that a step further by manipulating those columns a bit more - all directly from the input - there is no database commitment yet...
 

select dbo.ustr(1,line),dbo.ustr(2,line),dbo.ustr(3,line),convert(datetime,dbo.ustr(4,line),103),dbo.ustr(5,line),

       dbo.ustr(6,line),dbo.ustr(7,line),convert(money,replace(dbo.ustr(8,line),',','')),convert(money,replace(dbo.ustr(9,line),',','')),convert(money,replace(dbo.ustr(10,line),',',''))

FROM OPENROWSET(BULK 'c:\order_worksheet.csv' , FORMATFILE = 'c:\single_line.fmt',FIRSTROW = 1, MAXERRORS=10) AS A 

where a.line like 'Product%'
 

-- we can use the above to load directly, or into a staging table...
 
 
 
 
 

-- now, for big csv's, or lots of columns where performance is at a premium... would need to load into a staging table as per the bulk insert example above, or use the select line from openrowset example above

-- and then, open a cursor on that table, working through each line...

-- for this example just as a script (and could easily become a SP), will not update the table, just form the columns and display...
 

declare @colstr varchar(1000)

declare @colnum int

declare @incoming varchar(max)

declare @id int,

        @item Varchar(100),

        @description Varchar(100),

        @DeliverTo Varchar (500),

        @DueDate datetime,

        @Price Varchar(50),

        @Qty Varchar(50),

        @Discount Varchar(50),

        @Net money,

        @Tax money,

        @Total money
 

declare csr cursor fast_forward read_only for select line FROM OPENROWSET(BULK 'c:\order_worksheet.csv' , FORMATFILE = 'c:\single_line.fmt',FIRSTROW = 1, MAXERRORS=10) AS A 

                                                          where a.line like 'Product%'

--or, select line from tmp_staging_table
 

open csr

fetch next from csr into @incoming

while @@fetch_status = 0

begin

   set @colnum = 0

   set @incoming = @incoming + ',  '

   while len(@incoming) > 1 and @colnum < 21

   begin
 

      set @colnum = @colnum + 1

      if substring(@incoming,1,1) = '"'

         begin

            set @colstr = replace(substring(@incoming,2,charindex('",',substring(@incoming,2,len(@incoming)))-1),'""','"')

            set @incoming = substring(@incoming,charindex('",',@incoming)+2,len(@incoming))

         end

      else  

         begin 

            set @colstr = substring(@incoming,1,charindex(',',substring(@incoming,1,len(@incoming)))-1)

            set @incoming = substring(@incoming,charindex(',',@incoming)+1,len(@incoming))

         end
 

      if @colnum = 01 set @item = @colstr

      if @colnum = 02 set @description = @colstr

      if @colnum = 03 set @deliverto = @colstr

      if @colnum = 04 set @duedate = convert(datetime,@colstr,103)

      if @colnum = 05 set @price = @colstr

      if @colnum = 06 set @qty = @colstr

      if @colnum = 07 set @discount = @colstr

      if @colnum = 08 set @net = replace(@colstr,',','')

      if @colnum = 09 set @tax = replace(@colstr,',','')

      if @colnum = 10 set @total = replace(@colstr,',','')

   end
 

-- do the insert or whatever here... for now, just display

   select @item,@description,@DeliverTo,@DueDate,@Price,@Qty,@Discount,@Net,@Tax,@Total
 

   fetch next from csr into @incoming

end
 

close csr

deallocate csr
 

GO
 

-- and there you have it - a few different ways to manually gain complete control over the CSV

-- and there are still the methods to read a "clean" csv file as well. But, if you code for the worse case, then best case is automatically catered for.


0
 

Author Comment

by:itsmevic
ID: 24087739
So what you're saying is that I would need to open each file individually in order for this to work?  I had planned to do this originally, as I will only need to review a few logs a day, so this will not be a big deal.  If I had hundreds of these logs to review daily, then I could see an issue with it.

I looked at that link you sent me but am not sure how to relate that code to what I described needing initially.  This is what I typically do:

1.)  Open Logparser 2.2
2.)  Enter in my Logparser command for my parse

   ex:    logparser -i:csv -o:datagrid "select * from \\myserver\logs\myfile.csv where.... and my statement continues from there.   I need to be able to read through my .csv file so that it can pick up any IP variations on user accounts.  

    ex.   let's say my authentication log shows that JDOE logged in 7 times on 4/5/2009, but one of those 7 logins is from a different IP address: I have JDOE logging in six times with the same IP of 111.11.111.111 and one logon from IP 222.22.222.222, which is not the same. I need the statement to be able to pick that up.

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24088024
post a sample csv file if you want, along with the versions of SQL you are running
0
 

Author Comment

by:itsmevic
ID: 24088403
Hi Mark,

    I'm running Logparser; it runs like a SQL engine but isn't SQL.  It uses SQL commands in order to parse through logs.  I've posted a sample .csv file that I already have a script for.  Basically, I'll run this script against the log; it copies, moves and converts the log to a .CSV, and what you see below is the end result of that CSV file.  From this CSV file I run Logparser against it to parse through its data.  My problem is that I do not know the correct SQL syntax that will allow me to parse through the log and detect IP variations in the authentication log.  I've attached an example; hopefully this will help.
Example.JPG
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24088505
OK - I thought it may have been more a case of using Logparser to generate the CSV and then you independently wanting to process the CSV.

So, if we forget datasource for a moment and look at your SQL statement :


"SELECT * FROM <datasource> WHERE EVTID = '540' AND DOMAIN = 'TESTDC' AND LOGONTYPE = '3'"

the only comment is to make sure the datatypes match, i.e. you might need to drop the single quotes if LOGONTYPE and EVTID are treated as numeric.
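For example (just to illustrate that point - this assumes those two fields really are numeric in your log), the same query might become:

"SELECT * FROM <datasource> WHERE EVTID = 540 AND DOMAIN = 'TESTDC' AND LOGONTYPE = 3"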

As far as testing anomalies for a user and their IP address, it should be a simple matter of adding those criteria:

"SELECT * FROM <datasource> WHERE EVTID = '540' AND DOMAIN = 'TESTDC' AND LOGONTYPE = '3' AND USERNAME = 'JOE' AND IP <> '111.11.111.111'"

Or am I totally missing the question ?

0
 

Author Comment

by:itsmevic
ID: 24089342
Hi Mark,

    You're close with the SQL statement below, and presently I'm doing that right now, but this only pulls one individual user from the log file, whereas we are talking about a log file that could be 800 KB to 1 MB in size, so you can imagine how much text data that is.  I've got basically hundreds if not thousands of different users, and using the method below would probably take about 2 weeks to go through and audit one log.  I basically want to run the SQL statement against the ENTIRE log and for it to capture its findings that way.  Hopefully that will help explain it a little better.  On a side note, my logs are generally unformatted at the beginning; I run a script that converts them to a .CSV with headers of my choosing, with an end result like the screenshot I attached above.  From there, I open Logparser and run my statement through it so that it parses and produces my output in a datagrid.

Logparser.exe -i:CSV -o:DATAGRID "SELECT * FROM <datasource> WHERE EVTID = '540' AND DOMAIN = 'TESTDC' AND LOGONTYPE = '3' AND USERNAME = 'JOE' AND IP <> '111.11.111.111'"

I've also attached the overall flow of what I'm doing to give you a broader picture.
flow.JPG
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24089655
Basically you are going to need to join or look up the user "master" details instead of having to hard-code any literals... Is there any way in your script to extract the user's expected details? Then you could use something like:

AND EXPECTED_IP <> IP

rather than :

AND USERNAME = 'JOE' AND IP <> '111.11.111.111'
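
If that user "master" data were available alongside the log in SQL (the table and column names below are purely illustrative, not from your environment), the lookup might look something like:

SELECT l.*
FROM   auth_log l
JOIN   user_master m ON m.USERNAME = l.USERNAME
WHERE  l.IP <> m.EXPECTED_IP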
0
 

Author Comment

by:itsmevic
ID: 24090003
On a static type of network that might fly, but these days everyone is on DHCP, which I assume would be a problem with your suggestion above, especially with users that work remotely as well as come into the office :(
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24093412
Ummm, yes that would be a problem.

So, how do you / would you tell if there is something different from "expected" ?
0
 

Author Comment

by:itsmevic
ID: 24097738
I see what you're saying: to determine what is "expected" you would take the majority of IPs listed for that user and look for a variance from that majority. For example, let's say JDOE has logon entries for 4/7/2009; 9 out of 10 entries' IPs read 222.222.22.222, however 1 out of 10 reads 232.342.99.01. The script would then flag that variation in its readout.
0
 

Author Comment

by:itsmevic
ID: 24098289
Quite a complex SQL statement, I would assume. I wonder if it's even doable?
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24098472
Are you sure you don't want to load the data into SQL and run the queries there? It is going to be very difficult otherwise...
0
 

Author Comment

by:itsmevic
ID: 24100731
You mean like running the free SQL Express version on my machine, or are you talking about something else?  As far as it being difficult, how would running the queries in a SQL client be easier than, let's say, running them in Logparser, which has a lite SQL engine built in?
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 500 total points
ID: 24101727
Yep, Express would do it, because you then have a full table that you can do things with, create procedures around, and do procedural coding against.

The built-in lite SQL engine is a bit limited when you need to do procedural-type work, I think. For example: from that query, find the IP address per user with the highest incidence of use.
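
As a rough sketch only - the table and column names below are assumptions based on the headers you described, not anything real - once the converted CSV has been loaded into a SQL Express table, the "variation" check becomes a couple of set-based queries:

-- users who logged on from more than one distinct IP
SELECT USERNAME, COUNT(DISTINCT IP) AS ip_count
FROM   auth_log
WHERE  EVTID = 540 AND DOMAIN = 'TESTDC' AND LOGONTYPE = 3
GROUP BY USERNAME
HAVING COUNT(DISTINCT IP) > 1

-- and, for those users, every IP other than their most frequently used one
;WITH ip_counts AS (
   SELECT USERNAME, IP, COUNT(*) AS hits,
          ROW_NUMBER() OVER (PARTITION BY USERNAME ORDER BY COUNT(*) DESC) AS rn
   FROM   auth_log
   WHERE  EVTID = 540 AND DOMAIN = 'TESTDC' AND LOGONTYPE = 3
   GROUP BY USERNAME, IP
)
SELECT USERNAME, IP, hits
FROM   ip_counts
WHERE  rn > 1          -- everything other than the user's majority IP
ORDER BY USERNAME, hits DESC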
0
 

Author Comment

by:itsmevic
ID: 24110253
OK, thanks.
0
