Link to home
Start Free TrialLog in
Avatar of VicZ
VicZFlag for United States of America

asked on

Import of CSV file into SQL using OPENROWSET

I am using the following to import CSV files into a SQL table...

select * from OpenRowset('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\CSVDIR\; ','select * from sample.csv')

These CSV files are generated by a program and some are created but empty.

My problem:
The query above fails when I hit an empty CSV file, either with or without column.

I also want to count the number of records

select count(col1) from openrowset...

Can the geniuses help me with a solution?  I prefer some error handling.
Avatar of EugeneZ
EugeneZ
Flag of United States of America image

can you get data from csv file with data and columns?
Avatar of VicZ

ASKER

Use excel, load it with data of different values (number, dates, char, etc.) save it as a text delimeted CSV and use my query above to pull it into SQL. If the table (CSV) has ZERO data it will give an error...

OLE DB provider "MSDASQL" for linked server "(null)" returned message "[Microsoft][ODBC Text Driver] Query must have at least one destination field.".

I need a work around, even some error handling:

if table = nul OR count(col1) = 0
then next file, ignore (whatever)

I AM RAISNIG THE POINTS ON THIS ONE
ASKER CERTIFIED SOLUTION
Avatar of Mark Wills
Mark Wills
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of VicZ

ASKER

Mark, first option looks good but I have applied something similar, the only problem, is that I am trying to stay away from #temp tables and linked servers. I am processing thousands of CSV files everyday, #temp tables and linked servers looped in the thousands bogs my system down.

I used variable tables on a similar solution using DBF (Fox) files and it works fine because the variables reside in memory and drop after every loop. The CSV driver for SQL is failing me when there is an empty value (table, row or column), I run a check script that points out the empties, but it takes me time to fix these everyday.

I am looking at your second option, I like the command shell, let me get back to you.
Oh, once the linked server is created you dont have to create it every time, it will be able to access all the csv's, so you could loop through each csv and put each one into the temp table (just add table name). But know what you mean about temp tables. Let me know if you want some looping type logic in there, just did the one file specifically as an exple for the openrowset select.

Avatar of VicZ

ASKER

Option 2 is working well with the file size.

What other attributes can you throw there?

ADDING 50 MORE POINTS
Avatar of VicZ

ASKER

Option 3 is also good to catch a yey or ney. Good work!

I want to give you the points. Just throw some more attributes for that command shell option,

I have a loop. If I need other help on that I will open another question.
What other attributes do you want ?

at the DOS prompt do a help for

You will notice that I use a "" to delimit the different attibutes. adding   %~ti  for date/time is a popular one as well...


    %~I         - expands %I removing any surrounding quotes (")
    %~fI        - expands %I to a fully qualified path name
    %~dI        - expands %I to a drive letter only
    %~pI        - expands %I to a path only
    %~nI        - expands %I to a file name only
    %~xI        - expands %I to a file extension only
    %~sI        - expanded path contains short names only
    %~aI        - expands %I to file attributes of file
    %~tI        - expands %I to date/time of file
    %~zI        - expands %I to size of file
    %~$PATH:I   - searches the directories listed in the PATH
                   environment variable and expands %I to the
                   fully qualified name of the first one found.
                   If the environment variable name is not
                   defined or the file is not found by the
                   search, then this modifier expands to the
                   empty string

The modifiers can be combined to get compound results:

    %~dpI       - expands %I to a drive letter and path only
    %~nxI       - expands %I to a file name and extension only
    %~fsI       - expands %I to a full path name with short names only
    %~dp$PATH:I - searches the directories listed in the PATH
                   environment variable for %I and expands to the
                   drive letter and path of the first one found.
    %~ftzaI     - expands %I to a DIR like output line

Open in new window

Avatar of VicZ

ASKER

AWESOME. You earned it mark. Thank you fro your help.

(Now I just have to figure out how to list subdirectories on that cmdshell script)
Avatar of VicZ

ASKER

All three options work great, they just have different levels of approach.
Avatar of VicZ

ASKER

Hey Mark I am having a problem using option 3 with an IF condition, can you help me with an example?

Well, the try kinda replaces the IF, so not sure what you want to do with the if - can you describe it a bit more ?
Avatar of VicZ

ASKER

I want to cycle thru all my CSV files and list the name with row number total or zeros...

set @count = 0
set @filename char
while count < than 100
begin try
   exec('select @filename as filename, count(col1) as totalrows from OpenRowset(''MSDASQL'', ''Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\EE\; '',''select * from any.csv'')')
end try
begin catch
   select @filename + ' = No rows'
end catch

set @count = @count +1
end


...or something like that, I need to display filename, number of rows or nulls (empty csv, not even with columns)


OK, so how about combining parts of methods 2 and 3 (and note how it is now recursing subdirectores with the /R qualifier)


declare @tmp table (id int identity, results varchar(255),filename varchar(255),filesize varchar(100), processed datetime)
insert @tmp (results)
exec master..xp_cmdshell 'for /R "c:\ee" %i in (*.csv) do @echo %~fi""%~zi""%~ti'

update @tmp set filename = substring(results,1,charindex('""',results)-1), filesize = substring(results,charindex('""',results)+2,10)

declare @filename varchar(255)
set @filename = (select top 1 filename from @tmp where processed is NULL  order by id)

while isnull(@filename ,'') > ' .'
begin
   update @tmp set processed = getdate() where filename = @filename
   begin try
      exec('select '''+@filename+''' as filename,count(*) as rowcounter from OpenRowset(''MSDASQL'', ''Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\ee; '',''select * from '+@filename+''')')
   end try
   begin catch
      select @filename as filename, 0 as rowcounter
   end catch
   set @filename = isnull((select top 1 filename from @tmp where processed is NULL order by id),NULL)
end
GO


Sorry about that, was playing with the filedate so the above is not quite right...

Also, thought it might be best to capture the reasons why something failed the "try" block, so... have a look below :

(and please ignore the code highlighting - the colours dont always make sense for SQL)...



declare @tmp table (id int identity, results varchar(255),filename varchar(255),filesize int, processed datetime)
insert @tmp (results)
exec master..xp_cmdshell 'for /R "c:\ee" %i in (*.csv) do @echo %~fi""%~zi'

update @tmp set filename = substring(results,1,charindex('""',results)-1), filesize = substring(results,charindex('""',results)+2,10)

declare @filename varchar(255)
declare @tmp_cnt table (id int identity, filename varchar(255), rowcounter bigint, errors nvarchar(2048))

set @filename = (select top 1 filename from @tmp where processed is NULL and filesize > 0 order by id)

while isnull(@filename ,'') > ' .'
begin
   update @tmp set processed = getdate() where filename = @filename
   begin try
      insert @tmp_cnt (filename , rowcounter)
      exec('select '''+@filename+''' as filename,count(*) as rowcounter from OpenRowset(''MSDASQL'', ''Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\ee; '',''select * from '+@filename+''')') 
   end try
   begin catch
      insert @tmp_cnt (filename, errors) values (@filename, error_message())
   end catch
   set @filename = isnull((select top 1 filename from @tmp where processed is NULL and filesize > 0 order by id),NULL)
end

select t.filename, t.filesize, t.processed, c.rowcounter, isnull(c.errors,'') as errors
from @tmp t
left outer join @tmp_cnt c on t.filename = c.filename

go

Open in new window

Avatar of VicZ

ASKER

Very cool solution, I'd like to throw you some more points, but I do not know how no that the solution is closed.

Anyway, believe or not, I am putting all this code in another loop to cycle thru different directories that I already captured in a table. so the "c:\ee" becomes the main directory and other subdirectories have the csv's.

Thank you Mark.
If you fell so compelled, you could use the "Request Attention" button and a Moderator can help you out.

But, there is really no need, enjoyed helping you. Your thanks and knowing it works for you is reward enough :)
Avatar of VicZ

ASKER

Mark, I openned another similar thread, I would like to see if you to help me more.

https://www.experts-exchange.com/questions/25200529/Import-of-CSV-file-into-SQL-2005-with-filter-option.html?fromWizard=true