VDanner

asked on

Import CSV file into SQL table

I am developing a package to import a flat CSV file into a SQL table.  It works, but I am unhappy with the performance: over an hour to import an average of 50,000 records.  I first tried a Data Flow Task to map the columns and define the output column datatypes/lengths to match the SQL destination columns.  Some output columns are to be ignored and are marked as such in the transformation.  I initially placed the CSV source file on my local drive and imported it over to the server.  Given the poor performance, I then moved the CSV source file to the same server as the destination, but performance was still poor.
So I then tried a Bulk Insert Task, which I'm not very familiar with.  This does not seem to work, since the source columns are larger than the destination and/or some source columns need to be ignored.  

Any ideas how I can improve performance of this import?  
dbaSQL

I worked w/several very similar projects -- unfortunately, I never got the runtime I needed out of DTS when importing the csv.  I got MUCH better runtimes using bulk insert.

You need only to use a format file.  See BOL: 'Using Format Files'
This will tell you exactly how to work around the table and file variances.  Let me know if you need an example.
VDanner

ASKER

Since I am so new to Bulk Insert, I would very much appreciate an example.  
sure, gimme one sec
this is a genericised version of a proc i use to load files.  i actually am loading a LOT more than this, but, i pulled it all out except for one, just to keep things simple:

CREATE PROC dbo.usp_loadFiles
AS
SET NOCOUNT ON

BULK INSERT database.dbo.table
FROM '\\servername\directoryname\fileA.csv'
WITH (
    FORMATFILE      = 'E:\MSSQL\formatfile.fmt',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\r',
    FIRSTROW        = 2,
    TABLOCK,
    ROWS_PER_BATCH  = 100000
)

SET NOCOUNT OFF
GO

You will see I have the format file in there, as well as the field and row terminators, FIRSTROW and ROWS_PER_BATCH.  

FIRSTROW is necessary because the files have headers (column names); I need to skip that row before starting to load the data.

here is a format file:

8.0
6
1      SQLCHAR      0      50      ","      2      field1      SQL_Latin1_General_CP1_CI_AS
2      SQLCHAR      0      1      ","      3      field2      SQL_Latin1_General_CP1_CI_AS
3      SQLCHAR      0      1      ","      4      field3      SQL_Latin1_General_CP1_CI_AS
4      SQLCHAR      0      4      ","      5      field4      SQL_Latin1_General_CP1_CI_AS
5      SQLCHAR      0      4      ","      6      field5      SQL_Latin1_General_CP1_CI_AS
6      SQLCHAR      0      4      "\n"      7      field6      SQL_Latin1_General_CP1_CI_AS

(i've just cut/pasted... hopefully it prints out ok)

Do you have BOL?  If so, you really need to read 'Using Format Files'.  It is exactly what you need to know - and it explains the format file in detail.
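One more note on the "columns that need to be ignored" part of the question: in a non-XML format file, each line describes one field of the data file, in this order: field number, host data type, prefix length, maximum data length, field terminator, server column order, server column name, collation.  Setting a field's server column order to 0 tells BULK INSERT to discard that field entirely.  A minimal sketch, assuming a hypothetical 3-field CSV where the second field should not be loaded (the field names and lengths are placeholders, not from the real table):

```
8.0
3
1      SQLCHAR      0      50       ","       1      col1         SQL_Latin1_General_CP1_CI_AS
2      SQLCHAR      0      100      ","       0      unused       SQL_Latin1_General_CP1_CI_AS
3      SQLCHAR      0      4        "\r\n"    2      col2         SQL_Latin1_General_CP1_CI_AS
```

The middle line's server column order of 0 is what skips that source field; the server column name on a skipped field is just documentation.  BOL covers this under 'Using a Format File to Skip a Data Field'.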

VDanner

ASKER

Thank you for this example.  I will try it out this morning and, if all goes well, close this question.  I'm not sure what you mean by BOL?  I am working in Business Intelligence Development Studio (BIDS) and have now opened its Help engine, searching on format files.  Between your example and those narratives this should resolve my problem.
ASKER CERTIFIED SOLUTION
dbaSQL
VDanner

ASKER

Thank you.  Regarding BOL (duh... should have guessed that).  I am downloading that version now and feel confident that it, along with your example, will do the trick.  
Excellent.  Good luck!