Import large CSV file with many columns into SQL Server 2008

I have to import a 15 GB CSV file with 7+ million rows and 1,650 columns. The number of columns is the problem. Here is what I've tried:

1) Import and Export Wizard. The wizard can't handle that many columns. Deleting excess columns defined in the wizard causes the import to fail: a conversion error is always generated on the last column that is not deleted.

2) BULK INSERT. I've tried a bulk import, specifying the columns, but this generates an error on the last column in each row, so no records are imported. Error message:
Msg 4863, Level 16, State 1, Line 2
Bulk load data conversion error (truncation) for row 1, column nn (VARNAME)
where column nn is the last column specified and VARNAME is its name. It doesn't seem to matter which column I stop at.

3) bcp command-line utility (and ipcsv, the Import-Csv alias, in PowerShell). I can't seem to get the syntax correct for this. I'm trying: ipcsv -Path c:\file.csv. Error message: The member "1" is already present.

4) I wrote a VB program to read the file and write it out as two separate files, so that each one is under SQL Server's maximum limit of 1,024 columns per table. This seems to be working, but it takes a very long time to read and write the files. I'm using the TextFieldParser class and the File.WriteAllText method. These seem to be the best choice because I suspect the file has a few extra commas here and there, but they are much slower than Line Input.
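For reference, the same split can be sketched in a few lines of Python with the csv module (an illustrative rewrite, not the poster's actual VB code; the function name and the 800-column split point are assumptions based on the thread):

```python
import csv

def split_wide_csv(src, out1, out2, split_at=800):
    """Split a CSV with more columns than SQL Server's 1,024-column
    table limit into two narrower files, row by row."""
    with open(src, newline="") as f, \
         open(out1, "w", newline="") as f1, \
         open(out2, "w", newline="") as f2:
        w1, w2 = csv.writer(f1), csv.writer(f2)
        for row in csv.reader(f):
            # First `split_at` fields go to file 1, the rest to file 2.
            w1.writerow(row[:split_at])
            w2.writerow(row[split_at:])
```

Streaming one row at a time keeps memory flat even on a 15 GB file; csv.reader also copes with quoted fields, though stray unquoted commas will still shift data to the right.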
GSQL Commented:
According to this documentation, a CSV file is not supported directly, but it can be used if certain restrictions are met.

Though bcp may not be the ideal solution in your case, this line from the above article intrigues me:

"The structure of the data file does not need to be identical to the structure of the SQL Server table because columns can be skipped or reordered during the bulk-import process."

I wonder if you could parse it into 800-column sections?

I've never tried a file this large, but I have had good success with the bcp utility in the past. What you're seeing looks like a row-terminator problem.

Check this for bcp utility syntax.

Pay close attention to the -r parameter (row_term), which is typically \n for newline but could be different depending on the file format.
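One way to check which terminator the file actually uses, rather than guessing at -r, is to peek at the raw bytes. A small sketch (the function name is made up):

```python
def sniff_row_terminator(path, sample_bytes=65536):
    """Inspect a sample of raw bytes and report whether rows appear
    to end in CR/LF or bare LF, as a string suitable for bcp's -r."""
    with open(path, "rb") as f:
        chunk = f.read(sample_bytes)
    crlf = chunk.count(b"\r\n")
    lf = chunk.count(b"\n") - crlf  # bare newlines only
    return "\\r\\n" if crlf >= lf else "\\n"
```

Sampling only the first 64 KB avoids reading the whole 15 GB file just to answer this one question.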
RBECKMN (Author) Commented:
Here is the command line I used:

bcp testdb.dbo.testdata in c:\testdata.csv -r \r\n -t \, -S bigdata\bigsqlserver -T -c

(testdb is the sql database name, testdata is the table name; the sql 2008 instance is named bigsqlserver, running on a windows 2008 server named bigdata.)

The results, after about 2 1/2 minutes:

SQLState = S1000, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0] Unexpected EOF Encountered in BCP data-file
0 rows copied

The testdata table is defined as 800 columns of varchar(50). The file (c:\testdata.csv) has 1594 columns, is comma-delimited, and the row delimiter is CR/LF.


GSQL Commented:
I wonder if bcp is encountering an unexpected character in just one of the rows. Have you tried bcp with just the first few rows, using the -L (last row) parameter?
RBECKMN (Author) Commented:
I tried the -L parameter with 5, 500, and 5000. Same result each time: unexpected EOF.
RBECKMN (Author) Commented:
I think the problem with bcp is that the import process takes everything after the 800th column as being part of the 800th column, and since that's a lot of text it gets truncated, generating the error.

This is what happens with the Import/Export Wizard too: it always chokes on the last column. There doesn't seem to be any way to tell SQL Server not to read the entire line.
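Since the loader reads whole physical lines, any row with stray commas will also shift or overflow its last column. A quick sketch for checking whether every row really has the same field count before attempting a bulk load (the function name is illustrative):

```python
import csv
from collections import Counter

def field_count_histogram(path):
    """Count how many rows have each field count, so ragged rows
    (extra or missing commas) stand out before a bulk load."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts[len(row)] += 1
    return counts
```

On a clean file this returns a single entry; any minority field counts point at the rows bcp will choke on.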
RBECKMN (Author) Commented:
Eureka! The Import/Export Wizard does have a way of "ignoring" columns in the file. I haven't figured out how to do it except by manually marking each column as ignored. That's tedious since the file has 1600 columns, and it still needs to be imported into two separate tables.

My VB program has finished running now, and it did break the file into two separate csv files of 800 columns each, so that's another option.  The VB program is actually faster than the import wizard.  I'm sure code could be written to import the data directly without breaking up the file, but I'm also sure that would be much slower than anything else.
RBECKMN (Author) Commented:
Thanks for the help. The key is being able to skip columns. I haven't figured out how to do this with bcp, but, although tedious, it works with the import wizard.
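For anyone landing here later: bcp can skip source columns with a format file (the -f switch). In a non-XML format file, setting a field's server column order to 0 tells bcp to discard that field. A minimal sketch for a three-field file loaded into a two-column table (file layout and column names are made up, not taken from the thread):

```
10.0
3
1   SQLCHAR   0   50   ","      1   col1   ""
2   SQLCHAR   0   50   ","      0   skip   ""
3   SQLCHAR   0   50   "\r\n"   2   col2   ""
```

Invoked along the lines of: bcp testdb.dbo.testdata in c:\testdata.csv -f skip.fmt -S bigdata\bigsqlserver -T. Writing such a file for 1594 fields would itself want to be script-generated rather than typed by hand.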
Question has a verified solution.
