
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1534

Import large CSV file with many columns into SQL Server 2008

I have to import a 15 GB CSV file with 7+ million rows and 1650 columns. The number of columns is the problem. Here is what I've tried:

1) Import and Export Wizard. The wizard can't handle that many columns. Deleting excess columns defined in the wizard causes the import to fail; a conversion error is always generated on the last column that is not deleted.

2) Bulk import. I've tried a bulk import, specifying the columns, but this generates an error on the last column in each row, so it doesn't import any records. Error message:
Msg 4863, Level 16, State 1, Line 2
Bulk load data conversion error (truncation) for row 1, column nn (VARNAME)
where nn is the last column specified and VARNAME is that column's name. It doesn't seem to matter which column I stop at.

3) BCP command-line utility (ipcsv in PowerShell). I can't seem to get the syntax right for this. I'm trying: ipcsv -path c:\file.csv. Error message: the member "1" is already present.

4) I wrote a VB program to read the file and write it out as two separate files so that each one is under SQL Server's maximum column limit of 1024. This seems to be working, but it takes a very long time to read and write the files. I'm using the TextFieldParser and WriteAllText objects/methods. These seem the safest because I suspect the file has a few extra commas here and there, but they are much slower than Line Input.
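
Roughly, the approach looks like this (the file names and the 800-column split point are placeholders; this sketch streams rows through a StreamWriter, which should be faster than buffering everything for WriteAllText, and the rejoin does not re-quote fields that contain embedded commas):

Imports System.IO
Imports System.Linq
Imports Microsoft.VisualBasic.FileIO

Module SplitCsv
    Sub Main()
        Using parser As New TextFieldParser("c:\testdata.csv")
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            Using leftHalf As New StreamWriter("c:\testdata_1.csv")
                Using rightHalf As New StreamWriter("c:\testdata_2.csv")
                    While Not parser.EndOfData
                        ' TextFieldParser copes with quoted fields and stray commas.
                        Dim fields As String() = parser.ReadFields()
                        ' First 800 fields to one file, the rest to the other,
                        ' so each output stays under the 1024-column table limit.
                        leftHalf.WriteLine(String.Join(",", fields.Take(800).ToArray()))
                        rightHalf.WriteLine(String.Join(",", fields.Skip(800).ToArray()))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module

Writing row by row through a StreamWriter, instead of accumulating output for a single WriteAllText call, at least keeps the memory use flat.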
Asked by: RBECKMN
1 Solution
 
GSQL Commented:
I've never tried a file that large, but I have had good success with the bcp utility in the past. This sounds like a problem with the row terminator.

Check this for bcp utility syntax.  
http://msdn.microsoft.com/en-us/library/ms162802.aspx

Pay close attention to the -r parameter (row_term), which is typically \n for newline but could be different depending on the file format.
 
RBECKMN (Author) Commented:
Here is the command line I used:

bcp testdb.dbo.testdata in c:\testdata.csv -r \r\n -t \, -S bigdata\bigsqlserver -T -c

(testdb is the SQL database name and testdata is the table name; the SQL 2008 instance is named bigsqlserver, running on a Windows 2008 server named bigdata.)

The results, after about 2.5 minutes:

SQLState = S1000, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0] Unexpected EOF encountered in BCP data-file
0 rows copied


The testdata table is defined as 800 columns of varchar(50). The file (c:\testdata.csv) has 1594 columns, is comma-delimited, and the row delimiter is CR/LF.

 
GSQL Commented:
I wonder if bcp is encountering an unexpected character in just one of the rows. Have you tried bcp with just the first few rows?
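
For instance, bcp's -F (first row) and -L (last row) switches bound the rows it processes; adding them to the command you posted would test just the first five rows:

bcp testdb.dbo.testdata in c:\testdata.csv -r \r\n -t \, -S bigdata\bigsqlserver -T -c -F 1 -L 5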
 
RBECKMN (Author) Commented:
I tried the -L parameter with 5, 500, and 5000. Same result each time: unexpected EOF.
 
RBECKMN (Author) Commented:
I think the problem with bcp is that the import process treats everything after the 800th column as part of the 800th column, and since that's a lot of text, it gets truncated, which generates the error.

This is what happens with the Import/Export Wizard - it always chokes on the last column. There doesn't seem to be any way to tell SQL Server not to read the entire line.
 
GSQL Commented:
According to this information

http://msdn.microsoft.com/en-us/library/ms188609.aspx

a CSV file is not supported, but one can be used if certain restrictions are met.

Though bcp may not be the ideal solution in your case, this line from the above article intrigues me:

"The structure of the data file does not need to be identical to the structure of the SQL Server table because columns can be skipped or reordered during the bulk-import process."

I wonder if you could parse it into 800-column sections?
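
For what it's worth, the mechanism bcp uses for skipping is a format file (the -f switch): any field whose server column order is set to 0 is read from the data file and discarded. A minimal sketch for a three-field file loading two table columns (the names here are made up, and your real format file would need a line for every one of the 1594 fields, so you would want to generate it with a script):

10.0
3
1   SQLCHAR   0   50   ","      1   col1     SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   50   ","      0   unused   ""
3   SQLCHAR   0   50   "\r\n"   2   col2     SQL_Latin1_General_CP1_CI_AS

You would then run bcp with -f instead of -c:

bcp testdb.dbo.testdata in c:\testdata.csv -f c:\testdata.fmt -S bigdata\bigsqlserver -T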

 
RBECKMN (Author) Commented:
Eureka .. the Import/Export Wizard does have a way of "ignoring" columns in the file. I haven't figured out how to do it except by manually marking each column as ignored. That's tedious since the file has 1600 columns, and it still needs to be imported into two separate tables.

My VB program has finished running now, and it did break the file into two separate CSV files of 800 columns each, so that's another option. The VB program is actually faster than the Import Wizard. I'm sure code could be written to import the data directly without breaking up the file, but I'm also sure that would be much slower than anything else.
 
RBECKMN (Author) Commented:
Thanks for the help. The key is being able to skip columns. I haven't figured out how to do this with bcp, but, tedious as it is, it works with the Import Wizard.
