Link to home
Start Free TrialLog in
Avatar of saoirse1916
saoirse1916Flag for United States of America

asked on

BULK INSERT comma delimited file -- but the file has commas!

I'm hoping the solution to this is something really easy that I'm just overlooking.  Currently I'm using DTS to load data from a comma delimited CSV file into temporary tables which are then combined with my active application tables.  I'm trying to put together a system which minimizes the manual part of this data transfer and gets away from DTS.  I will still have to manually pull data out of one system and get it into CSV files for the import, but I'm hoping that at that point I can let my application take over.

What I've done so far is have my application create the temp tables using CREATE TABLE, then take the CSV files which get uploaded to the server and do a BULK INSERT on them.  The data all goes in without any errors (now that I've solved my premature EOF error) but the problem is, the files are all comma delimited and several of the fields contain commas.  Client names can be multiple names separated by a comma (e.g. Smith, Smith, and Jones, LLC) and also there are several numeric fields which contain commas (e.g. 1,100 sq ft).  Of course, what happens on these rows is that the data gets spread out across more columns than it should and throws off the order of things.

So, my question is: is there a way to have the BULK INSERT ignore commas that are contained in data fields but still use the commas which delimit those data fields?

Thanks very much!
Avatar of saoirse1916
saoirse1916
Flag of United States of America image

ASKER

Sorry -- meant to add that I'm using MS SQL Server 2000 and ASP/VBScript for my application, if it matters.
ASKER CERTIFIED SOLUTION
Avatar of jsimbeck
jsimbeck

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
"is there a way to have the BULK INSERT ignore commas that are contained in data fields but still use the commas which delimit those data fields?"

What is there to distinguish one kind of Comma (those embedded in what you interpret to be a single field (such as a company named Dewey, Cheatham and Howe, Inc) for another kind of comma, used to delimit fields.  After all a comma (,) is still a comma (,).

jsimbeck is on the right path.  Change the way the fields in your source files are dlimited.

AW
I'll certainly look into a tab delimited file.  One question though -- I've been using comma delimited files through DTS for about a year with no problems at all.  How is it that DTS can distinguish between a comma in a field and one that delimits a field?
Avatar of jsimbeck
jsimbeck

There are a couple of possibilities that I can see:

1.  You've been lucky up to this point and this is the first time data has had embedded comments.
2.  Something changed recently that might have modified a text identifier (usually quotes surrounding each field of textual data)?

As far as SQL server is concerned, unless its surrounded by a text identifier, its like AW said:  "...a comma (,) is still a comma (,)."
* correction...  meant to say "commas", not "comments" in "1." above.
I looked back over my old data feeds and many of the same fields are there -- i.e. those with commas have gone in just fine before when using DTS.

I tried using a tab delimited set of files and now I'm getting some other errors -- most notably, all of my numeric data (which includes commas most of the time) has now gone into the database surrounded with double-quotes.
I just noticed something -- in my comma delimited file, every data field that contains a comma in the field itself is surrounded by quotes.  I'm assuming that's how DTS was able to allow those quotes to go into the DB without treating them as delimiters.  Is there a way that I can use this technique using BULK INSERT?
Has anything changed from when things used to work to now, including upgrades or patches to SQL?
No, not at all -- I've only been working on this for the past 2 days, and nothing's been done to the server.
Bulk Insert may be a bit more persnickety since it allows for a field terminator, but not necessarily a field qualifier.  From the SQL Server Books Online:

FIELDTERMINATOR [ = 'field_terminator' ]
Specifies the field terminator to be used for char and widechar data files. The default is \t (tab character).

So if you are amenable to using a tab for your field delimiter, you should be able to use Bulk Insert using \t for your tab delimiter.
That does leave unresolved the problem of the commas in your numeric fields.  Also, without trying it, I'm not sure if Bulk Insert would import the surrounding quotes or not (the Bulk Inserts I've been doing here have been on fixed-length records).
Yeah, that's the problem that I've been running into now with a tab-delimited file: BULK INSERT imports the quotes as well.  I also ran into a data conversion error (type mismatch) for a column that I've added which is an identity field, but I think I know how to get around that.
Well, the tab delimited file does get the data loaded but it inserts the quotes as well.  I can add to my routine a query that finds any records with quotes and removes them from all fields.  I'm still curious though, why there's no way to use a comma delimited file as I have been doing for so long now with DTS but using BULK INSERT instead?
I tried to come up with a routine that would find all records with quotes in the fields, but since I'm using ASP/VBScript, I can't create the recordset.  The query works fine in query analyzer, but doesn't in ASP because of the quotation marks:

SELECT * FROM bldg WHERE bldgname LIKE '%"%' -- works fine in QA but as an ASP query, it fails

strSQL = "SELECT * from bldg WHERE bldgname LIKE '%"%'" -- the quote between the % signs tells the ASP compiler that the string is finished but then leaves the remaining %'" hanging out there and causes an error.  Anyone have any ideas for how I can get a recordset that will identify the quoted records in my table?
Try something like:

SELECT * FROM bldg WHERE PATINDEX('%' + CHAR(34) + '%', bldgname) > 0

or you can use doubled quotes to indicate that the quote character is part of the SQL query:

strSQL = "SELECT * from bldg WHERE bldgname LIKE '%""%'"

(note:  there are two double-quote characters between the percent signs)
That did it -- I'll play around with this for a while.  Looks like a tab-delimited file is the way I'll have to go.

Thanks very much for your help!