Solved

BULK INSERT comma delimited file -- but the file has commas!

Posted on 2006-11-29
16
4,388 Views
Last Modified: 2012-06-21
I'm hoping the solution to this is something really easy that I'm just overlooking.  Currently I'm using DTS to load data from a comma delimited CSV file into temporary tables which are then combined with my active application tables.  I'm trying to put together a system which minimizes the manual part of this data transfer and gets away from DTS.  I will still have to manually pull data out of one system and get it into CSV files for the import, but I'm hoping that at that point I can let my application take over.

What I've done so far is have my application create the temp tables using CREATE TABLE, then take the CSV files which get uploaded to the server and do a BULK INSERT on them.  The data all goes in without any errors (now that I've solved my premature EOF error) but the problem is, the files are all comma delimited and several of the fields contain commas.  Client names can be multiple names separated by a comma (e.g. Smith, Smith, and Jones, LLC) and also there are several numeric fields which contain commas (e.g. 1,100 sq ft).  Of course, what happens on these rows is that the data gets spread out across more columns than it should and throws off the order of things.

So, my question is: is there a way to have the BULK INSERT ignore commas that are contained in data fields but still use the commas which delimit those data fields?

Thanks very much!
0
Comment
Question by:saoirse1916
  • 9
  • 6
16 Comments
 
LVL 8

Author Comment

by:saoirse1916
ID: 18037564
Sorry -- meant to add that I'm using MS SQL Server 2000 and ASP/VBScript for my application, if it matters.
0
 
LVL 1

Accepted Solution

by:
jsimbeck earned 500 total points
ID: 18037628
It might be worth considering to make your files tab-delimited instead of comma-delimited (if possible).  While its still possible to embed a tab inside a field, its less likely to occur than an embedded comma.  Another option is to make your data quoted and comma-delimited, but you may find embedded quotes as well, so tabs are still safer.
0
 
LVL 44

Expert Comment

by:Arthur_Wood
ID: 18037976
"is there a way to have the BULK INSERT ignore commas that are contained in data fields but still use the commas which delimit those data fields?"

What is there to distinguish one kind of Comma (those embedded in what you interpret to be a single field (such as a company named Dewey, Cheatham and Howe, Inc) for another kind of comma, used to delimit fields.  After all a comma (,) is still a comma (,).

jsimbeck is on the right path.  Change the way the fields in your source files are dlimited.

AW
0
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

 
LVL 8

Author Comment

by:saoirse1916
ID: 18038021
I'll certainly look into a tab delimited file.  One question though -- I've been using comma delimited files through DTS for about a year with no problems at all.  How is it that DTS can distinguish between a comma in a field and one that delimits a field?
0
 
LVL 1

Expert Comment

by:jsimbeck
ID: 18038110
There are a couple of possibilities that I can see:

1.  You've been lucky up to this point and this is the first time data has had embedded comments.
2.  Something changed recently that might have modified a text identifier (usually quotes surrounding each field of textual data)?

As far as SQL server is concerned, unless its surrounded by a text identifier, its like AW said:  "...a comma (,) is still a comma (,)."
0
 
LVL 1

Expert Comment

by:jsimbeck
ID: 18038120
* correction...  meant to say "commas", not "comments" in "1." above.
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18038200
I looked back over my old data feeds and many of the same fields are there -- i.e. those with commas have gone in just fine before when using DTS.

I tried using a tab delimited set of files and now I'm getting some other errors -- most notably, all of my numeric data (which includes commas most of the time) has now gone into the database surrounded with double-quotes.
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18038233
I just noticed something -- in my comma delimited file, every data field that contains a comma in the field itself is surrounded by quotes.  I'm assuming that's how DTS was able to allow those quotes to go into the DB without treating them as delimiters.  Is there a way that I can use this technique using BULK INSERT?
0
 
LVL 1

Expert Comment

by:jsimbeck
ID: 18038238
Has anything changed from when things used to work to now, including upgrades or patches to SQL?
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18038327
No, not at all -- I've only been working on this for the past 2 days, and nothing's been done to the server.
0
 
LVL 1

Expert Comment

by:jsimbeck
ID: 18038332
Bulk Insert may be a bit more persnickety since it allows for a field terminator, but not necessarily a field qualifier.  From the SQL Server Books Online:

FIELDTERMINATOR [ = 'field_terminator' ]
Specifies the field terminator to be used for char and widechar data files. The default is \t (tab character).

So if you are amenable to using a tab for your field delimiter, you should be able to use Bulk Insert using \t for your tab delimiter.
That does leave unresolved the problem of the commas in your numeric fields.  Also, without trying it, I'm not sure if Bulk Insert would import the surrounding quotes or not (the Bulk Inserts I've been doing here have been on fixed-length records).
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18038363
Yeah, that's the problem that I've been running into now with a tab-delimited file: BULK INSERT imports the quotes as well.  I also ran into a data conversion error (type mismatch) for a column that I've added which is an identity field, but I think I know how to get around that.
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18039872
Well, the tab delimited file does get the data loaded but it inserts the quotes as well.  I can add to my routine a query that finds any records with quotes and removes them from all fields.  I'm still curious though, why there's no way to use a comma delimited file as I have been doing for so long now with DTS but using BULK INSERT instead?
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18045646
I tried to come up with a routine that would find all records with quotes in the fields, but since I'm using ASP/VBScript, I can't create the recordset.  The query works fine in query analyzer, but doesn't in ASP because of the quotation marks:

SELECT * FROM bldg WHERE bldgname LIKE '%"%' -- works fine in QA but as an ASP query, it fails

strSQL = "SELECT * from bldg WHERE bldgname LIKE '%"%'" -- the quote between the % signs tells the ASP compiler that the string is finished but then leaves the remaining %'" hanging out there and causes an error.  Anyone have any ideas for how I can get a recordset that will identify the quoted records in my table?
0
 
LVL 1

Expert Comment

by:jsimbeck
ID: 18045828
Try something like:

SELECT * FROM bldg WHERE PATINDEX('%' + CHAR(34) + '%', bldgname) > 0

or you can use doubled quotes to indicate that the quote character is part of the SQL query:

strSQL = "SELECT * from bldg WHERE bldgname LIKE '%""%'"

(note:  there are two double-quote characters between the percent signs)
0
 
LVL 8

Author Comment

by:saoirse1916
ID: 18046065
That did it -- I'll play around with this for a while.  Looks like a tab-delimited file is the way I'll have to go.

Thanks very much for your help!
0

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
I annotated my article on ransomware somewhat extensively, but I keep adding new references and wanted to put a link to the reference library.  Despite all the reference tools I have on hand, it was not easy to find a way to do this easily. I finall…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question