Solved

BULK INSERT comma delimited file -- but the file has commas!

Posted on 2006-11-29 · 4,361 Views · Last Modified: 2012-06-21
I'm hoping the solution to this is something really easy that I'm just overlooking.  Currently I'm using DTS to load data from a comma delimited CSV file into temporary tables which are then combined with my active application tables.  I'm trying to put together a system which minimizes the manual part of this data transfer and gets away from DTS.  I will still have to manually pull data out of one system and get it into CSV files for the import, but I'm hoping that at that point I can let my application take over.

What I've done so far is have my application create the temp tables using CREATE TABLE, then take the CSV files which get uploaded to the server and do a BULK INSERT on them.  The data all goes in without any errors (now that I've solved my premature EOF error) but the problem is, the files are all comma delimited and several of the fields contain commas.  Client names can be multiple names separated by a comma (e.g. Smith, Smith, and Jones, LLC) and also there are several numeric fields which contain commas (e.g. 1,100 sq ft).  Of course, what happens on these rows is that the data gets spread out across more columns than it should and throws off the order of things.
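To make this concrete, here is a stripped-down sketch of the kind of thing my routine runs (the real tables have more columns; the file path and the sqft column here are just examples):

CREATE TABLE bldg (
    bldgname varchar(255),
    sqft varchar(50)
)

BULK INSERT bldg
FROM 'C:\imports\bldg.csv'
WITH (
    FIELDTERMINATOR = ',',   -- any comma inside a field also splits the row here
    ROWTERMINATOR = '\n'
)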

So, my question is: is there a way to have the BULK INSERT ignore commas that are contained in data fields but still use the commas which delimit those data fields?

Thanks very much!
Question by: saoirse1916
16 Comments
 
Author Comment by: saoirse1916 (LVL 8)
ID: 18037564
Sorry -- meant to add that I'm using MS SQL Server 2000 and ASP/VBScript for my application, if it matters.

Accepted Solution by: jsimbeck (LVL 1), earned 500 total points
ID: 18037628
It might be worth considering making your files tab-delimited instead of comma-delimited (if possible).  While it's still possible to embed a tab inside a field, it's much less likely to occur than an embedded comma.  Another option is to make your data quoted and comma-delimited, but you may run into embedded quotes as well, so tabs are still safer.
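For instance, here is the same record (values borrowed from your question) in quoted comma-delimited form and in tab-delimited form, with <TAB> standing in for an actual tab character:

"Smith, Smith, and Jones, LLC","1,100 sq ft"
Smith, Smith, and Jones, LLC<TAB>1,100 sq ft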

Expert Comment by: Arthur_Wood (LVL 44)
ID: 18037976
"is there a way to have the BULK INSERT ignore commas that are contained in data fields but still use the commas which delimit those data fields?"

What is there to distinguish one kind of comma (those embedded in what you interpret to be a single field, such as a company named Dewey, Cheatham and Howe, Inc) from another kind of comma, used to delimit fields?  After all, a comma (,) is still a comma (,).

jsimbeck is on the right path.  Change the way the fields in your source files are delimited.

AW

Author Comment by: saoirse1916 (LVL 8)
ID: 18038021
I'll certainly look into a tab delimited file.  One question though -- I've been using comma delimited files through DTS for about a year with no problems at all.  How is it that DTS can distinguish between a comma in a field and one that delimits a field?

Expert Comment by: jsimbeck (LVL 1)
ID: 18038110
There are a couple of possibilities that I can see:

1.  You've been lucky up to this point and this is the first time data has had embedded comments.
2.  Something changed recently that might have modified a text identifier (usually quotes surrounding each field of textual data)?

As far as SQL Server is concerned, unless it's surrounded by a text identifier, it's like AW said:  "...a comma (,) is still a comma (,)."

Expert Comment by: jsimbeck (LVL 1)
ID: 18038120
* correction...  meant to say "commas", not "comments" in "1." above.

Author Comment by: saoirse1916 (LVL 8)
ID: 18038200
I looked back over my old data feeds and many of the same fields are there -- i.e. those with commas have gone in just fine before when using DTS.

I tried using a tab delimited set of files and now I'm getting some other errors -- most notably, all of my numeric data (which includes commas most of the time) has now gone into the database surrounded with double-quotes.

Author Comment by: saoirse1916 (LVL 8)
ID: 18038233
I just noticed something -- in my comma delimited file, every data field that contains a comma is surrounded by quotes.  I'm assuming that's how DTS was able to let those embedded commas go into the DB without treating them as delimiters.  Is there a way I can use this technique with BULK INSERT?

Expert Comment by: jsimbeck (LVL 1)
ID: 18038238
Has anything changed from when things used to work to now, including upgrades or patches to SQL?

Author Comment by: saoirse1916 (LVL 8)
ID: 18038327
No, not at all -- I've only been working on this for the past 2 days, and nothing's been done to the server.

Expert Comment by: jsimbeck (LVL 1)
ID: 18038332
Bulk Insert may be a bit more persnickety since it allows for a field terminator, but not necessarily a field qualifier.  From the SQL Server Books Online:

FIELDTERMINATOR [ = 'field_terminator' ]
Specifies the field terminator to be used for char and widechar data files. The default is \t (tab character).

So if you are amenable to using a tab as your field delimiter, you should be able to run BULK INSERT with FIELDTERMINATOR = '\t'.
That still leaves the problem of the commas in your numeric fields unresolved.  Also, without trying it, I'm not sure whether BULK INSERT would import the surrounding quotes or not (the bulk inserts I've been doing here have been on fixed-length records).
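Something along these lines, using the staging table mentioned earlier (swap in whatever table and file names you are actually using):

BULK INSERT bldg
FROM 'C:\imports\bldg.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
)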

Author Comment by: saoirse1916 (LVL 8)
ID: 18038363
Yeah, that's the problem that I've been running into now with a tab-delimited file: BULK INSERT imports the quotes as well.  I also ran into a data conversion error (type mismatch) for a column that I've added which is an identity field, but I think I know how to get around that.

Author Comment by: saoirse1916 (LVL 8)
ID: 18039872
Well, the tab-delimited file does get the data loaded, but it inserts the quotes as well.  I can add to my routine a query that finds any records containing quotes and strips them from all fields.  I'm still curious, though, why there's no way to use a comma-delimited file with BULK INSERT the way I have been doing for so long with DTS.
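Something like this, repeated for each column that might contain quotes, is what I have in mind (the column name is just an example):

UPDATE bldg
SET bldgname = REPLACE(bldgname, '"', '')
WHERE CHARINDEX('"', bldgname) > 0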

Author Comment by: saoirse1916 (LVL 8)
ID: 18045646
I tried to come up with a routine that would find all records with quotes in the fields, but since I'm using ASP/VBScript, I can't create the recordset.  The query works fine in query analyzer, but doesn't in ASP because of the quotation marks:

SELECT * FROM bldg WHERE bldgname LIKE '%"%'

That works fine in Query Analyzer, but as an ASP query it fails:

strSQL = "SELECT * from bldg WHERE bldgname LIKE '%"%'"

The quote between the % signs tells the ASP compiler that the string is finished, which leaves the remaining %'" hanging out there and causes an error.  Anyone have any ideas for how I can get a recordset that will identify the quoted records in my table?

Expert Comment by: jsimbeck (LVL 1)
ID: 18045828
Try something like:

SELECT * FROM bldg WHERE PATINDEX('%' + CHAR(34) + '%', bldgname) > 0

or you can use doubled quotes to indicate that the quote character is part of the SQL query:

strSQL = "SELECT * from bldg WHERE bldgname LIKE '%""%'"

(note:  there are two double-quote characters between the percent signs)

Author Comment by: saoirse1916 (LVL 8)
ID: 18046065
That did it -- I'll play around with this for a while.  Looks like a tab-delimited file is the way I'll have to go.

Thanks very much for your help!