SQL 2005 import script / SSIS package to import csv file and check for existing records

I've only had elementary exposure to building SSIS packages and BCP scripts to automate data import, so any suggestions with this would be greatly appreciated.

I'd like to be able to automate a csv file import into a SQL table (would not contain more than 2K records) but only insert if the record's primary key does not already exist.  If there is an existing record with the same primary key as the record in the csv file, then update the record.  

The need seems simple enough and must be something that is widely executed... I just don't know where to begin.

Who is Participating?
Anthony PerkinsConnect With a Mentor Commented:
>>I think the process should include two steps, <<
Right.  That was what I was suggesting here http:#a33688404, but I am glad you agree with me.  :)

>> for the SQL script, do I need to curse through the temp table records and pass the values in an "if exists then update else insert"<<
It is as simple as:

INSERT YourTableNameGoesHere (Col1, Col2, Col3, ...)
SELECT Col1, Col2, Col3, ...
FROM YourStagingTableGoesHere s
          LEFT YourTableNameGoesHere t on s.YourPrimaryKeyGoesHere = t.YourPrimaryKeyGoesHere
WHERE t.YourPrimaryKeyGoesHere IS NULL

Anthony PerkinsCommented:
Here is what you need to do:
1. Import the whole table into a staging table.
2. Once imported, validate the data and only import the rows that do not exist using a trivial SQL query.
Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

this is my SSIS package to load detached email files to database.
it is a hard too explain them all in a few words.

it just gives you an idea what it can be.
c0feeAuthor Commented:
acperkins - for the SQL script, do I need to curse through the temp table records and pass the values in an "if exists then update else insert" statement?  If I'm doing this for 2K records, do I have to worry about performance, table locks, etc.?

rashm and timexist - thanks for the links and the diagram.  It will be a useful reference, but at this stage, I'm not so concerned about how to build a process.  But rather how to parse through the CSV file and determine whether I should execute an UPDATE vs. an INSERT- then incorporate into an automated SSIS package.
timexistConnect With a Mentor Commented:
Hi c0fee,

I think the process should include two steps,

the first step, you load the file in the staging table, in this step, you need to make sure, the file itself is not loaded multiple times,

the second step is processing the records in staging table which only the new records (let's say 200 records of total 3000 records are newly loaded records)will be processed, in these 200 records, based on the business rules, you decide which way they should go. After the process you flag these 200 records as processed.

all these two steps can be put in a stored procedure if you want to, they can be put in an Excute SQL Task .

c0feeAuthor Commented:
Implemented the staging table to get this to work.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.