• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 831
  • Last Modified:

SQL 2005 import script / SSIS package to import csv file and check for existing records

I've only had elementary exposure to building SSIS packages and BCP scripts to automate data import, so any suggestions with this would be greatly appreciated.

I'd like to be able to automate a csv file import into a SQL table (would not contain more than 2K records) but only insert if the record's primary key does not already exist.  If there is an existing record with the same primary key as the record in the csv file, then update the record.  

The need seems simple enough and must be something that is widely executed... I just don't know where to begin.

Thanks,
Mimi
0
c0fee
Asked:
c0fee
  • 2
  • 2
  • 2
  • +1
2 Solutions
 
Anthony PerkinsCommented:
Here is what you need to do:
1. Import the whole table into a staging table.
2. Once imported, validate the data and only import the rows that do not exist using a trivial SQL query.
0
 
timexistCommented:
this is my SSIS package to load detached email files to database.
it is a hard too explain them all in a few words.

it just gives you an idea what it can be.
ssis.jpg
0
Granular recovery for Microsoft Exchange

With Veeam Explorer for Microsoft Exchange you can choose the Exchange Servers and restore points you’re interested in, and Veeam Explorer will present the contents of those mailbox stores for browsing, searching and exporting.

 
c0feeAuthor Commented:
acperkins - for the SQL script, do I need to curse through the temp table records and pass the values in an "if exists then update else insert" statement?  If I'm doing this for 2K records, do I have to worry about performance, table locks, etc.?

rashm and timexist - thanks for the links and the diagram.  It will be a useful reference, but at this stage, I'm not so concerned about how to build a process.  But rather how to parse through the CSV file and determine whether I should execute an UPDATE vs. an INSERT- then incorporate into an automated SSIS package.
0
 
timexistCommented:
Hi c0fee,

I think the process should include two steps,

the first step, you load the file in the staging table, in this step, you need to make sure, the file itself is not loaded multiple times,

the second step is processing the records in staging table which only the new records (let's say 200 records of total 3000 records are newly loaded records)will be processed, in these 200 records, based on the business rules, you decide which way they should go. After the process you flag these 200 records as processed.

all these two steps can be put in a stored procedure if you want to, they can be put in an Excute SQL Task .

0
 
Anthony PerkinsCommented:
>>I think the process should include two steps, <<
Right.  That was what I was suggesting here http:#a33688404, but I am glad you agree with me.  :)

>> for the SQL script, do I need to curse through the temp table records and pass the values in an "if exists then update else insert"<<
It is as simple as:

INSERT YourTableNameGoesHere (Col1, Col2, Col3, ...)
SELECT Col1, Col2, Col3, ...
FROM YourStagingTableGoesHere s
          LEFT YourTableNameGoesHere t on s.YourPrimaryKeyGoesHere = t.YourPrimaryKeyGoesHere
WHERE t.YourPrimaryKeyGoesHere IS NULL

0
 
c0feeAuthor Commented:
Implemented the staging table to get this to work.
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 2
  • 2
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now