Solved

Scheduled importing of CSV file using DTS or SSIS

Posted on 2009-03-30
3
1,015 Views
Last Modified: 2013-11-30
I have a business partner with FTP access to our server.  He has his system set to automatically FTP a CSV data file to our server.  I need to read this file periodically and import or update his existing records per the CSV file.  New files will be FTP'd every day for example and will have a unique name of some sort.  I don't think this is a complicated project as the application is very much just a web based (dotnet),  15 field data storage and recall application.

The most complicated part (for me anyway) may be trying to figure out whether I should be updating or adding new records based on the clients CSV.  Client will be sending their PrimaryKey with their records so I will know if their records already exist, will just need to write the code for that (is this where the DTS/SSIS part comes in?)

... so, after thinking out loud :) I need:

1) more education on implementing DTS or SSIS for use with CSV files (we have MS SQL Server 2000).
2) scheduling DTS/SSIS packages for execution as SQL Agent job.
3) a way to know which CSV files should be read next at the next scan
4) how to delete the CSVs after they are imported (or at some later date if a recommended)
5) a way to distinguish which records need to be imported and which ones need to be updated
6) how to execute the importing/updating after records are identified
** sounds like the DTS/SSIS "package" may take care of 5 & 6.

I know there are many existing questions about DTS and importing, but the information is scattered.  I wouldnt mind if part of the solution points to other quality solutions, articles, websites and best practices for solving this project.
0
Comment
Question by:newknew
3 Comments
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24025356
>>The most complicated part (for me anyway) may be trying to figure out whether I should be updating or adding new records based on the clients CSV.<<
Actually that is the wron approach.  Do not import directly into Production table(s).  Always import into a staging table and then use that to validate the data before you insert and/or update into the final tables.
0
 
LVL 30

Accepted Solution

by:
nmcdermaid earned 500 total points
ID: 24025879
1) more education on implementing DTS or SSIS for use with CSV files (we have MS SQL Server 2000).
SQL 2000 > DTS
SQL 2005 > SSIS
So you'll need to use DTS at this point. If you are considering an upgrade to SQL 2005 in the near future then I suggest you do this in SSIS on SQL 2005 as the updgrade path from DTS > SSIS is not always simple.

2) scheduling DTS/SSIS packages for execution as SQL Agent job.
You just have to keep in mind that the package runs ON the server (all paths need to be as though you are on the server), and also remember that it runs as the SQL Agent service logon user.
3) a way to know which CSV files should be read next at the next scan
Usually once you successfully import a file you move it to an archive folder. So you never import them again
4) how to delete the CSVs after they are imported (or at some later date if a recommended)
Move them to an archive folder.
5) a way to distinguish which records need to be imported and which ones need to be updated
As you have surmised, as long as the primary key is in the source file, thats pretty easy. You should definitely make use of a staging table as acperkins has suggested.
6) how to execute the importing/updating after records are identified
As acperkins has suggested, you import them all into a staging table, then compare.
 
But it is very important to note that implementing these kind of solutions (looping through source files) in DTS requires T-SQL or ActiveX scripting, and this does not migrate very well to SSIS. If you have the opportunity then I strongly suggest you upgrade to SQL 2005 and do this in SSIS.
Upgrade path is simple (backup from SQL 2000 and restore into SQL 2005). Then of course there is regression testing if you want.
Steps 3,4,5 can be fundamentally different between DTS and SSIS. i.e. for SSIS it can in some cases be practical to not use a staging table, and looping and archiving files requires a lot less work in SSIS.
Here is a sample page for looping and importing files in DTS just to give you a taste:
http://www.sqldts.com/246.aspx
It is basically a DTS 'hack'
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

When you hear the word proxy, you may become apprehensive. This article will help you to understand Proxy and when it is useful. Let's talk Proxy for SQL Server. (Not in terms of Internet access.) Typically, you'll run into this type of problem w…
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now