Solved

Scheduled importing of CSV file using DTS or SSIS

Posted on 2009-03-30
3
1,023 Views
Last Modified: 2013-11-30
I have a business partner with FTP access to our server.  His system is set to automatically FTP a CSV data file to our server.  I need to read this file periodically and import new records or update existing ones per the CSV file.  New files will be FTP'd every day, for example, and will each have a unique name of some sort.  I don't think this is a complicated project, as the application is very much just a web-based (dotnet), 15-field data storage and recall application.

The most complicated part (for me anyway) may be trying to figure out whether I should be updating or adding new records based on the client's CSV.  The client will be sending their PrimaryKey with their records, so I will know if their records already exist; I will just need to write the code for that (is this where the DTS/SSIS part comes in?).

... so, after thinking out loud :) I need:

1) more education on implementing DTS or SSIS for use with CSV files (we have MS SQL Server 2000).
2) scheduling DTS/SSIS packages for execution as SQL Agent job.
3) a way to know which CSV files should be read next at the next scan
4) how to delete the CSVs after they are imported (or at some later date, if recommended)
5) a way to distinguish which records need to be imported and which ones need to be updated
6) how to execute the importing/updating after records are identified
** sounds like the DTS/SSIS "package" may take care of 5 & 6.

I know there are many existing questions about DTS and importing, but the information is scattered.  I wouldn't mind if part of the solution points to other quality solutions, articles, websites and best practices for solving this project.
Question by:newknew
3 Comments
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24025356
>>The most complicated part (for me anyway) may be trying to figure out whether I should be updating or adding new records based on the client's CSV.<<
Actually, that is the wrong approach.  Do not import directly into the production table(s).  Always import into a staging table, and then use that to validate the data before you insert and/or update the final tables.
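A minimal sketch of that staging approach (all table and column names below are placeholders, not taken from the question):

```sql
-- Staging table mirrors the CSV layout. Types are deliberately loose so
-- malformed rows still load and can be inspected, rather than failing the import.
CREATE TABLE StagingPartnerData (
    PartnerKey VARCHAR(50),
    Field1     VARCHAR(255),
    -- ...remaining CSV columns...
    LoadDate   DATETIME DEFAULT GETDATE()
)

-- Validate after the load: reject rows with no usable key before anything
-- touches the production table.
DELETE FROM StagingPartnerData
WHERE PartnerKey IS NULL OR LTRIM(RTRIM(PartnerKey)) = ''
```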
 
LVL 30

Accepted Solution

by:nmcdermaid (earned 500 total points)
ID: 24025879
1) more education on implementing DTS or SSIS for use with CSV files (we have MS SQL Server 2000).
SQL 2000 > DTS
SQL 2005 > SSIS
So you'll need to use DTS at this point. If you are considering an upgrade to SQL 2005 in the near future, then I suggest you do this in SSIS on SQL 2005, as the upgrade path from DTS > SSIS is not always simple.

2) scheduling DTS/SSIS packages for execution as SQL Agent job.
You just have to keep in mind that the package runs ON the server (all paths must be valid as though you are on the server), and also remember that it runs as the SQL Agent service logon user.
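As an illustration, such a job can also be created in T-SQL with the msdb procedures; the server name, job name and package name here are placeholders, and a schedule would still need to be attached (with sp_add_jobschedule or via Enterprise Manager):

```sql
-- Create a SQL Agent job with a CmdExec step that runs the DTS package
-- via dtsrun (/S server, /E trusted connection, /N package name).
-- Remember: this command executes on the server, as the Agent logon user.
EXEC msdb.dbo.sp_add_job @job_name = 'Nightly CSV import'

EXEC msdb.dbo.sp_add_jobstep
    @job_name  = 'Nightly CSV import',
    @step_name = 'Run DTS package',
    @subsystem = 'CmdExec',
    @command   = 'dtsrun /S MYSERVER /E /N "ImportPartnerCSV"'
```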
3) a way to know which CSV files should be read next at the next scan
Usually, once you successfully import a file, you move it to an archive folder, so you never import it again.
4) how to delete the CSVs after they are imported (or at some later date, if recommended)
Move them to an archive folder.
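For example, a final step in the package can shell out and move the file; the folder and file names below are placeholders and must exist on the server, since xp_cmdshell runs there:

```sql
-- Move the imported file out of the pickup folder so the next run
-- never picks it up again.
EXEC master.dbo.xp_cmdshell
    'move C:\ftp\incoming\partner_20090330.csv C:\ftp\archive\'
```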
5) a way to distinguish which records need to be imported and which ones need to be updated
As you have surmised, as long as the primary key is in the source file, that's pretty easy. You should definitely make use of a staging table, as acperkins has suggested.
6) how to execute the importing/updating after records are identified
As acperkins has suggested, you import them all into a staging table, then compare.
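Since SQL 2000 has no MERGE statement, the compare is normally two set-based statements against the staging table (table and column names are again placeholders):

```sql
-- 1) Update production rows whose key already exists in staging...
UPDATE p
SET    p.Field1 = s.Field1   -- repeat for each of the 15 fields
FROM   PartnerData p
JOIN   StagingPartnerData s ON s.PartnerKey = p.PartnerKey

-- 2) ...then insert the staging rows that are new.
INSERT INTO PartnerData (PartnerKey, Field1)
SELECT s.PartnerKey, s.Field1
FROM   StagingPartnerData s
WHERE  NOT EXISTS (SELECT 1
                   FROM PartnerData p
                   WHERE p.PartnerKey = s.PartnerKey)
```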
 
But it is very important to note that implementing these kinds of solutions (looping through source files) in DTS requires T-SQL or ActiveX scripting, and this does not migrate very well to SSIS. If you have the opportunity, then I strongly suggest you upgrade to SQL 2005 and do this in SSIS.
The upgrade path is simple (back up from SQL 2000 and restore into SQL 2005); then, of course, there is regression testing if you want it.
Steps 3, 4 and 5 can be fundamentally different between DTS and SSIS. For example, in SSIS it can in some cases be practical not to use a staging table, and looping over and archiving files requires a lot less work.
Here is a sample page for looping and importing files in DTS just to give you a taste:
http://www.sqldts.com/246.aspx
It is basically a DTS 'hack'.
