How to load and archive multiple files in SSIS

Dung DinhData Platform Consultant
CERTIFIED EXPERT
Published:
Updated:
A couple of weeks ago, my client requested me to implement a SSIS package that allows them to download their files from a FTP server and archives them. Microsoft SSIS is the powerful tool which allows us to proceed multiple files at same time even we can use SSIS to download files from FTP server.

Scenario

A couple of weeks ago, my client requested me to implement a SSIS package that allows them to download their files from a FTP server and archives them.  Agency data,  Invoice data and Sales data are uploaded to FTP server everyday. They need to download them and import into their warehouse database. 

Approach

To implement this requirement, I used two main SSSI tasks which are FTP Task and Foreach Loop Container
FTP Task
: is used to download files from FTP server
Foreach Loop Container: is used to load each downloaded file


Development environment


SQL Server 2014 Developer
SQL Server Data Tools for Visual Studio 2013

Data files

I created  3 files for Agency, Invoice and Sales data. Each file has 3 columns  (Col1, Col2 and Col3)
Agency
Col1|Col3|Col3
A|1|3
Invoice
Col1|Col3|Col3
A|1|3
Sales
Col1|Col3|Col3
A|1|3
 

FTP Server

For demonstration purpose, I used DriveHQ service that supports FTP service. I created AAA_ArchivedData folder and then uploaded Agency data,  Invoice data and Sales data
I suggest you should create a free member account because you use them to configure FTP Task for next step

FTP_Server.PNG


Create staging tables


IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Agency]') AND type in (N'U'))
                      BEGIN
                      CREATE TABLE [dbo].[Agency](
                      	[Col1] [varchar](50) NULL,
                      	[Col2] [varchar](50) NULL,
                      	[Col3] [varchar](50) NULL
                      ) ON [PRIMARY]

Open in new window

 

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Invoice]') AND type in (N'U'))
                      BEGIN
                      CREATE TABLE [dbo].[Invoice](
                      	[Col1] [varchar](50) NULL,
                      	[Col2] [varchar](50) NULL,
                      	[Col3] [varchar](50) NULL
                      ) ON [PRIMARY]

Open in new window

 

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Sales]') AND type in (N'U'))
                      BEGIN
                      CREATE TABLE [dbo].[Sales](
                      	[Col1] [varchar](50) NULL,
                      	[Col2] [varchar](50) NULL,
                      	[Col3] [varchar](50) NULL
                      ) ON [PRIMARY]

Open in new window



Implement SSIS package


  1. Create a new package named Load_Multiple_CSV_Files.dtsx
  2. Create variables
    • ArchivedFolder: F:\SSIS-Practice\Load_Multiple_CSV_Files\ArchivedFolder where to store archived files
    • FileConnection
    • FileType
    • OLEDBConnection: connection string to SQL Database Server
      • Value: Data Source=HSSSC1PCL01198\SQLSERVER2014; Initial Catalog=Demo;Provider=SQLNCLI11.1;Integrated Security=SSPI;
      • Data Source: database server name
        Initial Catalog: database name
        Provider: the driver to connect to SQL Database Server. You need to change the value to meet your environment. I use SQLNCLI11.1 because SQL Server Engine is 2014. If your version is lower than SQL 2012 version, it should be 10.0 / 10.1
        Integrated Securitymeans that I use Window Authentication mode while connection to SQL Database Server.
    • SourceFolder: F:\SSIS-Practice\Load_Multiple_CSV_Files\DataFile where to store downloaded files from FTP serverSSIS_Variables.PNG
    • Create connections on Connection Manager panel
      • Create OLEDBConnection and assign OLEDBConnection variable to ConnectionString property
      • Create 3 connections for CSV files ( Agency, Invoice and Sales) and assign FileConnection variable to ConnectionString property
      • Create ArchivedFolder connection and assign ArchivedFolder variable to ConnectionString property
        SSIS_Multiple_CSV_Connection.PNG
  3. Drag and drop FTP Task component  

    1. Double click to open FTP Task Editor
      • Create new FTP connection to open FTP Connection Manager Editor
      • Server settings
        • Server name: ftp.drivehq.com
        • Server port: 21 by default. The FTP site listens to this port. If you us another port, you need to make sure that it is not blocked by firewall.
      • Credentials
        • User name: the account that you registered with DriveHQ
        • Password: the password that you registered with DriveHQ
      • Options
        • Check on Use Passive mode: it means that the client side initiates the data connection. If you un-check, it means you use Active mode. In Active mode, the server side initiates the data connection and it is DriveHQ in this case.
    2. Click on Test Connection to make sure you can connect to DriveHQ successfullyFTP_Task_Editor_1.PNG
    3. Click OK to come back FTP Task Editor
      1. Go to File Transfer tab
        • Local Parameters
          • IsLocalPathVariable: True
          • LocalVariable: User::SourceFolder. We specify where to download data files from DriveHQ
        • Operation
          • Operation: Receive files. There are a lot of actions such as Send files, Receive files, Create local directory,... In this case, we want to download the files so we use Receive files
          • IsTransferAscii: True accepts Ascii code
        • Remote Parameters
          • IsRemotePathVariable: False
          • RemotePath: /AAA_ArchivedData/*.txt. The option specifies where we download data files on DriveHQ. The AAA_ArchivedData is created above step and I also uploaded the testing files. *.txt specifies that we download all files with txt extension
      2. Click OK
  4. Drag and drop Execute SQL  Task component 

    • This task is to truncate the tables that created before
  5. Drag and drop Foreach Loop Container component 

    1. Double click to open Foreach Loop Editor
      • Go to Collection tab
        • Enumerator: Foreach File Enumerator
        • Add expression Directory: @[User::SourceFolder]
        • Enumerator Configuration
          • Files: *.txt it means to load text files only
          • Check on Fully qualified
        • SSIS_Multiple_CSV_ForeachLoop_1.PNG
      • Go to Variable Mappings
        • User::FileConnection
        • Index = 0
      • Click OK
    2. Drag and drop Script Task component into Foreach Loop and name it as Parse File Name
      1. This task is used to detect what kind of file type (Agency, Invoice or Sales)
      2. Double click to open Script Task Editor
        • ReadOnlyVariables: User::FileConnection
        • ReadWriteVariables: User::FileType
      3. Click Edit script... and enter the following code
        public void Main() { // TODO: Add your code here string filepath = Dts.Variables["FileConnection"].Value.ToString(); string filename = Path.GetFileName(filepath); if (filename.ToUpper().Contains("INVOICE")) { Dts.Variables["FileType"].Value = "Invoice"; } else if ((filename.ToUpper().Contains("AGENCY"))) { Dts.Variables["FileType"].Value = "Agency"; } else if ((filename.ToUpper().Contains("SALES"))) { Dts.Variables["FileType"].Value = "Sales"; } else Dts.Variables["FileType"].Value = "Unknown"; Dts.TaskResult = (int)ScriptResults.Success; }
      4. Click OK
    3. Drag and drop Data Flow Task component and name it as Load Invoice
      • Make a connection between Parse File Name script task and Load Invoice
      • ​​Double click on the connection to open Precedence Constraint Editor
        • Evaluation operation: Expression
        • Expression: @[User::FileType]=="Invoice"
        • Click on Logical OR....SSIS_Multiple_CSV_PrecedenceConstrai.PNG
        • Click OK
      • Double click on Load Invoice data flow
        • Drag and drop Flat File Source component and name it as Invoice File
          • Flat file connection manager: Invoice
          • Columns: all columns
        • Drag and drop OLEDB Destination and name is as Invoice Destination
          • OLEDB connection manger: OLEDBConnection
          • Data access view: Table and view - fast load
          • Name of the table or the view: Invoice
          • Mapping with flat file sourceSSIS_Multiple_CSV_LoadInvoice.PNG
    4. Drag and drop Data Flow Task component and name it as Load Agency
      • Repeat all steps which we implement for Invoice but change Expression of Precedence Constraint Editor to @[User::FileType]=="AgencySSIS_Multiple_CSV_PrecedenceConstrai.PNGSSIS_Multiple_CSV_LoadAgency.PNG
    5. SSIS_Multiple_CSV_LoadAgency.PNGDrag and drop Data Flow Task component and name it as Load Sales
      • Repeat all steps which we implement for Invoice but change Expression of Precedence Constraint Editor to @[User::FileType]=="Sales" ​SSIS_Multiple_CSV_PrecedenceConstrai.PNGSSIS_Multiple_CSV_LoadSales.PNG
    6. Drag and drop File System Task component
      • Double click to open File System Task Editor
        • IsDestinationPathVariable: False
        • DestinationConnection: ArchivedFolder. This is the folder connection we created before
        • OverwriteDestination: False
        • Operation: Move File. There are a lot of actions but we use Move File to archive data files after proceeding them successfully
        • IsSourcePathVariable: True
        • SourceConnection: User::FileConnection


Execute SSIS package

Finally, our package has been done and we need to execute it to see how it works?
SSIS_Multiple_CSV.PNG
  1. Check the folder F:\SSIS-Practice\Load_Multiple_CSV_Files\DataFile to make sure there are no data files
  2. Check the folder F:\SSIS-Practice\Load_Multiple_CSV_Files\ArchivedFolder to make sure there are no data files
  3. Query data from Agency, Invoice and Sales tables to make sure that they are empty
SSIS_Multiple_CSV_Before_Executing.PNG
Executing the package
SSIS_Multiple_CSV_Executing.PNGVerifying the result after executing the package
SSIS_Multiple_CSV_Results.PNGExcellent, data imported and data files were also archived to the folder as our expectation. In my example, the archived folder is local, you can still use FTP Task component to upload the files to anywhere.


Conclusions

In real life, there are a lot of scenarios that will require you to combine SSIS components together. Hopefully, you will have an overview how to use FTP Task and Foreach Loop Container to proceed multiple files at same time through my example.
0
9,784 Views
Dung DinhData Platform Consultant
CERTIFIED EXPERT

Comments (2)

Commented:
Hi,

I have the same scenario, i followed all the steps and it executed excellent. Only one thing i missed is  i'm not getting the Archive files with dates
while moving from source to Archive folder. Will you please help me to get the right thing how you got the Filename with date in Archive folder.

Thanks a lot in advance
Dung DinhData Platform Consultant
CERTIFIED EXPERT

Author

Commented:
Hi,

Can you describe more detail about your expectation? Do you mean you want to add DATE to name of files before moving them to Archive folder?

Cheers,

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.