Dynamic split of data in SSIS 2008

Posted on 2011-03-17
Last Modified: 2012-05-11
Hello experts,

Looking for suggestions on the best way to handle this situation. I have multiple SSIS packages for processing financial transactions. The first process has a file watcher on an FTP folder that sees a file hit and imports the transactions into a staging table for later processing. this process is event driven so it can happen multiple times in a day as files get deposited.

The second process is manually triggered by a user once conditions in our system have been met and transactions are ready to be processed. When the user triggers this process I need to grab all the transactions that have not been processed from the staging table and transform them and insert into other tables for a third party application to work with.

My problem lies here, when I grab the transactions to be processed, I need to divide them into like batches with like dates. So all buys for March 3rd in one batch, buys for March 4 in another, sells in another, etc. It doesn't appear that a conditional split will work because of the expressions needing to be hard coded. We have over 460 different transaction codes and manually creating a task process for each is not something I look forward to.

So, how would you suggest I use SSIS to take this single recordset from the staging table and divide it into other datasets each representing a group with a common date and transaction code for the entire set to be processed? I am thinking the only way I may be able to do this is with a script task, but can I dynamically create variables to hold the different datasets? Would I have a static Object variable to hold an array of dataset names and iterate through that list to call all the dynamic datasets I need to process?

Question by:Hooznext
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 3
LVL 37

Expert Comment

ID: 35163948
Could you explain why you need that data to get processed in those different batches as you've described? (seems a bit weird)

Author Comment

ID: 35164427
The third party application that will ultimately process these requires the transactions to be grouped into batches. Our operations department requires like transactions for each date to be grouped, so for each day there is a batch of buy transactions, a batch of sells, etc.

The third party app is a bit difficult to work with as there is no supported API, I am on my own. It would be easier for me to have a single dataset that would represent what goes into the batch so I can create the batch in the third party app, get its ID and write the transactions into the table with that batch ID, then move on to the next dataset and repeat.
LVL 22

Expert Comment

ID: 35165718
First pull the transaction codes/dates/whatever that identify the batches into an SSIS (object type) variable so that you can determine the sets to be processed.
Then use that information to drive a For Each Loop Container that processes a batch;
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.


Author Comment

ID: 35166005
8080 Diver,

That is sort of what I was thinking, use an Execute SQL Task to create an array of file dates, a for each loop through the dates that would pull the transaction codes for that date, then iterate through the codes and process the transactions. So it would be a nested loop processing each transaction.

I was wondering if there is a more efficient way to do this than to hit the database several times and use nested loops. If I could grab all the transactions at once, then get it all split into the needed datasets and dump them back. I guess I am looking for someone to help me look outside the box I am accustomed to and show me a more efficient way to do this.

Assisted Solution

Hooznext earned 0 total points
ID: 35201902
OK guys, here's my solution...let me know if anyone knows a better way to do it.

I created a stored proceedure that returns all the transactions that are not processed yet, I added two columns to the dataset 'TranGroup' and 'TranGroupTotal'. The first tells me what group the transaction belongs to, the second is the total number of groups in the recordset.

In control flow I have an Execute SQL Task to call the proc and assign the results to an Object variable. Then I go into a script task that assigns the TranGroupTotal to a Int variable in the package. That leads to a For Loop Container that loops from 1 to TranGroupTotal, inside the loop is a script task that takes the raw transactions and selects the transactions for group X of the loop and dumps them into another Object variable. Once I have the group of transactions into a dataset I go to a Data Flow and have a Script Component that uses the dataset as a source.

From there I do a normal transformation and have my transactions inserted into the database and back to the loop for the next group of transactions. Not exactly the way I thought I was going to do it, but it does prevent a bunch of calls to the database and endless nested loops as well.

I'm sure there is a better way to do this, but I don't know of it, if anyone has advice I am open to it as projects like this make me learn new things which is why I have a love/hate relationship with this stuff.
LVL 22

Expert Comment

ID: 35208441
Hooznet has basically taken my suggestion, replaced the initial SQL Query to pull the list to process with a call to a Stored Procedure and then used my suggested For Each Loop Container approach to drive the process that, because of HoozNet's additional knowledge about the data, he has then fleshed out in more detail.

Seems to me that I should at least get some pints. ;-)

Author Comment

ID: 35208634

I understand your objection, let me ask this. If I asked how to fly an airplane from New York to LA and you said "First you start the engine, take off and fly to LA then you land and shut off the engine." would you call that a helpful response?

In your suggestion, once you get the data into an Object and run a for each loop, how do you suggest you divide that single recordset into the individual sets needed to be processed. Then for that matter how do you make the Object variable available to the data flow task as a source for processing? I know now because I spent the last couple days figuring it out, I really don't see how your response benefitted me.

I had already posed splitting the data and iterating through the records to process them, how is that any different than a for each loop as you suggested?
LVL 22

Accepted Solution

8080_Diver earned 250 total points
ID: 35208879

Tell you what, I will withdraw my objection.  My assumptions were in error when I thought that you were trying to figure out how to set up the "iterate through the data" portion of your task.  I was actually providing a bit more detail addressed at your final question:
Would I have a static Object variable to hold an array of dataset names and iterate through that list to call all the dynamic datasets I need to process?

Maybe I should have quoted that question and just answered with, "Yes."

Whoever is moderating this question should feel free to grant HoozNet's request to close it as originally indicated.

As for the crack about the flying of an airplane, what I would have answered to that is that you would iniially need to file a flight plan indicating whether you intended to use one of the standard airways or to ply point-to-point and whether you would be using VFR or IFR flight rules.  Once you did that, then you should follow your flight plan.  I would have then expected some feedback that might indicate more details about what you were really trying to find out about flying from BNYC to LAX.  If I got no feedback, then I would assume that you had wondered about what you needed to do to register the fact that you were going to LAX or what your choices were about flight rules.

Author Comment

ID: 35208908
8080 Diver,

I like you...keep the dirty side down.

Author Closing Comment

ID: 35239115
You have a good sense of humor and I may not have been clear in my post. I can see where you may have thought I didn't understand how to use a For Each Loop in SSIS.

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Linked Server Issue with SQL2012 3 33
MS SQL Conditional WHERE clause 3 38
How come this XML node is not read? 3 29
Need to trim my database size 9 23
Long way back, we had to take help from third party tools in order to encrypt and decrypt data.  Gradually Microsoft understood the need for this feature and started to implement it by building functionality into SQL Server. Finally, with SQL 2008, …
How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question