Dynamic split of data in SSIS 2008

Hello experts,

Looking for suggestions on the best way to handle this situation. I have multiple SSIS packages for processing financial transactions. The first process has a file watcher on an FTP folder that sees a file hit and imports the transactions into a staging table for later processing. this process is event driven so it can happen multiple times in a day as files get deposited.

The second process is manually triggered by a user once conditions in our system have been met and transactions are ready to be processed. When the user triggers this process I need to grab all the transactions that have not been processed from the staging table and transform them and insert into other tables for a third party application to work with.

My problem lies here, when I grab the transactions to be processed, I need to divide them into like batches with like dates. So all buys for March 3rd in one batch, buys for March 4 in another, sells in another, etc. It doesn't appear that a conditional split will work because of the expressions needing to be hard coded. We have over 460 different transaction codes and manually creating a task process for each is not something I look forward to.

So, how would you suggest I use SSIS to take this single recordset from the staging table and divide it into other datasets each representing a group with a common date and transaction code for the entire set to be processed? I am thinking the only way I may be able to do this is with a script task, but can I dynamically create variables to hold the different datasets? Would I have a static Object variable to hold an array of dataset names and iterate through that list to call all the dynamic datasets I need to process?

Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

8080_DiverConnect With a Mentor Commented:

Tell you what, I will withdraw my objection.  My assumptions were in error when I thought that you were trying to figure out how to set up the "iterate through the data" portion of your task.  I was actually providing a bit more detail addressed at your final question:
Would I have a static Object variable to hold an array of dataset names and iterate through that list to call all the dynamic datasets I need to process?

Maybe I should have quoted that question and just answered with, "Yes."

Whoever is moderating this question should feel free to grant HoozNet's request to close it as originally indicated.

As for the crack about the flying of an airplane, what I would have answered to that is that you would iniially need to file a flight plan indicating whether you intended to use one of the standard airways or to ply point-to-point and whether you would be using VFR or IFR flight rules.  Once you did that, then you should follow your flight plan.  I would have then expected some feedback that might indicate more details about what you were really trying to find out about flying from BNYC to LAX.  If I got no feedback, then I would assume that you had wondered about what you needed to do to register the fact that you were going to LAX or what your choices were about flight rules.
ValentinoVBI ConsultantCommented:
Could you explain why you need that data to get processed in those different batches as you've described? (seems a bit weird)
HooznextAuthor Commented:
The third party application that will ultimately process these requires the transactions to be grouped into batches. Our operations department requires like transactions for each date to be grouped, so for each day there is a batch of buy transactions, a batch of sells, etc.

The third party app is a bit difficult to work with as there is no supported API, I am on my own. It would be easier for me to have a single dataset that would represent what goes into the batch so I can create the batch in the third party app, get its ID and write the transactions into the table with that batch ID, then move on to the next dataset and repeat.
Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

First pull the transaction codes/dates/whatever that identify the batches into an SSIS (object type) variable so that you can determine the sets to be processed.
Then use that information to drive a For Each Loop Container that processes a batch;
HooznextAuthor Commented:
8080 Diver,

That is sort of what I was thinking, use an Execute SQL Task to create an array of file dates, a for each loop through the dates that would pull the transaction codes for that date, then iterate through the codes and process the transactions. So it would be a nested loop processing each transaction.

I was wondering if there is a more efficient way to do this than to hit the database several times and use nested loops. If I could grab all the transactions at once, then get it all split into the needed datasets and dump them back. I guess I am looking for someone to help me look outside the box I am accustomed to and show me a more efficient way to do this.
HooznextConnect With a Mentor Author Commented:
OK guys, here's my solution...let me know if anyone knows a better way to do it.

I created a stored proceedure that returns all the transactions that are not processed yet, I added two columns to the dataset 'TranGroup' and 'TranGroupTotal'. The first tells me what group the transaction belongs to, the second is the total number of groups in the recordset.

In control flow I have an Execute SQL Task to call the proc and assign the results to an Object variable. Then I go into a script task that assigns the TranGroupTotal to a Int variable in the package. That leads to a For Loop Container that loops from 1 to TranGroupTotal, inside the loop is a script task that takes the raw transactions and selects the transactions for group X of the loop and dumps them into another Object variable. Once I have the group of transactions into a dataset I go to a Data Flow and have a Script Component that uses the dataset as a source.

From there I do a normal transformation and have my transactions inserted into the database and back to the loop for the next group of transactions. Not exactly the way I thought I was going to do it, but it does prevent a bunch of calls to the database and endless nested loops as well.

I'm sure there is a better way to do this, but I don't know of it, if anyone has advice I am open to it as projects like this make me learn new things which is why I have a love/hate relationship with this stuff.
Hooznet has basically taken my suggestion, replaced the initial SQL Query to pull the list to process with a call to a Stored Procedure and then used my suggested For Each Loop Container approach to drive the process that, because of HoozNet's additional knowledge about the data, he has then fleshed out in more detail.

Seems to me that I should at least get some pints. ;-)
HooznextAuthor Commented:

I understand your objection, let me ask this. If I asked how to fly an airplane from New York to LA and you said "First you start the engine, take off and fly to LA then you land and shut off the engine." would you call that a helpful response?

In your suggestion, once you get the data into an Object and run a for each loop, how do you suggest you divide that single recordset into the individual sets needed to be processed. Then for that matter how do you make the Object variable available to the data flow task as a source for processing? I know now because I spent the last couple days figuring it out, I really don't see how your response benefitted me.

I had already posed splitting the data and iterating through the records to process them, how is that any different than a for each loop as you suggested?
HooznextAuthor Commented:
8080 Diver,

I like you...keep the dirty side down.
HooznextAuthor Commented:
You have a good sense of humor and I may not have been clear in my post. I can see where you may have thought I didn't understand how to use a For Each Loop in SSIS.
All Courses

From novice to tech pro — start learning today.