SSIS Dynamically load tables with different schemas (DB2 to SQL Server)

I've set up an SSIS control flow that selects the following columns from a control table into a System.Object variable:  db2SrcQuery, destTblName

A loop then maps variables via the Foreach ADO Enumerator:
User::db2SrcQuery / User::destTblName

Inside the loop, a data flow task connects to DB2, executes the SELECT statement held in User::db2SrcQuery, and writes the result into the destination table named by User::destTblName.

All works great except for the big showstopper of table schema changes.

Found a blog with a nice solution for this, except that it uses OPENQUERY -- something the client stands firm on not using.  I need to stay with connection managers for the DB2 extraction; also, they don't want files, so BULK INSERT is not an option.

So...

Are there any great suggestions out there on how to do this in SSIS?  

1. Foreach ADO Enumerator collection / variable mapping of sql query & dest table name
    a. connect to db2 via connection manager and exec sql variable command
    b. insert result into destination table (via table name variable) -- table schemas differ

Thank you!
ssebring asked:
8080_Diver commented:
As I have often told my clients when I was doing independent consulting, if one puts enough constraints on a problem, one can constrain away all possible solutions. ;-)
Let me see if I have the constraints correctly:
  • You cannot link the SQL Server database to the DB2 database, thus precluding the use of the INSERT INTO (...) SELECT ... FROM ... WHERE ... technique;
  • You cannot receive files of any sort from the DB2 side, thus precluding the exporting and importing of flat files via SSIS.
Okay, I have one last go at this.  Suppose you do the following:
  1. Have the export query for the data from the DB2 configured so that it, in effect, creates a CSV record that you stuff into a staging table in the SQL Server database.  That staging table would consist of an Identity column and one big VarChar(4000) (or whatever size is the biggest needed) column.  This would allow you to have the DB2 SELECT query in the control table and always target the one staging table.
  2. The import stage would be done via an INSERT INTO [table] ([list of columns]) SELECT . . . FROM StagingTable statement, where the SELECT part breaks the data out of the one big VarChar column in the staging table.
  3. TRUNCATE the staging table after completing the importing of the data.
This can all be put in a FOR EACH loop and controlled via variables that are populated from the control table.
Now, if your constraints preclude having a staging table, I think you are pretty much SOL.  However, I would argue that the staging table is what lets you scrub the data and do some potential reformatting of things, like dates that are represented as integers (especially if they are of the CYYMMDD variety ;-), so you actually need to pass the data through a staging table anyway.
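A minimal T-SQL sketch of steps 1-3 above, with illustrative table and column names (the PARSENAME trick shown here handles at most four delimited fields and is for illustration only; a real package would use SUBSTRING/CHARINDEX logic or a split function for wider records):

```sql
-- Illustrative staging table: one identity key plus a single wide
-- column that holds the entire DB2 record as a delimited string.
CREATE TABLE dbo.DB2Staging (
    StagingID  INT IDENTITY(1,1) PRIMARY KEY,
    RawRecord  VARCHAR(4000) NOT NULL
);

-- Hypothetical import step for a 3-column target table: convert the
-- pipe delimiters to dots so PARSENAME can split the record, then
-- insert the pieces into the real destination table.
INSERT INTO dbo.TargetTable (Col1, Col2, Col3)
SELECT
    PARSENAME(REPLACE(RawRecord, '|', '.'), 3),
    PARSENAME(REPLACE(RawRecord, '|', '.'), 2),
    PARSENAME(REPLACE(RawRecord, '|', '.'), 1)
FROM dbo.DB2Staging;

-- Clear the staging table for the next iteration of the loop.
TRUNCATE TABLE dbo.DB2Staging;
```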
8080_Diver commented:
If you have a known set of DB2 tables to process (which I would definitely hope you do ;-), then you could create a DB2ImportControls table that has, among other pertinent pieces of controlling information, the connection information (especially if you have to hit various LIBs in the DB2 world), the DB2 source file name, and the target table name (including schema).  Then, you could cycle through this control table to do the processing.
By having the connection information, you could compare the current connection string to the one from the table each time you read a new control row and only change the connection if the strings are different.  The control table would also provide the full schema.filename information for the SQL Server side.  In fact, you could even go so far as to include the SQL Server connection manager's connection string so that, should the need arise, you could handle multiple SQL Server database connections. ;-)
I do this with a process I set up and I include the last datetime that a particular control row's processing was successful.
ssebring (Author) commented:
Sorry, but I understood little of what you were describing.

1. Yes, the DB2 tables are known.

2. Regarding creating a db2 control table: at this time, the dev team does not have db2 create table rights but it's possible to request it if the solution can be proven worthy enough.

3. Not sure what you mean about hitting various LIBs in the DB2 world.  What we need to do is run a DB2 SELECT query over several joined tables and output the result straight into SQL Server.  The client doesn't want to use files, and I'm not sure what you mean about providing a full schema.filename -- do you mean a table definition?


Please clarify and thank you for your reply!
8080_Diver commented:
1. Yes, the DB2 tables are known.
Great!

2. Regarding creating a db2 control table: at this time, the dev team does not have db2 create table rights but it's possible to request it if the solution can be proven worthy enough.
I was not talking about creating a table in the DB2 world but, rather, in the SQL Server world.  Since you are using SSIS, I rather assumed that you were working from the SQL Server side of the equation.

3. Not sure what you mean about hitting various LIBS in the DB2 world.  
Ask your DB2 expert what a LIB is, and if that person doesn't know, then you had best find someone who does! ;-)  A LIB is a means by which databases are more or less partitioned in DB2.

What we need to do is a db2 select query from several joined tables and output the result straight into sql server.  The client doesn't want to use files & I'm not sure what you mean about providing a full schema.filename -- do you mean a table definition?
DB2 databases are not physically constructed in the same way that a SQL Server database is.  DB2 databases are actually files that contain PK and FK logical links . . . and that is where my reference to files comes in.
Instead of "full schema.filename" I probably should have said "full schema.tablename" (when I am dealing with DB2-related questions, I tend to blur the distinction between a table and a file because DB2 developers treat them as the same thing ;-).
So, what I am saying is that I would create a SQL SERVER table that would contain:
  • TableProcessingID INT IDENTITY(1,1);
  • TargetTableName VarChar(200) -- the 4-part name for the SQL Server table to receive the data;
  • SQLServerConnectionString VarChar(400) -- whatever connection string is required to access the SQL Server database that contains the table;
  • DB2ConnectionString VarChar(400) -- whatever connection string is required to access the DB2 database that contains the table;
  • SQLStatement VarChar(8000) -- the DB2 SQL statement to extract whatever data from the DB2 world and insert it into the SQL Server table;
  • LastProcessed DateTime -- the datetime of the last successful processing of this data.
Given that information, your SSIS package should be able to use a query to pull the data from that control table, perform a FOR EACH that uses the data from that table to set all the connections and everything else that you need to perform the ETL process.
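A sketch of that control table in T-SQL, using the column names and sizes listed above (the SELECT at the end is an assumed example of the query the package would use to fill its Object variable for the Foreach ADO Enumerator):

```sql
-- Illustrative control table driving the SSIS FOR EACH loop.
CREATE TABLE dbo.DB2ImportControls (
    TableProcessingID          INT IDENTITY(1,1) PRIMARY KEY,
    TargetTableName            VARCHAR(200)  NOT NULL,  -- SQL Server target table
    SQLServerConnectionString  VARCHAR(400)  NOT NULL,  -- target connection
    DB2ConnectionString        VARCHAR(400)  NOT NULL,  -- source connection
    SQLStatement               VARCHAR(8000) NOT NULL,  -- DB2 extract statement
    LastProcessed              DATETIME      NULL       -- last successful run
);

-- The package reads the whole table into an Object variable; the
-- Foreach ADO Enumerator then maps each column to a package variable.
SELECT TargetTableName, SQLServerConnectionString,
       DB2ConnectionString, SQLStatement, LastProcessed
FROM dbo.DB2ImportControls
ORDER BY TableProcessingID;
```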
ssebring (Author) commented:
Thank you for your time in another reply.

Your last description seems almost identical to what I already have set up and described in my initial question.

I've got a loop iterating over the variables containing the DB2 query string -- the problem I face is that when a table definition changes, SSIS doesn't dynamically reset the column mappings in the data flow task, so it fails.

What am I missing in your description that would make this work?  

My current task flow is:

1. select target table name and DB2 query from the SQL Server control table into variables
2. inside loop:
         data flow task
            - OLE DB DB2 source - Data access mode: SQL command from variable
            - OLE DB SQL Server destination - Data access mode: Table name or view name variable
                 (here is where the column mappings happen on the initial setup; on the next iteration the column mapping does not dynamically get reset by SSIS, hence the failure if the table columns are different)

Thanks for yet another round of this question.
8080_Diver commented:
Well, essentially, I was figuring on using a single Control Flow component (an Execute SQL Task) to handle the execution of the INSERT INTO ( . . . ) SELECT . . . FROM . . . WHERE . . . SQL statement.  That effectively bypasses the data flow components you are using.
An Execute SQL Task can use a variable (the SQLStatement from my control table), among other options, as the source of the SQL statement to execute.
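For instance, the SQLStatement column could hold a per-table parse-and-load statement that reads from the staging table described earlier rather than directly from DB2, so no linked server or OPENQUERY is required (table and column names below are hypothetical):

```sql
-- Hypothetical SQLStatement value for one two-column target table:
-- break the delimited staging record apart at the first pipe and
-- load the destination, then clear the staging table.
INSERT INTO dbo.Customers (CustomerID, CustomerName)
SELECT
    LEFT(RawRecord, CHARINDEX('|', RawRecord) - 1),
    SUBSTRING(RawRecord, CHARINDEX('|', RawRecord) + 1, 4000)
FROM dbo.DB2Staging;

TRUNCATE TABLE dbo.DB2Staging;
```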
ssebring (Author) commented:
That would have been great, something I thought about more than once in fact, but the statement would have to use OPENQUERY or a linked server, which the client is against.  Unless I'm missing something here?
ssebring (Author) commented:
Thank you for your comments!