• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1023

SSIS Design Question

I have a Stored Procedure that I am running via SSIS that does a distributed transaction.  It inserts and updates data on Server1.  Based on that update/insert, it creates a temp table.  The temp table is used within an UPDATE statement to update a table on Server2 via a Linked Server.  The problem is that the update via the Linked Server is extremely slow.

I am thinking of breaking it down via SSIS so that each SQL statement runs on its own server, which would be much faster.  This seems like it would work, but I am just learning SSIS.  So I would have an Execute SQL task that executes an SP or SQL statement to do the inserting and updating of data on Server1, and if that succeeds I proceed to the next step.

The issue is that within the one SP I was using a temp table to hold the keys that become the input parameter of the SQL executed on Server2.  A temp table will not work since it will not be available to both servers.  So how can I hold the data, in some sort of table, from one Execute SQL task as the input to another Execute SQL task?

Table-valued parameters are read-only and input only.  I could put the keys in a physical table and then delete all rows when all tasks have completed.  Since I am new to SSIS, do you have any other ideas for storing the data from the insert/update on Server1 as input to Server2?

Thanks!
Asked by Bodhi108

2 Solutions
 
Rainer JeschorCommented:
Hi,
I would let the first sproc return the temp table records to SSIS by using an OLE DB source.
Then you can simply run your update using an OLE DB destination on Server2.

HTH
Rainer
 
Bodhi108Author Commented:
I don't believe you can output a temp table from a stored procedure.  Do you know of a way?

Abbi
 
Rainer JeschorCommented:
Hi,
you cannot return the temp table itself to SSIS, but the procedure can finish with a SELECT on the temp table, which SSIS reads as a result set.

A brief sample:

CREATE PROCEDURE <procedurename>
    @designTime int,
    @var2       uniqueidentifier
AS
BEGIN
    SET FMTONLY OFF     -- let SSIS parse the result shape at design time
    SET NOCOUNT ON      -- avoid unnecessary output

    DECLARE <variables>

    CREATE TABLE #TmpResult     -- result set
    (
        <columns> [....]
    )

    IF @designTime = 1  -- design-time SELECT (empty, metadata only)
    BEGIN
        SELECT * FROM #TmpResult
        RETURN
    END

    -- -----------------------------------------------------
    -- Non-design-time standard processing
    -- -----------------------------------------------------
    -- [YOUR CODE GOES HERE]

    SELECT * FROM #TmpResult
END



Best practice:
http://www.ssistalk.com/2007/10/10/ssis-stored-procedures-and-the-ole-db-source/

HTH
Rainer

 
Bodhi108Author Commented:
Yes, that makes sense.  Being new to SSIS, how do I capture the table in SSIS?  It wouldn't be an output variable in the Execute SQL task, would it?

This is an additional question... can I put a transaction around 2 Execute SQL tasks in SSIS?  If so, do I take out the commits in the Stored Procedure?

Thanks, Abbi
 
Rainer JeschorCommented:
Hi,
an SSIS data flow normally processes results row by row, so the result set of your stored procedure will be passed on to further processing tasks and to the final output.

A good starting point in regards to transactions and SSIS is
http://www.mssqltips.com/sqlservertip/1585/how-to-use-transactions-in-sql-server-integration-services-ssis/

I normally use SSIS for data warehousing, so I do not use transactions.

HTH
Rainer
 
Bodhi108Author Commented:
Hi,

I'm still working on passing the temp table from SP1 to SP2:

I created an object variable in SSIS to hold the table.

The first step of the Control Flow is an "Execute SQL Task" which executes SP1; the result set gets stored in the user-defined variable.  Data Type is Object.

The second step of the Control Flow is a "Data Flow Task" whose "OLE DB Source" executes SP2 (exec UpdateFactProductPurchaseDateReplicated ?) using the object variable as the input parameter.

The only way I know for a stored procedure to take a table as an input parameter is to create a user-defined table type.  The parameter must be declared READONLY, which is done in SP2.  SP2 executes with no errors within SSMS, but when I run it within SSIS I get the following error:

"The table-value parameter "@P1" must be declared with the "READONLY" option."
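For reference, a minimal sketch of the pattern being attempted, with the table-valued parameter declared READONLY in the procedure signature; the type name, fact table, and column names are illustrative assumptions, not the actual schema:

```sql
-- Hypothetical table type and procedure; names are illustrative.
CREATE TYPE dbo.KeyList AS TABLE (KeyValue bigint NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.UpdateFactProductPurchaseDateReplicated
    @Keys dbo.KeyList READONLY   -- TVPs must always be declared READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- Stamp the replication date on the fact rows matching the passed keys.
    UPDATE f
       SET f.ReplicatedDate = GETDATE()
      FROM dbo.FactProductPurchase AS f
      JOIN @Keys AS k ON k.KeyValue = f.FactKey;
END
```

The error above suggests the OLE DB provider is binding @P1 as an ordinary table parameter rather than as a READONLY table-valued parameter.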

Any ideas here?

Thanks!
Abbi
 
Rainer JeschorCommented:
Hi,
if possible I would recommend that you have just one Data Flow, with one OLE DB Source (SQL01) which executes the stored procedure (which internally performs the CRUD actions on SQL01 and returns the results from the temp table).  Then you have an OLE DB Destination (SQL02) which performs the CRUD record by record.

That's the primary use case for SSIS (processing rows in parallel batches).

HTH
Rainer
 
Bodhi108Author Commented:
Hi Rainer,

Thanks.  Perhaps because I am just learning SSIS, I am still not clear.

For OleDB Source (SQL01) do I need to store the results in a variable?

For the OLE DB Destination (SQL02), are you saying not to call a Stored Procedure but to use inline T-SQL?

Thanks,
Abbi
 
Rainer JeschorCommented:
Hi,
no problem - SSIS is not a no-brainer.
You do not need to store the result in a variable, because the stored procedure returns rows which are forwarded to the next data flow component.  Normally that's done in so-called batches (e.g. 5000 records), meaning that if you return 15000 rows, the first 5000 are already sent to the next data flow component while the other records are still being transferred.
For your target, it depends on what you have to do with the rows.  If it is just an insert, you can easily use the OLE DB Destination on a table and do the graphical mapping.  If there is more logic, you can use the stored procedure and set parameters, which will then be executed record by record.
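For the record-by-record case, the statement behind an OLE DB Command transform would look something like the following sketch (reusing the procedure name mentioned earlier in the thread; the single parameter is an assumption).  SSIS maps one input column to each ? placeholder:

```sql
-- Hypothetical per-row statement for an OLE DB Command transform;
-- SSIS binds an input column to the ? parameter marker on every row.
EXEC dbo.UpdateFactProductPurchaseDateReplicated ?
```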
A couple of questions:
How many records are we talking about?
What is the "general" logic behind the second target (e.g. calculations, lookups, foreign key replacements, ...)?
How many columns does the result of the source sproc return?

Thanks and HTH
Rainer
 
Bodhi108Author Commented:
Hi Rainer,

This is a replication process which will be replicating 22 tables within mini-data warehouses to a centralized data warehouse.  Each replication will have this process that we are talking about.

Therefore, the source sproc returns anywhere from 22K rows to 8 million rows depending on the table I am replicating.  The first run could be as large as 8 million, but subsequent runs depend on the number of updates that have occurred during one day, which I believe may be as large as 2-3 million.

The result will only have one column which will contain the primary key.  The second sproc does an update to a date column indicating that the row has been replicated into the centralized data warehouse.  The second sproc also has a logging table that does an insert with date/time and the number of rows that have been updated.  That could be a second dataflow component.

So I understand that by using a data flow component the result does not need to be stored in a variable.  Do you think processing 3 million records is too much to do one by one?  The great thing about a stored procedure is that it uses set processing, which is very fast.  I was planning on having about 6 of these going at once since the Central Warehouse has 6 processors.

Thanks,
Abbi
 
Bodhi108Author Commented:
One more thing...  I was able to get this whole process to work within SSIS by using physical tables to hold the primary keys and having 3 Execute SQL tasks.

The first Execute SQL does the inserts on server A and inserts the primary key into the physical table.  It also logs the counts into a logging table for that particular table on server A.

The second Execute SQL reads the table on server A containing the primary key and updates the replication date on server B.  It also logs the counts into a logging table for that particular table on server B.

The third Execute SQL deletes the row in the table on server A containing the primary key.

This works but there are a lot of physical tables.  And I have a new question out there regarding commits/rollbacks across two different connection managers.  Not sure if this is possible.

Thanks,
Abbi
 
Rainer JeschorCommented:
Hi,
I could think of having only one table for the primary keys, with an additional column (int or varchar) holding the related table name / ID.  Then you can store all keys in one table.
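A minimal sketch of that shared key table, assuming bigint keys; the table and column names are illustrative:

```sql
-- Hypothetical shared staging table: one row per replicated key,
-- tagged with the source table it belongs to.
CREATE TABLE dbo.ReplicationKeys (
    TableName sysname NOT NULL,   -- which of the 22 tables the key is for
    KeyValue  bigint  NOT NULL,   -- the primary key being replicated
    CONSTRAINT PK_ReplicationKeys PRIMARY KEY (TableName, KeyValue)
);
```

Each Execute SQL task would then insert, read, and delete only the rows matching its own TableName value, so the 22 replications no longer need 22 separate physical tables.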

In regards to transaction processing, I am not sure either.

HTH
Rainer
