I am building a data warehouse with sales in my SQL Server 2008 database. I am feeding it with information from an Oracle 9 ERP system. I am using SSIS to do the ETL.
I would like the best strategy to approach a particular ETL problem I have to build Daily Sales.
The SQL Server destination Sales table is called tblSales. Every day I must add in the new Sales that have been generated for the previous day from the Oracle system. It is possible, however, for a Sale to 'drop in' from weeks ago which has been in some form of query state with the Finance department. There are no date flags to say when this occurs otherwise I could just look to see which recent Sales had dropped in and this would include the recent Sales. I wouldn't really have a problem as I would just fire one query over to get all of those Sales.
Long story short, I have a table of the unique TransactionNo / StoreNo combinations in both SQL Server and Oracle and created the table tblDifferentialSales. This has all of the Sales that need to be transferred across.
My problem is that the method I am using in SSIS, although it works, is very slow (about 1 second per Transaction – there could be 20,000 transactions per day to import).
1. Execute SQL Task to
select StoreNo, TransactionNo from tblDifferentialSales
this is outputs to an object Variable
2. For Loop Container, loops the object Variable in a Data Flow Task and creates the specific SQL to draw back one Sale Transaction in a Text Variable, txtSQL, which is then assigned to the [ADO NET Source].[SQLCommand].
3. This is then output to the tblSales OLE DB destination.
I wondered if it is possible to use something like the ‘Merge Join’ Transformation to get the Oracle dataset to read from the SQL server dataset as its criteria. I have tried to make this work but I am using both ADO and an OLE connections for different databases (mainly because the SQLCommand property seems to be more easily exposed than for an OLE connection). I can find examples for Merge Joins but not with two different databases. Bear in mind that the SQL Server connection may have 20,000 records in it but the Oracle ERP connection may have millions to join with.
I am open to any possibilities. What is the best way to go about this, please? Has anyone done something like this before?