SSIS dataflow task - SQL Server

I have an SSIS package that uses a Data Flow with a SQL command against our ERP system, loading into the Fact table in our data warehouse. The source is a query on an Order Header table with an inner join to an Order Detail table. After the source, the flow performs about 20 Lookups and finally a Slowly Changing Dimension transform to decide whether to insert or update rows in the Fact table. This process took about an hour and 45 minutes.

So I took the query and loaded its results into a SQL table, planning to use that table as the source for the lookups and the final insert, but the data flow into the SQL table (230k rows) still takes an hour from this source query. We use Transoft to establish the connection to the source system's data, and it works with the ADO.NET connection, but for some reason you can't do a table load - you have to use a SQL command. I want this to run better and faster, so if anyone has any tips, please let me know. I connected to an ODBC source and was able to see the tables, but I never ran it. Please help - it's very important that I get this to run faster.

Tim Humphries, Director, commented:

A couple of things:

1. It could be that you're updating your 'fact table' because it is structured as a single flattened table with all of the dimension attributes included in it. That could explain i) why you have to update it and ii) why you have >300 fields. This makes things difficult because you can't unpick the true fact-table inserts from the dimension-table updates. But this is speculation, as I don't know your warehouse structure.

2. The query you posted consists of inner joins. I would have thought that if you were testing for a row's existence you'd be more likely to do a left join to the existing data. Also, as the query uses non-T-SQL functions, it is presumably running within the ERP system. It still looks like there's a heterogeneous join happening somewhere, which is always likely to be slow.

Assuming you're not going to be able to get away from the heterogeneous join, I would do this:
1. I'm assuming that you can define a unique Transaction ID for all your facts; if one isn't already defined, you should really have this.
2. As a one-off task, pull all of the unique Transaction IDs into a local table - you will use this as a reference.
3. When determining your inserts/updates, you can left join from your new data to the Transaction ID table and identify new/existing rows that way.
4. After you've done your new inserts, add the new Transaction IDs to the local Transaction ID table.
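A minimal T-SQL sketch of steps 2-4. The table and column names (`FactOrders`, `StagingOrders`, `KnownTransactionID`, `TransactionID`) are assumptions for illustration, not names from the original post:

```sql
-- Step 2 (one-off): build the local reference of known Transaction IDs
SELECT DISTINCT TransactionID
INTO dbo.KnownTransactionID
FROM dbo.FactOrders;

-- Step 3: identify new rows by left-joining staged data to the reference
INSERT INTO dbo.FactOrders (TransactionID, OrderDate, Amount /* ... */)
SELECT s.TransactionID, s.OrderDate, s.Amount /* ... */
FROM dbo.StagingOrders AS s
LEFT JOIN dbo.KnownTransactionID AS k
       ON k.TransactionID = s.TransactionID
WHERE k.TransactionID IS NULL;   -- no match means it's a new fact

-- Step 4: record the IDs you just inserted so the next run sees them
INSERT INTO dbo.KnownTransactionID (TransactionID)
SELECT s.TransactionID
FROM dbo.StagingOrders AS s
LEFT JOIN dbo.KnownTransactionID AS k
       ON k.TransactionID = s.TransactionID
WHERE k.TransactionID IS NULL;
```

An index on `KnownTransactionID(TransactionID)` keeps the left joins cheap, and the whole new/existing test runs locally instead of against the remote ERP source.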

Rainer Jeschor commented:
So the "copy" from the source system to a blank SQL table takes an hour?

First: what is your source ERP system? Perhaps there is a faster connection provider.
Second: where are your source system and your target SQL Server located? Same network / data center, or is there a VPN tunnel in between? A slow network in between can also really slow down the process.
How many records and how many fields will be sent?
Tim Humphries, Director, commented:
Can you post your SQL query?
It's difficult to advise without sight of what you're trying to do.


jknj72 (Author) commented:
Well, I loaded the Order Header and Detail tables into local SQL tables. Now I have to run a query from the ERP system, check it against my tables, and do an insert or update depending on whether each row exists. The query is huge (over 300 fields - I'm asking them to reduce the field count) and probably wouldn't fit on this page. It's a select from the order header with an inner join to the details. Here is the join...


I actually forgot I put this question out there and asked another question, so if you see another question that's very similar, that's me. Thanks.
Tim Humphries, Director, commented:

I'm still struggling to understand where your data is and how you're accessing it.

1. Order Header and Detail tables - on server 'A', but you're copying all this data over to a local SQL table (on server 'B').
2. The ERP system is running on server 'C' and is remote/local(?) to server 'B'.
3. The data warehouse is on server '?' and is local/remote to the other servers.

From the query above, it looks as though you are running a remote query within the ERP system itself, using functions defined within the ERP system. These almost certainly make it impossible for any indexes to be used to resolve the query.

First recommendation - unpick the functions to understand what they do. It looks like the T-SQL equivalent would be


Then write a stored procedure within the system that hosts the order data and have it return only the data that would be returned above. At least that way you only have to process the relevant data. Basically, move the determination of the data to be processed to the source, rather than having to perform some sort of 'heterogeneous join' to resolve the data.
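A sketch of that kind of stored procedure, assuming the order data lives on a SQL Server instance; the procedure, table, and column names below are placeholders, not from the original post:

```sql
-- Hypothetical source-side procedure: return only the recent rows the load needs
CREATE PROCEDURE dbo.GetRecentOrders
    @DaysBack INT = 10
AS
BEGIN
    SET NOCOUNT ON;

    SELECT h.OrderID, h.OrderDate, d.LineNumber, d.Amount /* ... */
    FROM dbo.OrderHeader AS h
    INNER JOIN dbo.OrderDetail AS d
            ON d.OrderID = h.OrderID
    WHERE h.OrderDate >= DATEADD(DAY, -@DaysBack, GETDATE());
END;
```

Filtering at the source means only the relevant rows ever cross the network, and any indexes on `OrderDate`/`OrderID` can actually be used to resolve the query.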

Secondly, you talk about updating your Fact table - aren't your facts immutable? Rows should only be inserted into a fact table; if you have slowly changing dimensions, they should be updated independently.
What system hosts your data warehouse?

jknj72 (Author) commented:
I don't really know much about this virtual-server stuff, but that's what was created for me. The whole domain is VMware(?). I'm not a network guy at all, but I know it's all virtual. I actually ran into big permission issues because of this, with accessing drives on my SQL Server. The network guy was able to fix it with a setting on his end that allows me to access my E and F drives for storing log and data files. Everything is on the same domain, though. I know this is probably causing some of the issues, but like I said, I don't know anything about it - just from what I've read, there are known issues.

Anyway, Tim, you've opened my eyes to the updating of my Fact table. I'm wondering why this is happening. I should just be doing an insert as far as I'm concerned - I don't ever recall doing an update to a Fact table, but it is. I've run this process and looked at the SCD, and it was doing updates and inserts. I've always dumped the Fact table and then done an insert with all the data after my Lookups.

The main problem, I think, is the amount of data, which I'm going to shave down by a couple of years. The query that loads the data only loads the last 10 days, though, so it shouldn't be that bad. The function for the dates is a built-in function the last developer used to get it to work in one of the SQL Commands, which are very finicky, as I'm sure you know. We should just be able to do a GetDate() - 10, but this worked, so I left it. I downloaded all the data into SQL so I could do some sort of upsert into my table and then use that data to load my Fact table. That's why I did the full load of the data. Plus, it's local, so I figured it would save me from having to access the data across servers?
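For reference, the GetDate() - 10 style filter mentioned above would look like this in plain T-SQL (the `OrderDate` column name is an assumption):

```sql
-- Restrict the source query to roughly the last 10 days of orders
SELECT h.OrderID, h.OrderDate /* ... */
FROM dbo.OrderHeader AS h
WHERE h.OrderDate >= DATEADD(DAY, -10, GETDATE());
```

Whether this syntax works inside the Transoft SQL Command depends on which functions the driver passes through to the source system, which may be why the previous developer fell back to the ERP's built-in date function.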
jknj72 (Author) commented:
This is a pic of my Data Flow. It starts with the ADO.NET source, which is a query from the ERP system, with the connection configured using the Transoft driver software. With the records from that query, there are about 20 Lookups, a Data Conversion, a Derived Column, and finally an SCD with an insert or update to an OLE DB destination (the Fact table). I believe the final tally is only a couple of thousand records inserted and maybe a couple of hundred updated (which I don't understand, because the data shouldn't have changed, but it may have). I am thinking that I can clear out the Fact table and then just do an insert and see what happens. I am open to suggestions and would love to hear what you guys think I should try.
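If some of those updates turn out to be legitimate, one set-based alternative to the SCD transform is a T-SQL MERGE from a staging table. This is a sketch only - the table, key, and column names are illustrative, and the real comparison would cover all changing columns:

```sql
-- Upsert staged rows into the fact table in one set-based statement
MERGE dbo.FactOrders AS tgt
USING dbo.StagingOrders AS src
      ON tgt.TransactionID = src.TransactionID
WHEN MATCHED AND tgt.Amount <> src.Amount THEN
    UPDATE SET tgt.Amount    = src.Amount,
               tgt.OrderDate = src.OrderDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TransactionID, OrderDate, Amount)
    VALUES (src.TransactionID, src.OrderDate, src.Amount);
```

A single MERGE against indexed keys is usually far faster than the row-by-row OLE DB Command updates that the SSIS SCD wizard generates.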