Solved

SSIS dataflow task - SQL Server

Posted on 2015-02-05
8
110 Views
Last Modified: 2016-02-15
I have an SSIS package that uses a Data Flow with a SQL command against our ERP system and loads into the Fact table in our data warehouse. The source is a query on an Order Header table with an inner join to an Order Detail table. After that source it does about 20 Lookups and finally a Slowly Changing Dimension to decide whether to Insert or Update into our Fact table. This process took about an hour and 45 minutes.

So I took the query and loaded the results into a SQL table, planning to use that table as the source for the lookups and then the actual insert, but the data flow into the SQL table (230k rows) still takes an hour from this source query. We use Transoft to establish the connection to the source system data; it works with the ADO.NET connection, but for some reason you can't do a table load - you have to use a SQL command. I also connected to an ODBC source and was able to see the tables, but I never ran it.

I want this to run better and faster, so if anyone has any tips please let me know. It's very important that I get this to run faster...


Thanks
JK
Question by:jknj72
8 Comments
 
LVL 44

Expert Comment

by:Rainer Jeschor
Hi,
so the "copy" from the source system to a "blank" SQL table takes an hour?

So what is your source ERP system? Perhaps there is a faster connection provider.
Second: where is your source system and your target SQL server located? Same network / data center or is there a VPN tunnel in between ...
Having a slow network in between can also really slow down the process.
How many records and how many fields will be sent?
Thanks.
Rainer
 
LVL 15

Expert Comment

by:Tim Humphries
Can you post your SQL query?
Difficult to advise without sight of what you're trying to do.

Tim
 

Author Comment

by:jknj72
Well, I loaded the Order Header and Detail tables into local SQL tables. Now I have to run a query from the ERP system, check it against my tables, and do an Insert or Update depending on whether each row exists or not. The query is huge (over 300 fields - I'm asking them to reduce the field count) and probably wouldn't fit on this page. It's a SELECT from the order header with an inner join to the details. Here is the join...

FROM     OEH_HDR
         INNER JOIN OED_DTL
             ON  OEH_HDR.OEHDR_DOCUMENT_NUMBER = OED_DTL.OEHDR_DOCUMENT_NUMBER
             AND OEH_HDR.OEHDR_TYPE_OF_ENTRY   = OED_DTL.OEHDR_TYPE_OF_ENTRY
             AND OEH_HDR.WAREHOUSE_CODE        = OED_DTL.WAREHOUSE_CODE
WHERE    ({ fn TIMESTAMPDIFF(SQL_TSI_DAY, OEH_HDR.OEHDR_ENTRY_DATE, { fn CURDATE() }) } <= 10)

I actually forgot I had put this question out there and posted another one, so if you see another question that's very similar, that's me. Thanks
 
LVL 15

Expert Comment

by:Tim Humphries
OK,

I'm still struggling as to where your data is and how you're accessing it.

1. Order Header and Details are on server 'A', but you're copying all this data over to a local SQL table (on server 'B').
2. The ERP system is running on server 'C' and is remote/local(?) to server 'B'.
3. The data warehouse is on server '?' and is local/remote to the other servers.

From the query above it looks as though you are running a remote query within the ERP system itself, using functions defined within the ERP system. These are almost certainly making it impossible for any indexes to be used to resolve the query.

First recommendation - unpick the functions to understand what they do. The T-SQL equivalent looks like:

WHERE DATEDIFF(DAY, OEH_HDR.OEHDR_ENTRY_DATE, GETDATE()) <= 10

Then write a stored procedure within the system that hosts the order data and have it return only the data that the query above would return. At least that way you only have to process the relevant data. So basically, move the determination of the data to be processed to the source, rather than having to perform some sort of 'heterogeneous join' to resolve it.
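
As a rough sketch only - assuming the order data (or a staged copy of it) lives somewhere that understands T-SQL, which may not be true of the ERP side reached through the Transoft/ODBC driver - such a procedure could look something like this (the procedure name and parameter are made up for illustration):

CREATE PROCEDURE dbo.usp_GetRecentOrderLines   -- hypothetical name
    @DaysBack int = 10
AS
BEGIN
    SET NOCOUNT ON;

    SELECT  h.*, d.*   -- ideally trimmed to only the columns the data flow really needs
    FROM    OEH_HDR AS h
            INNER JOIN OED_DTL AS d
              ON  h.OEHDR_DOCUMENT_NUMBER = d.OEHDR_DOCUMENT_NUMBER
              AND h.OEHDR_TYPE_OF_ENTRY   = d.OEHDR_TYPE_OF_ENTRY
              AND h.WAREHOUSE_CODE        = d.WAREHOUSE_CODE
    -- compare the column to a constant instead of wrapping it in a function,
    -- so an index on the entry date can be used
    WHERE   h.OEHDR_ENTRY_DATE >= DATEADD(DAY, -@DaysBack, CAST(GETDATE() AS date));
END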

Secondly, you talk about updating your Fact table? Aren't your facts immutable? Rows should just be inserted into a fact table... if you have slowly changing dimensions, they should be updated independently.
What system hosts your data warehouse?

Tim

 

Author Comment

by:jknj72
I don't really know much about this virtual server stuff, but that's what was created for me. The whole domain is VMware(?). I'm not a network guy at all, but I know it's all virtual. I actually ran into big permission issues because of this, with accessing drives on my SQL Server. The network guy was able to fix it with a setting on his end that lets me access my E and F drives for storing log and data files. Everything is on the same domain, though. I know this is probably causing some of the issues, but like I said I don't know anything about it - just from what I've read there are known issues.

Anyway, Tim, you've opened my eyes to the updating of my Fact table. I'm wondering why this is happening? As far as I'm concerned I should just be doing an Insert - I don't ever recall doing an Update to a Fact table, but it is. I've run this process and looked at the SCD, and it was doing Updates and Inserts. I've always dumped the Fact table and then done an Insert with all the data after my Lookups.

The main problem, I think, is the amount of data, which I'm going to shave down by a couple of years. The query that loads the data only pulls the last 10 days, though, so it shouldn't be that bad. The date function is a built-in function the last developer used to get it to work in one of the SQL Commands, which are very finicky(?) as I'm sure you know. We should just be able to do a GetDate() - 10, but this worked so I left it. I downloaded all the data into SQL so I could do some sort of Upsert into my table and then use that data to load my Fact table. That's why I did the full load of the data. Plus it's local, so I figured it would save me from having to access the data across servers?
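
For what it's worth, a rough sketch of that upsert in T-SQL - the staging table names and the key columns are assumptions (the real detail key probably also includes a line number), and with 300+ columns the column list would need to be generated rather than hand-written:

MERGE dbo.Stg_OrderLines AS tgt                  -- hypothetical local copy of the order data
USING dbo.Stg_OrderLines_Load AS src             -- hypothetical 10-day extract from the ERP
    ON  tgt.OEHDR_DOCUMENT_NUMBER = src.OEHDR_DOCUMENT_NUMBER
    AND tgt.OEHDR_TYPE_OF_ENTRY   = src.OEHDR_TYPE_OF_ENTRY
    AND tgt.WAREHOUSE_CODE        = src.WAREHOUSE_CODE
WHEN MATCHED THEN
    UPDATE SET tgt.SomeColumn = src.SomeColumn   -- placeholder; repeated for each column that can change
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OEHDR_DOCUMENT_NUMBER, OEHDR_TYPE_OF_ENTRY, WAREHOUSE_CODE, SomeColumn)
    VALUES (src.OEHDR_DOCUMENT_NUMBER, src.OEHDR_TYPE_OF_ENTRY, src.WAREHOUSE_CODE, src.SomeColumn);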
 

Author Comment

by:jknj72
This is a pic of my Data Flow. It starts off with the ADO.NET source, which is a query from the ERP system with the connection configured using the Transoft driver software. With the records from that query there are about 20 lookups being done, a Data Conversion, a Derived Column, and then finally an SCD with an Insert or Update to an OLE DB destination (the Fact table). I believe the final tally is only a couple of thousand records that actually get inserted and maybe a couple of hundred that get updated (which I don't understand, because the data shouldn't have changed - but it may have). I am thinking that I can clear out the Fact table and then just do an Insert and see what happens. I am very open to suggestions and would love to hear what you guys think I should try.
FactSalesOrder-DataFlow.png
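
If the load really only covers the last 10 days, one way to try the insert-only idea without wiping history - just a sketch; the fact table and date column names below are guesses based on the attachment name, not the real schema - is to clear only that window and let the destination do plain inserts instead of the SCD doing updates:

-- delete only the window being reloaded; object and column names are illustrative guesses
DELETE FROM dbo.FactSalesOrder
WHERE  OrderEntryDate >= DATEADD(DAY, -10, CAST(GETDATE() AS date));
-- then run the data flow with a plain OLE DB Destination (fast load) instead of the SCD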
 
LVL 15

Accepted Solution

by:
Tim Humphries earned 500 total points
Hi,

Couple of things:

1. It could be that you're updating your 'fact table' because it is structured as a single flattened table with all of the dimension attributes included in it. That could explain i) why you have to update it and ii) why you have >300 fields. It also makes things difficult because you can't unpick the true fact-table inserts from the dimension-table updates. But this is speculation, as I don't know your warehouse structure.

2. The query you posted consists of inner joins. I would have thought that if you were testing for a row's existence you'd be more likely to do a left join to the existing data... Also, as the query uses non-T-SQL functions, it is presumably running within the ERP system. It still looks like a heterogeneous join is happening somewhere... which is always likely to be slow.

Assuming you're not going to be able to get away from the heterogeneous join, I would do this:
1. I'm assuming that you can define a unique Transaction ID for all your facts... if it's not already defined, you should really have this.
2. As a one-off task, pull all of the unique Transaction IDs into a local table - you will be using this as a reference.
3. When determining your inserts/updates, left join from your new data to the Transaction ID table and identify new/existing rows that way (see the sketch below).
4. After you've done your new inserts, add the new Transaction IDs to the local Transaction ID table.
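
In rough T-SQL, steps 3 and 4 might look something like this (the fact table, staging table, and Transaction ID column names are placeholders for illustration):

-- Step 3: rows whose Transaction ID is not yet in the local reference table are new facts
INSERT INTO dbo.FactSalesOrder (TransactionID /*, other fact columns */)
SELECT  s.TransactionID /*, other fact columns */
FROM    dbo.Stg_NewOrders AS s
        LEFT JOIN dbo.FactTransactionIDs AS t
          ON t.TransactionID = s.TransactionID
WHERE   t.TransactionID IS NULL;

-- Step 4: remember the IDs just inserted so the next run can skip them
INSERT INTO dbo.FactTransactionIDs (TransactionID)
SELECT  s.TransactionID
FROM    dbo.Stg_NewOrders AS s
        LEFT JOIN dbo.FactTransactionIDs AS t
          ON t.TransactionID = s.TransactionID
WHERE   t.TransactionID IS NULL;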

Tim
 

Author Closing Comment

by:jknj72
Thanks
