Time to merge 42K records into 300M records recently increased significantly. Alternatives?

SQL Server 2008 R2

Migrating data from a production database into a reporting database every morning.  Because of the way the data is collected and updated in the production database, I must look back as much as 30 days to pull values for a SourceID and ProductionDate into the reporting database.  This 30 day recordset returns about 42000 records, almost instantaneously (this is the "source").

The "target" table contains several decades worth of data (over 300M records) indexed by the same SourceID and ProductionDate.

What I have been doing is merging the "source" data into the reporting database table, updating the records which already exist with data from the "source" and appending records which don't already exist.  I assumed this would be quicker than performing a delete operation and an insert, but lately the time it takes to run this simple merge operation has grown from about 40 minutes to over an 1:45:00 (give or take).  And I cannot figure out what caused this huge increase in time.

1.  Would it be quicker to perform a delete and than an append, than the merge
2.  What steps should I take to determine the cause of the increase in time?
3.  Instead of using the entire table in the reporting database as the target, can I somehow use a subset of the table as the target?  Something like:
MERGE SELECT * from tbl_Reports WHERE ProductionDate >= @FromDate as Target

Open in new window

or maybe
MERGE vw_Reports_Table_Last35days as Target

Open in new window

which would use an updateable view as the target?
LVL 50
Dale FyeAsked:
Who is Participating?
 
Koen Van WielinkConnect With a Mentor Business Intelligence SpecialistCommented:
Hi Dale,

The merge function is relatively slow performance wise. If you're not keeping history, you might want to consider using a double Except statement which compares sets instead of records. The idea is as follows:
1. Store your source data in a temp table or CTE (let's call that A)
2. Store your target data in a temp table or CTE (which we'll call B). Columns must be the same as for table A
3. In a third CTE or temp table, store the result of "Select * from A Except Select * From B". Call this AminB.
4. Do the same in reverse: "Select * from B Except Select * from A". Call this BminA.
The table AminB now gives you all the records that exist in your source, but not yet in your target (i.e. the records you need to insert).
The table BminA contains the records that exist in your target, but not in your source (which I assume you wish to delete as you don't keep history).
It's then roughly:

delete from target where exists (select 1 from BminA where target.key = BminA.key)
And
insert into target select * from AminB

Because the Except operator uses the whole set to compare rather than per record it should perform much faster.
0
 
Koen Van WielinkBusiness Intelligence SpecialistCommented:
Hi Dale,

What kind of method are you using to merge the data? Are you using the Merge function in MSSQL server, or are you using separate insert, (logical) delete and update statements? I guess from your question that you are not "stacking" your historical records? For example, when the attributes that belong to a certain SourceID have changed, do you then simply overwrite the data, or do you mark the existing records as deleted and do you insert a new, current record with the same sourceID? Do you have your tables properly indexed so that the records can easily be retrieved?
You have a few different options but it depends on your setup. Hence all the questions :p.
0
 
Dale FyeAuthor Commented:
Yes, I am using the MERGE function.

No, I'm not stacking the historical records, I'm simply overwriting the old values with new ones.

Yes, the tables are indexed on the fields being used in the MERGE statements JOIN
0
Cloud Class® Course: Python 3 Fundamentals

This course will teach participants about installing and configuring Python, syntax, importing, statements, types, strings, booleans, files, lists, tuples, comprehensions, functions, and classes.

 
Dale FyeAuthor Commented:
Getting an error:
Msg 208, Level 16, State 1, Line ##
Invalid object name 'Recent_Production'

with syntax that looks like:
WITH Recent_Production (FieldList)
AS 
(
SELECT FieldList
FROM ProductionTable
)

DELETE FROM tbl_Reports
WHERE Exists (SELECT 1 FROM Recent_Production  
WHERE Entity_ID = tbl_Reports.Entity_ID AND ProductionDate = tbl_Reports.docDate)

INSERT INTO tbl_Reports
SELECT * FROM Recent_Production

Open in new window

The error is pointing to the INSERT INTO tbl_Reports line

I've also tried:
INSERT INTO tbl_Reports(FieldList)
SELECT FieldList FROM Recent_Production

Open in new window


The CTE and the DELETE processes worked right but the insert process is not working properly.  Seems like it is dropping the CTE.  
I event remarked out the INSERT section and simply tried:
SELECT * FROM Recent_Production

Open in new window

and that gave me the same error
0
 
Dale FyeAuthor Commented:
Figured it out.

CTE's can only be used in one operation, so I replaced the CTE with a temp table.

Appreciate the recommendation as this is significantly faster than the MERGE operation!
0
 
Koen Van WielinkBusiness Intelligence SpecialistCommented:
Hi Dale, great to hear it worked. Apologies for the late reply. Your comment on the CTE being used in 1 operation is indeed correct.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.