Merge SQL Data

I have a SQL Server with a linked server (non-SQL).  The linked server is connected over a WAN, and latency can be a problem.  It is not a SQL Server, so no triggers etc.

I have a view that shows data from the linked server.  I want to regularly copy data from the view to a local table.

The view probably has over 15000 records.  

When a record is inserted, updated or deleted in the source (linked server), I need this replicated to the local table.

Don't worry about the scheduling etc.; I am just concerned with the logic to replicate the data.

Dropping the local table and copying it is not a good answer.  I need to somehow go through the source data and replicate changes rather than copying it in bulk.
JohnAsked:
ste5anSenior DeveloperCommented:
What is your exact problem?

It sounds like you should use SSIS instead of a linked server.

Dropping the local table and copying it is not a good answer.  I need to somehow go through the source data and replicate changes rather than copying it in bulk.
As latency is your problem: can you trust the data source? Does an update discriminator like a last-changed timestamp exist? If so, you can offload this part of the problem to the data source; otherwise not.

It's a common problem in ETL processes that you need to replicate the entire data set for initial change detection when you work with untrusted/unreliable data sources.
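If such a last-changed timestamp does exist, an incremental pull could be sketched like this. This is a sketch only: `LINKEDSRV`, `SourceDb.Sch.SourceTable`, the `LastChanged` column, and `dbo.SyncWatermark` are all hypothetical names, not objects from the question.

```sql
-- Sketch: incremental pull driven by a last-changed timestamp watermark.
-- All object names are placeholders for your environment.
DECLARE @LastPull DATETIME2;

SELECT @LastPull = LastPulledAt
FROM   dbo.SyncWatermark
WHERE  SourceName = N'RemoteLookup';

-- Pull only rows changed since the previous run instead of all 15000+.
-- (With a non-SQL provider, pushing this filter to the source may require
-- OPENQUERY with dynamic SQL rather than a four-part name.)
INSERT INTO dbo.Staging (ID, B, C)
SELECT ID, B, C
FROM   LINKEDSRV.SourceDb.Sch.SourceTable
WHERE  LastChanged > @LastPull;

-- Record the new watermark for the next run.
UPDATE dbo.SyncWatermark
SET    LastPulledAt = SYSUTCDATETIME()
WHERE  SourceName = N'RemoteLookup';
```

Whether the filter is actually evaluated at the remote end depends on the provider; if it is not, the latency saving disappears, which is why this only helps when the source can be trusted to maintain the timestamp.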
JohnAuthor Commented:
I need a local copy of the data in the view (that targets the linked server) because the link is slow.  It can take 50+ seconds to return all records from the linked server.

This is used as a lookup for another application and this speed is unacceptable.  So we decided to have a local copy of the data.  

Data in the local table will never change (except through this merge procedure), it is just being used for lookups.  

We don't have access for SSIS.

I was going to spin up an app in VB to load the source and destination data and compare.  

But I feel this should run much quicker if done within SQL.  

I am not sure how to do this in SQL for best performance.  

Perhaps 3 queries:

1.  Insert (from the source) into the destination if row does not exist in the destination,

2.  For each row in source, if it exists in the destination but at least one column has different data, update it (maybe update it regardless, if an unconditional update is quicker than testing whether it has changed)

3.  For each column in destination, if it is not in the source, delete it.
ste5anSenior DeveloperCommented:
It's still not clear what your concrete problem is.

But point 3 is a problem. Is your source stable or not? What kind of table is it? Because relations should/must be stable.

The basic strategy is

1) You create the staging table once. No primary keys. And definitely no foreign keys.
2) Then you TRUNCATE the table and load the entire data from your linked server.
3) You run a DELETE and MERGE against your persistent local copy.
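Steps 1) and 2) could be sketched like this (`dbo.Staging` and `dbo.LinkedServerView` are placeholder names, not your actual objects):

```sql
-- One-time setup: a constraint-free staging table matching the view's columns.
CREATE TABLE dbo.Staging (ID INT, B INT, C INT);

-- Each run: empty the staging table, then load the full source snapshot.
TRUNCATE TABLE dbo.Staging;

INSERT INTO dbo.Staging (ID, B, C)
SELECT ID, B, C
FROM   dbo.LinkedServerView;   -- the view over the linked server

-- Step 3 then runs DELETE / UPDATE / INSERT (or MERGE) from dbo.Staging
-- into the persistent local lookup table.
```

The point of the staging table is that only it gets truncated; the local lookup table stays populated throughout.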

JohnAuthor Commented:
I don't understand what you mean by stable.  

It is a view that shows some data from tables from a foreign database server.  

I need a local copy.  

Truncate and reload is not an option.  The data in the local copy needs to be present at all times; I can't have a table emptied in order for it to be filled afterwards.

So, we create a table with the same schema as the view that we want to replicate.  

I need assistance in copying the data from the view to the table.  

The data in the view changes from time to time.  I need those changes to be replicated in the local table.  

So the data in the view gets modified by another process in another city.  

Any rows removed from the view need removing from the local table.  

Any records added to the view need adding in the local table.

Any records modified in the view need modifying in the local table.  

So as I said before, I think I need three sql statements.  

1.  Insert any new records into the local table
2.  Delete any records from the local table that are not in the view
3.  Compare records that appear in both the view and the local table and, if they differ, update them in the local table

Bearing in mind that there may be around 15000 records, I need an efficient way to do each of these actions.
ste5anSenior DeveloperCommented:
Truncate and reload is not an option.  the data in the local copy needs to be present at all times, I can't have a table emptied in order that it is filled afterwards.
It is the only option as far as I understand you.

Read my last post again: I said you truncate the staging table.

A staging table is similar or, when possible, identical to the source table's structure, only without any constraints. Thus you can truncate it and load the entire source data.

Then you can determine in a single DELETE statement which rows need to be dropped. And you can use MERGE, or INSERT and UPDATE, to adjust your "local copy table" for new and changed rows. The "local copy table" is not the staging table; it is the table holding the necessary lookup data in your data model.
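For illustration, the MERGE route could be sketched like this (`dbo.LocalLookup` and `dbo.Staging` are placeholder names; the ON, SET and column lists must be adapted to your schema):

```sql
-- Sketch: one MERGE covers insert, update and delete in a single statement,
-- once the staging table holds the fresh source snapshot.
MERGE dbo.LocalLookup AS Dst
USING dbo.Staging     AS Src
    ON Dst.ID = Src.ID
WHEN MATCHED AND (Dst.B <> Src.B OR Dst.C <> Src.C) THEN
    UPDATE SET Dst.B = Src.B ,
               Dst.C = Src.C
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, B, C) VALUES (Src.ID, Src.B, Src.C)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Note the comparison in the WHEN MATCHED clause is not NULL-safe; if B or C can be NULL, the predicate needs additional IS NULL checks.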
ste5anSenior DeveloperCommented:
And I mean by stable:

We are talking about creating an ETL (extract, transform, load) process. When the data source is not stable, i.e. it changes without prior notice, any automated process can ultimately fail.

Especially as you mentioned disappearing columns: you need to tell us why this can happen. In normal scenarios you can handle only a limited set of cases here without manual intervention.
JohnAuthor Commented:
Sorry, I mis-read it.  

I create a staging table, truncate it.  

Then I load all the records from the source into the staging table.  

How would I do the merge, insert or update?
JohnAuthor Commented:
I don't think stability is an issue here.  

I am going to run this every 10 minutes or so.  

As per your last post, we load a staging table with the source data.  If the source data subsequently changes after it is staged, I don't care; those extra changes will be captured the next time this runs.
ste5anSenior DeveloperCommented:
Can rows be deleted? If so, you would do a DELETE first. Then you can use either MERGE, or INSERT and UPDATE, to adjust your tables.

The concrete statements depend on your data model. You need to post it for concrete help.
JohnAuthor Commented:
Sorry, I wasn't clear enough earlier.  Columns will not disappear.

I guess you got that from:

3.  For each column in destination, if it is not in the source, delete it.

To make it clearer:

3.  Check each column in the destination; if any have different data from the source, delete the row
JohnAuthor Commented:
I can do an update, insert, delete etc.  I don't know how to do them conditionally in this case.  

If a row (eg select ID, a,b,c from view) is not present in the destination, then insert it.  

If a row (select ID, a, b, c from destination) has a different value for a, b or c, then update it instead.

If a row (select ID, a,b,c from destination) does not exist in the source, delete it

I can do each of these actions on its own, and I could loop through every record in VB and do this for each one, but I think it will be slow.

I need some help putting this stuff into SQL.  

In SQL, we don't use FOR loops, and I don't know how to do this kind of logic without them.  I think it is referred to as set-based; this is where I need the help.

Discussing the database is irrelevant, because I will do this on different systems with different schemas. I just need to know how to structure the SQL for a simple example, and then I can run with it.

thanks for your help so far.
ste5anSenior DeveloperCommented:
E.g.

-- Your lookup table in your data model
DECLARE @Destination TABLE
    (
        ID INT PRIMARY KEY ,
        D INT ,
        E INT
    );

-- The staging table.
DECLARE @Staging TABLE
    (
        A INT ,
        B INT ,
        C INT
    );

-- Background: A maps to ID and is a candidate key.

-- Simulate 1st load.
INSERT INTO @Staging
VALUES ( 1, 1, 1 ) ,
       ( 2, 2, 2 ) ,
       ( 3, 3, 3 );

-- Remove dropped rows (when possible; error handling or foreign key checks may be necessary).
DELETE Dst
FROM   @Destination Dst
WHERE  NOT EXISTS (   SELECT *
                      FROM   @Staging Src
                      WHERE  Src.A = Dst.ID );
/*
-- When you have FKs referencing the destination:
DELETE Dst FROM @Destination Dst
WHERE NOT EXISTS (SELECT * FROM @Staging Src WHERE Src.A = Dst.ID)
AND   NOT EXISTS (SELECT * FROM ReferencingTable FK WHERE FK.ReferencedID = Dst.ID);
*/

-- Update changed rows.
-- Caveat: when NULLs are possible, you need a further NULL check in the WHERE clause (three-valued logic).
UPDATE Dst
SET    Dst.D = Src.B ,
       Dst.E = Src.C
FROM   @Destination Dst
       INNER JOIN @Staging Src ON Dst.ID = Src.A
WHERE  Dst.D != Src.B
       OR Dst.E != Src.C;

-- Insert new rows.
INSERT INTO @Destination ( ID ,
                           D ,
                           E )
            SELECT Src.A ,
                   Src.B ,
                   Src.C
            FROM   @Staging Src
            WHERE  NOT EXISTS (   SELECT *
                                  FROM   @Destination Dst
                                  WHERE  Dst.ID = Src.A );

-- Result of 1st load.
SELECT *
FROM   @Destination D;

-- Simulate 2nd load.
-- You would use TRUNCATE on real tables.
DELETE FROM @Staging;

INSERT INTO @Staging
VALUES ( 2, 22, 23 ) ,
       ( 3, 3, 3 );

-- Remove dropped rows (when possible; error handling or foreign key checks may be necessary).
DELETE Dst
FROM   @Destination Dst
WHERE  NOT EXISTS (   SELECT *
                      FROM   @Staging Src
                      WHERE  Src.A = Dst.ID );

-- Update changed rows.
UPDATE Dst
SET    Dst.D = Src.B ,
       Dst.E = Src.C
FROM   @Destination Dst
       INNER JOIN @Staging Src ON Dst.ID = Src.A
WHERE  Dst.D != Src.B
       OR Dst.E != Src.C;

-- Insert new rows.
INSERT INTO @Destination ( ID ,
                           D ,
                           E )
            SELECT Src.A ,
                   Src.B ,
                   Src.C
            FROM   @Staging Src
            WHERE  NOT EXISTS (   SELECT *
                                  FROM   @Destination Dst
                                  WHERE  Dst.ID = Src.A );

SELECT *
FROM   @Destination D;
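As an aside on the NULL caveat noted in the comments above: a NULL-safe version of the "has anything changed?" test can use the EXCEPT idiom, which, unlike !=, treats two NULLs as equal. A sketch against the same toy tables:

```sql
-- Sketch: NULL-safe change detection via EXCEPT.
-- The correlated EXISTS returns a row only when (B, C) differs from (D, E),
-- with NULL = NULL counted as "no difference".
UPDATE Dst
SET    Dst.D = Src.B ,
       Dst.E = Src.C
FROM   @Destination Dst
       INNER JOIN @Staging Src ON Dst.ID = Src.A
WHERE  EXISTS (   SELECT Src.B , Src.C
                  EXCEPT
                  SELECT Dst.D , Dst.E );
```

This avoids having to write an explicit IS NULL check per nullable column.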


JohnAuthor Commented:
Wow, that's a lot of hard work.  thanks.  

I'll give this a go and post back.  

thanks again

John
JohnAuthor Commented:
Implementing this today...
JohnAuthor Commented:
Thanks for your help.  I got there in the end; your input was really useful.  I was trying to play with joins and they were just slow.

There was a hiccup in that one of the tables didn't have a unique column; instead, I had to treat entire rows as unique, so I handled it this way:

DELETE
FROM   TableT
WHERE  NOT EXISTS (   SELECT *
                      FROM   ViewV
                      WHERE  ViewV.Col1 = TableT.Col1
                             AND ViewV.Col2 = TableT.Col2 );

INSERT INTO TableT
SELECT *
FROM   ViewV
WHERE  NOT EXISTS (   SELECT *
                      FROM   TableT
                      WHERE  ViewV.Col1 = TableT.Col1
                             AND ViewV.Col2 = TableT.Col2 );



The upside is that there was no MODIFY code needed: simply delete whatever isn't in the source view, then copy over whatever isn't in the destination table.  Because the whole row acts as the key, a modified row is handled as a delete followed by an insert.

thanks again
ste5anSenior DeveloperCommented:
JOINs often require building the entire result set, while NOT EXISTS can stop at the first matching row. Thus it is faster in many cases (although there are scenarios where a JOIN gets the same execution plan as an EXISTS).