Solved

SSIS how to COMPARE a data column from different servers?

Posted on 2016-09-22
6
78 Views
Last Modified: 2016-10-25
Experts,

I have a SSIS requirement to VERIFY MemberIDs from one Server with MemberIDs on another Server. I need to create a matched list and an unmatched list.

I need to compare data from these 2 tables like this:

select memberID from server1.dbo.table1 t1

select memberID from server2.dbo.table2 t2

Compare SQL:

select t2.memberID from server2.dbo.table2 t2
where t2.memberID in
(select t1.memberID from server1.dbo.table1 t1)

---------

I believe I'll need to use a VARIABLE to capture the memberIDs from Table1 and use this variable in the compare SQL in order to do the comparison.

Is this the best way or are there better ways to handle this within SSIS?

Thanks
M
0
Comment
Question by:MIKE
  • 4
  • 2
6 Comments
 
LVL 11

Accepted Solution

by:
Nakul Vachhrajani earned 500 total points
ID: 41811976
Assuming that one is a "master" list of sorts (something that you would treat as your base data), there are a couple of alternatives:
1. Use the Lookup transformation. The lookup transformation will take MemberIds from one server as input whereas the master data-set (the data on another server) will be your lookup dataset. The Lookup transformation will provide you with a "Match Output" and a "No-match Output" separately, so no further processing would be required on your end
2. You can also use a MERGE transform with a full outer join to see which records are available in one list v/s the other. The missing records would have a NULL in the respective list which then needs to be separated out using a "Conditional Split"
3. The third option is a traditional T-SQL solution. You download both data sets into temporary tables and then use T-SQL to separate out the mismatched records

If you have a small amount of source data (i.e. data on one server is a master whereas the data in the other server is transactional information), my vote would be to go with option #1 (lookup transformation).
However, if you have large data on either side (e.g. merging transactional data), then the 3rd option (traditional T-SQL) will yield the best performance.
1
 
LVL 17

Author Comment

by:MIKE
ID: 41815162
I'm attempting to use LOOKUP transform. But the CACHE is never populated with lookup data??

What could be the issue?

When I run my simple query in SSMS it pulls in 800K rows of data. However when I try to use this same query in the LOOKUP transform I don't get any data and so all records show as being NOT MATCHED...which I know is not true.

(Note: I even tried using the actual TABLE source to pull in the lookup data, still no data populated in cache...???)
0
 
LVL 11

Expert Comment

by:Nakul Vachhrajani
ID: 41824758
How are you checking whether data has been cached or not? Can you post a screenshot of how you have configured the LookUp transformation task (especially, the match keys)?
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
LVL 17

Author Comment

by:MIKE
ID: 41837783
I now have the missing data corrected,..it was an ACCESS ISSUE!

HOWEVER, I still need assistance:

Can I place LOGIC within the LOOKUP?

I need to match on the MEMBERID from table1 but also need to compare a DATE field from Server1.Table1 to 2 Date fields in Server2.Table1.

I need to compare like this:

Server1.Table1.MemberID = Server2.Table1.MemberID
AND
Server1.Table1.EncounterDate between Server2.Table1.StartDate and Server2.Table1.EndDate

If above is correct the this is a MATCH else NOT MATCHED

What is best way to accomplish this logic within SSIS..??
0
 
LVL 11

Expert Comment

by:Nakul Vachhrajani
ID: 41838251
I am evaluating how to use a LookUp tranform for range validations. Typically, LookUps are for getting the values of an enumeration or quickly validating whether an Id exists in a cached source or not.

My knee-jerk reaction would be to use the fastest way, which in this case would be the option #3 I proposed earlier. In this option, you will need to get the data from both sources into temporary tables and then use a simple T-SQL LEFT OUTER JOIN. If you get a not-null value in the columns for Server2.Table1, it's a match else it's not.

If you do not want to use T-SQL, the other (purely SSIS) way is option #2 (using sort and Merge operators). The bottleneck in this is the sort operator which will stop the data pipeline untill it gets all records from the source so that it can sort the data. Hence, this method can work well, but only when you have sorted inputs. If you do not have sorted inputs, this actually may not perform as expected.
0
 
LVL 11

Expert Comment

by:Nakul Vachhrajani
ID: 41838346
Here's how to achieve the requirement using the Look Up transformation:

Option 01:
1. Perform the Look up only on MemberId - Those not matching on the MemberId can straight-away go to the no-match output whereas those matching on the MemberId should "go down" the data pipeline
2. Connect a "Derived Column" transformation to the LookUp Match Output
3. In the Derived Column, add two columns:
    a. Column - StartDateValidation: Server1.Table1.EncounterDate >= Server2.Table1.StartDate
    b. Column - EndDateValidation: Server1.Table1.EncounterDate <= Server2.Table1.EndDate
4. Connect a Conditional Split to the Derived Column transformation
    a. Condition to check: StartDateValidation == TRUE && EndDateValidation == TRUE
    (You can name this output as "Match Output")
5. The "Match Output" from the Conditional Split can now go to your subsequent destination for matched records
6. The conditional split default output can go to your "No-match" output logic

Option 02 (Not recommended due to extremely poor performance):
1. You can use the "Partial Cache" or "No Cache" modes of the LookUp transformation.
2. Map both the MemberId to the LookUp source and EncounterDate to the StartDate from the lookup source
3. In the "Advanced" tab of the LookUp transform, a query can be used as under:
    SELECT *
    FROM (SELECT MemberId,
                               StartDate,
                               EndDate
                FROM Server2.Table1) AS [refTable]
                WHERE [refTable].MemberId = ?
                       AND ? BETWEEN [refTable].StartDate AND [refTable].EndDate
4. Click on the Parameters button and map Parameter0 to the MemberId from the input and Parameter1 to the EncounterDate
5. Continue with processing of match & no-match output as required

The drawback here is that because we need to use Partial Cache or No Cache modes, a lot of round-trips to the lookup cache are required and a large number of buffers are also used, degrading performance terribly (I could visually see a huge performance hit in option #2 over option #1 with a data-set as small as 121,000 rows and a lookup of 500 rows).
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now