Solved

SSIS how to COMPARE a data column from different servers?

Posted on 2016-09-22
6
155 Views
Last Modified: 2016-10-25
Experts,

I have a SSIS requirement to VERIFY MemberIDs from one Server with MemberIDs on another Server. I need to create a matched list and an unmatched list.

I need to compare data from these 2 tables like this:

select memberID from server1.dbo.table1 t1

select memberID from server2.dbo.table2 t2

Compare SQL:

select t2.memberID from server2.dbo.table2 t2
where t2.memberID in
(select t1.memberID from server1.dbo.table1 t1)

---------

I believe I'll need to use a VARIABLE to capture the memberIDs from Table1 and use this variable in the compare SQL in order to do the comparison.

Is this the best way or are there better ways to handle this within SSIS?

Thanks
M
0
Comment
Question by:MIKE
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
6 Comments
 
LVL 13

Accepted Solution

by:
Nakul Vachhrajani earned 500 total points
ID: 41811976
Assuming that one is a "master" list of sorts (something that you would treat as your base data), there are a couple of alternatives:
1. Use the Lookup transformation. The lookup transformation will take MemberIds from one server as input whereas the master data-set (the data on another server) will be your lookup dataset. The Lookup transformation will provide you with a "Match Output" and a "No-match Output" separately, so no further processing would be required on your end
2. You can also use a MERGE transform with a full outer join to see which records are available in one list v/s the other. The missing records would have a NULL in the respective list which then needs to be separated out using a "Conditional Split"
3. The third option is a traditional T-SQL solution. You download both data sets into temporary tables and then use T-SQL to separate out the mismatched records

If you have a small amount of source data (i.e. data on one server is a master whereas the data in the other server is transactional information), my vote would be to go with option #1 (lookup transformation).
However, if you have large data on either side (e.g. merging transactional data), then the 3rd option (traditional T-SQL) will yield the best performance.
1
 
LVL 17

Author Comment

by:MIKE
ID: 41815162
I'm attempting to use LOOKUP transform. But the CACHE is never populated with lookup data??

What could be the issue?

When I run my simple query in SSMS it pulls in 800K rows of data. However when I try to use this same query in the LOOKUP transform I don't get any data and so all records show as being NOT MATCHED...which I know is not true.

(Note: I even tried using the actual TABLE source to pull in the lookup data, still no data populated in cache...???)
0
 
LVL 13

Expert Comment

by:Nakul Vachhrajani
ID: 41824758
How are you checking whether data has been cached or not? Can you post a screenshot of how you have configured the LookUp transformation task (especially, the match keys)?
0
Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

 
LVL 17

Author Comment

by:MIKE
ID: 41837783
I now have the missing data corrected,..it was an ACCESS ISSUE!

HOWEVER, I still need assistance:

Can I place LOGIC within the LOOKUP?

I need to match on the MEMBERID from table1 but also need to compare a DATE field from Server1.Table1 to 2 Date fields in Server2.Table1.

I need to compare like this:

Server1.Table1.MemberID = Server2.Table1.MemberID
AND
Server1.Table1.EncounterDate between Server2.Table1.StartDate and Server2.Table1.EndDate

If above is correct the this is a MATCH else NOT MATCHED

What is best way to accomplish this logic within SSIS..??
0
 
LVL 13

Expert Comment

by:Nakul Vachhrajani
ID: 41838251
I am evaluating how to use a LookUp tranform for range validations. Typically, LookUps are for getting the values of an enumeration or quickly validating whether an Id exists in a cached source or not.

My knee-jerk reaction would be to use the fastest way, which in this case would be the option #3 I proposed earlier. In this option, you will need to get the data from both sources into temporary tables and then use a simple T-SQL LEFT OUTER JOIN. If you get a not-null value in the columns for Server2.Table1, it's a match else it's not.

If you do not want to use T-SQL, the other (purely SSIS) way is option #2 (using sort and Merge operators). The bottleneck in this is the sort operator which will stop the data pipeline untill it gets all records from the source so that it can sort the data. Hence, this method can work well, but only when you have sorted inputs. If you do not have sorted inputs, this actually may not perform as expected.
0
 
LVL 13

Expert Comment

by:Nakul Vachhrajani
ID: 41838346
Here's how to achieve the requirement using the Look Up transformation:

Option 01:
1. Perform the Look up only on MemberId - Those not matching on the MemberId can straight-away go to the no-match output whereas those matching on the MemberId should "go down" the data pipeline
2. Connect a "Derived Column" transformation to the LookUp Match Output
3. In the Derived Column, add two columns:
    a. Column - StartDateValidation: Server1.Table1.EncounterDate >= Server2.Table1.StartDate
    b. Column - EndDateValidation: Server1.Table1.EncounterDate <= Server2.Table1.EndDate
4. Connect a Conditional Split to the Derived Column transformation
    a. Condition to check: StartDateValidation == TRUE && EndDateValidation == TRUE
    (You can name this output as "Match Output")
5. The "Match Output" from the Conditional Split can now go to your subsequent destination for matched records
6. The conditional split default output can go to your "No-match" output logic

Option 02 (Not recommended due to extremely poor performance):
1. You can use the "Partial Cache" or "No Cache" modes of the LookUp transformation.
2. Map both the MemberId to the LookUp source and EncounterDate to the StartDate from the lookup source
3. In the "Advanced" tab of the LookUp transform, a query can be used as under:
    SELECT *
    FROM (SELECT MemberId,
                               StartDate,
                               EndDate
                FROM Server2.Table1) AS [refTable]
                WHERE [refTable].MemberId = ?
                       AND ? BETWEEN [refTable].StartDate AND [refTable].EndDate
4. Click on the Parameters button and map Parameter0 to the MemberId from the input and Parameter1 to the EncounterDate
5. Continue with processing of match & no-match output as required

The drawback here is that because we need to use Partial Cache or No Cache modes, a lot of round-trips to the lookup cache are required and a large number of buffers are also used, degrading performance terribly (I could visually see a huge performance hit in option #2 over option #1 with a data-set as small as 121,000 rows and a lookup of 500 rows).
0

Featured Post

Edgartown IT Case Study

Learn about Edgartown's quest to ensure the safety and security of the entire town's employee and citizen data. Read the case study!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question