?
Solved

SSIS how to COMPARE a data column from different servers?

Posted on 2016-09-22
6
Medium Priority
?
285 Views
Last Modified: 2016-10-25
Experts,

I have a SSIS requirement to VERIFY MemberIDs from one Server with MemberIDs on another Server. I need to create a matched list and an unmatched list.

I need to compare data from these 2 tables like this:

select memberID from server1.dbo.table1 t1

select memberID from server2.dbo.table2 t2

Compare SQL:

select t2.memberID from server2.dbo.table2 t2
where t2.memberID in
(select t1.memberID from server1.dbo.table1 t1)

---------

I believe I'll need to use a VARIABLE to capture the memberIDs from Table1 and use this variable in the compare SQL in order to do the comparison.

Is this the best way or are there better ways to handle this within SSIS?

Thanks
M
0
Comment
Question by:MIKE
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
6 Comments
 
LVL 14

Accepted Solution

by:
Nakul Vachhrajani earned 2000 total points
ID: 41811976
Assuming that one is a "master" list of sorts (something that you would treat as your base data), there are a couple of alternatives:
1. Use the Lookup transformation. The lookup transformation will take MemberIds from one server as input whereas the master data-set (the data on another server) will be your lookup dataset. The Lookup transformation will provide you with a "Match Output" and a "No-match Output" separately, so no further processing would be required on your end
2. You can also use a MERGE transform with a full outer join to see which records are available in one list v/s the other. The missing records would have a NULL in the respective list which then needs to be separated out using a "Conditional Split"
3. The third option is a traditional T-SQL solution. You download both data sets into temporary tables and then use T-SQL to separate out the mismatched records

If you have a small amount of source data (i.e. data on one server is a master whereas the data in the other server is transactional information), my vote would be to go with option #1 (lookup transformation).
However, if you have large data on either side (e.g. merging transactional data), then the 3rd option (traditional T-SQL) will yield the best performance.
1
 
LVL 17

Author Comment

by:MIKE
ID: 41815162
I'm attempting to use LOOKUP transform. But the CACHE is never populated with lookup data??

What could be the issue?

When I run my simple query in SSMS it pulls in 800K rows of data. However when I try to use this same query in the LOOKUP transform I don't get any data and so all records show as being NOT MATCHED...which I know is not true.

(Note: I even tried using the actual TABLE source to pull in the lookup data, still no data populated in cache...???)
0
 
LVL 14

Expert Comment

by:Nakul Vachhrajani
ID: 41824758
How are you checking whether data has been cached or not? Can you post a screenshot of how you have configured the LookUp transformation task (especially, the match keys)?
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 17

Author Comment

by:MIKE
ID: 41837783
I now have the missing data corrected,..it was an ACCESS ISSUE!

HOWEVER, I still need assistance:

Can I place LOGIC within the LOOKUP?

I need to match on the MEMBERID from table1 but also need to compare a DATE field from Server1.Table1 to 2 Date fields in Server2.Table1.

I need to compare like this:

Server1.Table1.MemberID = Server2.Table1.MemberID
AND
Server1.Table1.EncounterDate between Server2.Table1.StartDate and Server2.Table1.EndDate

If above is correct the this is a MATCH else NOT MATCHED

What is best way to accomplish this logic within SSIS..??
0
 
LVL 14

Expert Comment

by:Nakul Vachhrajani
ID: 41838251
I am evaluating how to use a LookUp tranform for range validations. Typically, LookUps are for getting the values of an enumeration or quickly validating whether an Id exists in a cached source or not.

My knee-jerk reaction would be to use the fastest way, which in this case would be the option #3 I proposed earlier. In this option, you will need to get the data from both sources into temporary tables and then use a simple T-SQL LEFT OUTER JOIN. If you get a not-null value in the columns for Server2.Table1, it's a match else it's not.

If you do not want to use T-SQL, the other (purely SSIS) way is option #2 (using sort and Merge operators). The bottleneck in this is the sort operator which will stop the data pipeline untill it gets all records from the source so that it can sort the data. Hence, this method can work well, but only when you have sorted inputs. If you do not have sorted inputs, this actually may not perform as expected.
0
 
LVL 14

Expert Comment

by:Nakul Vachhrajani
ID: 41838346
Here's how to achieve the requirement using the Look Up transformation:

Option 01:
1. Perform the Look up only on MemberId - Those not matching on the MemberId can straight-away go to the no-match output whereas those matching on the MemberId should "go down" the data pipeline
2. Connect a "Derived Column" transformation to the LookUp Match Output
3. In the Derived Column, add two columns:
    a. Column - StartDateValidation: Server1.Table1.EncounterDate >= Server2.Table1.StartDate
    b. Column - EndDateValidation: Server1.Table1.EncounterDate <= Server2.Table1.EndDate
4. Connect a Conditional Split to the Derived Column transformation
    a. Condition to check: StartDateValidation == TRUE && EndDateValidation == TRUE
    (You can name this output as "Match Output")
5. The "Match Output" from the Conditional Split can now go to your subsequent destination for matched records
6. The conditional split default output can go to your "No-match" output logic

Option 02 (Not recommended due to extremely poor performance):
1. You can use the "Partial Cache" or "No Cache" modes of the LookUp transformation.
2. Map both the MemberId to the LookUp source and EncounterDate to the StartDate from the lookup source
3. In the "Advanced" tab of the LookUp transform, a query can be used as under:
    SELECT *
    FROM (SELECT MemberId,
                               StartDate,
                               EndDate
                FROM Server2.Table1) AS [refTable]
                WHERE [refTable].MemberId = ?
                       AND ? BETWEEN [refTable].StartDate AND [refTable].EndDate
4. Click on the Parameters button and map Parameter0 to the MemberId from the input and Parameter1 to the EncounterDate
5. Continue with processing of match & no-match output as required

The drawback here is that because we need to use Partial Cache or No Cache modes, a lot of round-trips to the lookup cache are required and a large number of buffers are also used, degrading performance terribly (I could visually see a huge performance hit in option #2 over option #1 with a data-set as small as 121,000 rows and a lookup of 500 rows).
0

Featured Post

On Demand Webinar - Networking for the Cloud Era

This webinar discusses:
-Common barriers companies experience when moving to the cloud
-How SD-WAN changes the way we look at networks
-Best practices customers should employ moving forward with cloud migration
-What happens behind the scenes of SteelConnect’s one-click button

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
When trying to connect from SSMS v17.x to a SQL Server Integration Services 2016 instance or previous version, you get the error “Connecting to the Integration Services service on the computer failed with the following error: 'The specified service …
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

718 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question