Marcus Aurelius
asked on
SSIS how to COMPARE a data column from different servers?
Experts,
I have a SSIS requirement to VERIFY MemberIDs from one Server with MemberIDs on another Server. I need to create a matched list and an unmatched list.
I need to compare data from these 2 tables like this:
select memberID from server1.dbo.table1 t1
select memberID from server2.dbo.table2 t2
Compare SQL:
select t2.memberID from server2.dbo.table2 t2
where t2.memberID in
(select t1.memberID from server1.dbo.table1 t1)
---------
I believe I'll need to use a VARIABLE to capture the memberIDs from Table1 and use this variable in the compare SQL in order to do the comparison.
Is this the best way or are there better ways to handle this within SSIS?
Thanks
M
I have a SSIS requirement to VERIFY MemberIDs from one Server with MemberIDs on another Server. I need to create a matched list and an unmatched list.
I need to compare data from these 2 tables like this:
select memberID from server1.dbo.table1 t1
select memberID from server2.dbo.table2 t2
Compare SQL:
select t2.memberID from server2.dbo.table2 t2
where t2.memberID in
(select t1.memberID from server1.dbo.table1 t1)
---------
I believe I'll need to use a VARIABLE to capture the memberIDs from Table1 and use this variable in the compare SQL in order to do the comparison.
Is this the best way or are there better ways to handle this within SSIS?
Thanks
M
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
How are you checking whether data has been cached or not? Can you post a screenshot of how you have configured the LookUp transformation task (especially, the match keys)?
ASKER
I now have the missing data corrected,..it was an ACCESS ISSUE!
HOWEVER, I still need assistance:
Can I place LOGIC within the LOOKUP?
I need to match on the MEMBERID from table1 but also need to compare a DATE field from Server1.Table1 to 2 Date fields in Server2.Table1.
I need to compare like this:
Server1.Table1.MemberID = Server2.Table1.MemberID
AND
Server1.Table1.EncounterDa te between Server2.Table1.StartDate and Server2.Table1.EndDate
If above is correct the this is a MATCH else NOT MATCHED
What is best way to accomplish this logic within SSIS..??
HOWEVER, I still need assistance:
Can I place LOGIC within the LOOKUP?
I need to match on the MEMBERID from table1 but also need to compare a DATE field from Server1.Table1 to 2 Date fields in Server2.Table1.
I need to compare like this:
Server1.Table1.MemberID = Server2.Table1.MemberID
AND
Server1.Table1.EncounterDa
If above is correct the this is a MATCH else NOT MATCHED
What is best way to accomplish this logic within SSIS..??
I am evaluating how to use a LookUp tranform for range validations. Typically, LookUps are for getting the values of an enumeration or quickly validating whether an Id exists in a cached source or not.
My knee-jerk reaction would be to use the fastest way, which in this case would be the option #3 I proposed earlier. In this option, you will need to get the data from both sources into temporary tables and then use a simple T-SQL LEFT OUTER JOIN. If you get a not-null value in the columns for Server2.Table1, it's a match else it's not.
If you do not want to use T-SQL, the other (purely SSIS) way is option #2 (using sort and Merge operators). The bottleneck in this is the sort operator which will stop the data pipeline untill it gets all records from the source so that it can sort the data. Hence, this method can work well, but only when you have sorted inputs. If you do not have sorted inputs, this actually may not perform as expected.
My knee-jerk reaction would be to use the fastest way, which in this case would be the option #3 I proposed earlier. In this option, you will need to get the data from both sources into temporary tables and then use a simple T-SQL LEFT OUTER JOIN. If you get a not-null value in the columns for Server2.Table1, it's a match else it's not.
If you do not want to use T-SQL, the other (purely SSIS) way is option #2 (using sort and Merge operators). The bottleneck in this is the sort operator which will stop the data pipeline untill it gets all records from the source so that it can sort the data. Hence, this method can work well, but only when you have sorted inputs. If you do not have sorted inputs, this actually may not perform as expected.
Here's how to achieve the requirement using the Look Up transformation:
Option 01:
1. Perform the Look up only on MemberId - Those not matching on the MemberId can straight-away go to the no-match output whereas those matching on the MemberId should "go down" the data pipeline
2. Connect a "Derived Column" transformation to the LookUp Match Output
3. In the Derived Column, add two columns:
a. Column - StartDateValidation: Server1.Table1.EncounterDa te >= Server2.Table1.StartDate
b. Column - EndDateValidation: Server1.Table1.EncounterDa te <= Server2.Table1.EndDate
4. Connect a Conditional Split to the Derived Column transformation
a. Condition to check: StartDateValidation == TRUE && EndDateValidation == TRUE
(You can name this output as "Match Output")
5. The "Match Output" from the Conditional Split can now go to your subsequent destination for matched records
6. The conditional split default output can go to your "No-match" output logic
Option 02 (Not recommended due to extremely poor performance):
1. You can use the "Partial Cache" or "No Cache" modes of the LookUp transformation.
2. Map both the MemberId to the LookUp source and EncounterDate to the StartDate from the lookup source
3. In the "Advanced" tab of the LookUp transform, a query can be used as under:
SELECT *
FROM (SELECT MemberId,
StartDate,
EndDate
FROM Server2.Table1) AS [refTable]
WHERE [refTable].MemberId = ?
AND ? BETWEEN [refTable].StartDate AND [refTable].EndDate
4. Click on the Parameters button and map Parameter0 to the MemberId from the input and Parameter1 to the EncounterDate
5. Continue with processing of match & no-match output as required
The drawback here is that because we need to use Partial Cache or No Cache modes, a lot of round-trips to the lookup cache are required and a large number of buffers are also used, degrading performance terribly (I could visually see a huge performance hit in option #2 over option #1 with a data-set as small as 121,000 rows and a lookup of 500 rows).
Option 01:
1. Perform the Look up only on MemberId - Those not matching on the MemberId can straight-away go to the no-match output whereas those matching on the MemberId should "go down" the data pipeline
2. Connect a "Derived Column" transformation to the LookUp Match Output
3. In the Derived Column, add two columns:
a. Column - StartDateValidation: Server1.Table1.EncounterDa
b. Column - EndDateValidation: Server1.Table1.EncounterDa
4. Connect a Conditional Split to the Derived Column transformation
a. Condition to check: StartDateValidation == TRUE && EndDateValidation == TRUE
(You can name this output as "Match Output")
5. The "Match Output" from the Conditional Split can now go to your subsequent destination for matched records
6. The conditional split default output can go to your "No-match" output logic
Option 02 (Not recommended due to extremely poor performance):
1. You can use the "Partial Cache" or "No Cache" modes of the LookUp transformation.
2. Map both the MemberId to the LookUp source and EncounterDate to the StartDate from the lookup source
3. In the "Advanced" tab of the LookUp transform, a query can be used as under:
SELECT *
FROM (SELECT MemberId,
StartDate,
EndDate
FROM Server2.Table1) AS [refTable]
WHERE [refTable].MemberId = ?
AND ? BETWEEN [refTable].StartDate AND [refTable].EndDate
4. Click on the Parameters button and map Parameter0 to the MemberId from the input and Parameter1 to the EncounterDate
5. Continue with processing of match & no-match output as required
The drawback here is that because we need to use Partial Cache or No Cache modes, a lot of round-trips to the lookup cache are required and a large number of buffers are also used, degrading performance terribly (I could visually see a huge performance hit in option #2 over option #1 with a data-set as small as 121,000 rows and a lookup of 500 rows).
ASKER
What could be the issue?
When I run my simple query in SSMS it pulls in 800K rows of data. However when I try to use this same query in the LOOKUP transform I don't get any data and so all records show as being NOT MATCHED...which I know is not true.
(Note: I even tried using the actual TABLE source to pull in the lookup data, still no data populated in cache...???)