cell-gfx

asked on

DTS Lookup on primary key doesn't work

I am using a Transform Data Task to copy data from one source to another. To make sure I ignore duplicate rows, I use a lookup on the primary key field of the target table to check whether the key already exists. If it does, my ActiveX transform returns DTSTransformStat_SkipRow instead of DTSTransformStat_OK.

The only problem is that the IsEmpty test on the lookup result never fires when I come across a duplicate. The query in the lookup uses the primary key as a parameter and returns another field if it finds the target record.

Can anyone suggest either what I might be doing wrong or an alternative?
AaronAbend

Try selecting 1 instead of the other column...

select 1 from table where pk = value ...

then test the parameter you load the 1 into.
cell-gfx

ASKER

The ActiveX script reads as:

If IsEmpty(DTSLookups("ServExists").Execute(DTSDestination("FACT Service ID"))) Then
      Main = DTSTransformStat_OK
Else
      Main = DTSTransformStat_SkipRow
End If

and this is the query that I am using to test:

SELECT [At Branch ID]
FROM   [Data Warehouse].dbo.[FACT Services Interim] WITH (NOLOCK)
WHERE  [FACT Service ID] = ?

Is this the best way to test for the result of the lookup?
I have altered my lookup to return 1 instead of my primary key, but this is not working.
If it helps, I have set the connection for the lookup to the target SQL server of the TDT.
When I have done work like this, I load the new data into a temporary table and resolve the duplicates in a stored procedure (sketched below). I believe it is faster that way, and it is easy to do, unless most of your data is bad, in which case you save time by not loading the duplicates at all. A stored procedure also makes it easier to decide whether to keep the existing row for a matching PK or use the new one.

ActiveX in DTS is very hard to debug (I have written many programs just like this one, but have avoided it because it is so hard to work with). The main advantage is the ability to create a "bads" table for reloading, which is important in very large loads where redoing all of the data after a failure is not an option.
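A minimal sketch of that staging approach, assuming the table and key names from the question plus a hypothetical staging table that the file is bulk loaded into first:

-- Sketch only: [FACT Services Staging] is an assumed staging table name.
-- Copy across only the rows whose key is not already in the target.
INSERT INTO [Data Warehouse].dbo.[FACT Services Interim]
SELECT s.*
FROM   [Data Warehouse].dbo.[FACT Services Staging] AS s
WHERE  NOT EXISTS
       (SELECT 1
        FROM  [Data Warehouse].dbo.[FACT Services Interim] AS t
        WHERE t.[FACT Service ID] = s.[FACT Service ID])
-- Duplicates within the file itself would still need their own step
-- (e.g. a SELECT DISTINCT or a GROUP BY on the key) before this insert.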

That is probably a good idea, but it still doesn't resolve the fact that a piece of DTS functionality is just not working properly. I am not worried about the duplicates being ignored or tracked, I just want to skip them. Creating stored procedures to do this seems a little excessive.

As an aside, I have also followed the Books Online advice of assigning the lookup result to a variable and then testing that with IsEmpty, but I still get the same result.
I think I have figured out what might be happening: the duplicate keys are all in the same file that I am loading in.
If I load the file, manually remove the duplicates, and then re-run the load, the duplicates are ignored.
This would imply that the query in the lookup can't check records in the target database until they are committed.
Therefore, while the file is initially loading and the lookup checks for a duplicate, it will never find one.
Does anyone know if this is the case?
Oh, and also, would switching off 'Fast Load' help?
I have just tested this and it appears I have answered the main part of my own question.
If I deactivate 'Fast Load' in the TDT options, then the lookup can reference records that are created from the current data load. If 'Fast Load' is selected, then only data that has been committed to the database can be read by the lookup.

My only problem with this is that the process I am using to load the data is dealing with massive amounts of information on a daily basis, so switching 'Fast Load' off generates a huge overhead in processing time. Is there any way around this?
ASKER CERTIFIED SOLUTION
SoftEng007
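The accepted answer itself is not reproduced above, so what follows is only an assumption: judging from the follow-ups about applying an index and about duplicate rows being skipped while the task reports a failed status, it most likely recommended a unique index built WITH IGNORE_DUP_KEY on the target key, which discards duplicate inserts rather than aborting the load. A sketch of that assumption, using the names from the question (SQL Server 2000 syntax):

-- Assumption only: this reconstructs the likely shape of the hidden answer.
-- With IGNORE_DUP_KEY, inserting an existing key raises a warning and the
-- duplicate row is discarded instead of the whole batch failing.
CREATE UNIQUE INDEX IX_FACTServicesInterim_ServiceID
ON [Data Warehouse].dbo.[FACT Services Interim] ([FACT Service ID])
WITH IGNORE_DUP_KEY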
Should this index be applied to the primary key field?
Thanks, that's worked beautifully. The only downside to this is that the TDT now shows as having a failed status, and I test for the success/failure of the TDT to tell whether or not to perform the extract again.

Is there any way to stop the TDT failing once it has skipped the duplicate rows?
Alternatively, when a TDT fails completely, e.g. an ODBC connection it is using fails, does it return anything in the RowsInError CustomTask property?
To trap this error and control the return value, you need to use a multi-phase data pump.

See this for help:

http://www.sqldts.com/default.aspx?282
Unfortunately, as I had previously read about multi-phase data pumps, the following is true:

"If you have the data pump set to FAST LOAD then even when you encounter a Primary Key Violation the On Insert Success phase is called. It is only when we end the data pump that the failure occurs. If you want to handle a primary key violation then you need to turn off FAST LOAD and then the correct phases are called so you can handle the error."

I can't turn Fast Load off, due to the large amounts of data I am processing. Is there any way to trap this error without using a multi-phase pump?
Sorry, I don't know of any way around this problem without turning fast load off.

Well... maybe just load the primary keys into a temp table and check whether any already exist in the production table. If so, exclude those from the load?
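A rough illustration of that suggestion, with assumed table names; the incoming keys go into a temp table, keys already in production are removed, and whatever is left drives the load:

-- Sketch only: [FACT Services Staging] is an assumed name for wherever the
-- incoming file lands before the main load.
SELECT DISTINCT [FACT Service ID]
INTO   #NewKeys
FROM   [Data Warehouse].dbo.[FACT Services Staging]

-- Throw away any key that already exists in the production table.
DELETE k
FROM   #NewKeys AS k
JOIN   [Data Warehouse].dbo.[FACT Services Interim] AS t
  ON   t.[FACT Service ID] = k.[FACT Service ID]

-- #NewKeys now holds only genuinely new keys; join to it to restrict the load.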
That was my only other alternative. I've managed to make my primary key unique by adding more fields to it, but I'll bear all these ideas in mind in the future! Many thanks!