cell-gfx

asked on

DTS Lookup on primary key doesn't work

I am using a Transform Data Task to copy data from one source to another. To make sure I ignore duplicate rows, I use a lookup on the primary key field of the target table to check whether the key already exists. If it does, my ActiveX transform returns DTSTransformStat_SkipRow instead of DTSTransformStat_OK.

The only problem is that the IsEmpty test on the lookup result never fires when I come across a duplicate. The query in the lookup uses the primary key as a parameter and returns another field if it finds the target record.

Can anyone suggest either what I might be doing wrong or an alternative?
AaronAbend

Try selecting 1 instead of the other column...

select 1 from table where pk = value ...

then test the parameter you load the 1 into.
cell-gfx

ASKER

The ActiveX script reads as:

If IsEmpty(DTSLookups("ServExists").Execute(DTSDestination("FACT Service ID"))) Then
      Main = DTSTransformStat_OK
Else
      Main = DTSTransformStat_SkipRow
End If

and this is the query that I am using to test:

SELECT [At Branch ID]
FROM   [Data Warehouse].dbo.[FACT Services Interim] WITH (NOLOCK)
WHERE  [FACT Service ID] = ?

Is this the best way to test for the result of the lookup?
I have altered my lookup to return 1 instead of my primary key, but this is not working.
If it helps, I have set the connection for the lookup to the target SQL server of the TDT.
When I have done work like this, I load the new data into a temporary table and resolve the duplicates in a stored procedure (sketched below). I believe it is faster that way, and it is easy to do, unless most of your data is bad, in which case you save time by not loading the duplicates at all. A stored procedure also makes it easier to decide whether to keep the existing row for a matching PK or use the new one.

ActiveX in DTS is very hard to debug (I have written many programs just like this one, but have avoided it because it is so hard to work with). The main advantage is the ability to create a "bads" table for reloading, which is important in very large loads where redoing all of the data after a failure is not an option.
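A minimal sketch of that staging approach, assuming the table and key names from the question plus a hypothetical staging table that the file is bulk loaded into first:

-- Sketch only: [FACT Services Staging] is an assumed staging table name.
-- Copy across only the rows whose key is not already in the target.
INSERT INTO [Data Warehouse].dbo.[FACT Services Interim]
SELECT s.*
FROM   [Data Warehouse].dbo.[FACT Services Staging] AS s
WHERE  NOT EXISTS
       (SELECT 1
        FROM  [Data Warehouse].dbo.[FACT Services Interim] AS t
        WHERE t.[FACT Service ID] = s.[FACT Service ID])
-- Duplicates within the file itself would still need their own step
-- (e.g. a SELECT DISTINCT or a GROUP BY on the key) before this insert.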

That is probably a good idea, but it still doesn't resolve the fact that a piece of DTS functionality is just not working properly. I am not worried about the duplicates being ignored or tracked, I just want to skip them. Creating stored procedures to do this seems a little excessive.

As an aside, I have also followed the Books Online advice of assigning the lookup result to a variable and then testing that with IsEmpty, but I still get the same result.
I think I have figured out what might be happening: the duplicate keys are all in the same file that I am loading in.
If I load the file, manually remove the duplicates, and then re-run the load, the duplicates are ignored.
This would imply that the query in the lookup can't check records in the target database until they are committed.
Therefore, while the file is initially loading and the lookup checks for a duplicate, it will never find one.
Does anyone know if this is the case?
Oh, and also, would switching off 'Fast Load' help?
I have just tested this and it appears I have answered the main part of my own question.
If I deactivate 'Fast Load' in the TDT options, then the lookup can reference records that are created from the current data load. If 'Fast Load' is selected, then only data that has been committed to the database can be read by the lookup.

My only problem with this is that the process I am using to load the data is dealing with massive amounts of information on a daily basis, so switching 'Fast Load' off generates a huge overhead in processing time. Is there any way around this?
ASKER CERTIFIED SOLUTION
SoftEng007
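The accepted answer itself is not reproduced above, so what follows is only an assumption: judging from the follow-ups about applying an index and about duplicate rows being skipped while the task reports a failed status, it most likely recommended a unique index built WITH IGNORE_DUP_KEY on the target key, which discards duplicate inserts rather than aborting the load. A sketch of that assumption, using the names from the question (SQL Server 2000 syntax):

-- Assumption only: this reconstructs the likely shape of the hidden answer.
-- With IGNORE_DUP_KEY, inserting an existing key raises a warning and the
-- duplicate row is discarded instead of the whole batch failing.
CREATE UNIQUE INDEX IX_FACTServicesInterim_ServiceID
ON [Data Warehouse].dbo.[FACT Services Interim] ([FACT Service ID])
WITH IGNORE_DUP_KEY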
Should this index be applied to the primary key field?
Thanks, that's worked beautifully. The only downside to this is that the TDT now shows as having a failed status, and I test for the success/failure of the TDT to tell whether or not to perform the extract again.

Is there any way to stop the TDT failing once it has skipped the duplicate rows?
Alternatively, when a TDT fails completely, e.g. an ODBC connection it is using fails, does it return anything in the RowsInError CustomTask property?
To trap this error and control the return value, you need to use a multi-phase data pump.

See this for help:

http://www.sqldts.com/default.aspx?282
Unfortunately, as I had previously read about multi-phase data pumps, the following is true:

"If you have the data pump set to FAST LOAD then even when you encounter a Primary Key Violation the On Insert Success phase is called. It is only when we end the data pump that the failure occurs. If you want to handle a primary key violation then you need to turn off FAST LOAD and then the correct phases are called so you can handle the error."

I can't turn Fast Load off, due to the large amounts of data I am processing. Is there any way to trap this error without using a multi-phase pump?
Sorry, I don't know of any way around this problem without turning fast load off.

Well... maybe just load the primary keys into a temp table and check whether any already exist in the production table. If so, exclude those from the load?
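A rough illustration of that suggestion, with assumed table names; the incoming keys go into a temp table, keys already in production are removed, and whatever is left drives the load:

-- Sketch only: [FACT Services Staging] is an assumed name for wherever the
-- incoming file lands before the main load.
SELECT DISTINCT [FACT Service ID]
INTO   #NewKeys
FROM   [Data Warehouse].dbo.[FACT Services Staging]

-- Throw away any key that already exists in the production table.
DELETE k
FROM   #NewKeys AS k
JOIN   [Data Warehouse].dbo.[FACT Services Interim] AS t
  ON   t.[FACT Service ID] = k.[FACT Service ID]

-- #NewKeys now holds only genuinely new keys; join to it to restrict the load.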
That was my only other alternative. I've managed to make my primary key unique by adding more fields to it, but I'll bear all these ideas in mind in the future! Many thanks!