Solved

DTS Lookup on primary key doesn't work

Posted on 2004-08-17
373 Views
Last Modified: 2012-06-27
I am using a Transform Data Task to copy data from one source to another. In order to make sure I ignore duplicate rows, I am using a lookup on the primary key field of the target table that I use to check if the key already exists. If it does, then I perform a TransformStat_SkipRow as opposed to a TransformStat_OK in my ActiveX transform.

The only problem is, the isempty status of the lookup is not being fired when I come across a duplicate. The query in the lookup uses the primary key as a parameter and returns another field if it finds the target record.

Can anyone suggest either what I might be doing wrong or an alternative?
Question by:cell-gfx
17 Comments
 
LVL 10

Expert Comment

by:AaronAbend
Try selecting 1 instead of the other column...

select 1 from table where pk = value ...

then test the parameter you load the 1 into.
 
LVL 1

Author Comment

by:cell-gfx
The ActiveX script reads as :

if isempty ( DTSLookups("ServExists").Execute( DTSDestination("FACT Service ID") ) ) then
      Main = DTSTransformStat_OK
else
      Main = DTSTransformStat_SkipRow
end if

and this is the query that I am using to test :

SELECT [At Branch ID]
FROM [Data Warehouse].dbo.[FACT Services Interim] WITH (NOLOCK)
WHERE [FACT Service ID] = ?

Is this the best way to test for the result of the lookup?
 
LVL 1

Author Comment

by:cell-gfx
I have altered my lookup to return 1 instead of my primary key, but this is not working.
If it helps, I have set the connection for the lookup to the target SQL server of the TDT.
 
LVL 10

Expert Comment

by:AaronAbend
When I have done work like this, I load the new data into a temporary table and resolve the duplicates in a stored procedure. I believe it is faster that way, and it is easy to do. Unless most of your data is bad, in which case you save time by not loading the dups. A stored procedure also makes it easier to decide whether to keep the existing value for a matching PK or to use the new one.

ActiveX in DTS is very hard to debug (I have written many programs just like this one, but have avoided it because it is so hard to work with). The main advantage is the ability to create a "bads" table for reloading (which is important in very large loads where redoing all of the data after a fail is not an option).
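AaronAbend's staging-table pattern can be sketched outside of DTS. The following is a minimal illustration using Python's built-in sqlite3 as a stand-in for the warehouse; the table and column names are made up, and the set-based `NOT EXISTS` insert plays the role of the stored procedure that resolves duplicates after the load:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (service_id INTEGER PRIMARY KEY, branch_id INTEGER)")
con.execute("CREATE TABLE staging (service_id INTEGER, branch_id INTEGER)")

# One row already committed to the target, plus an incoming batch
# whose first row collides with it on the primary key.
con.execute("INSERT INTO target VALUES (1, 100)")
con.executemany("INSERT INTO staging VALUES (?, ?)",
                [(1, 999), (2, 200), (3, 300)])

# Resolve duplicates in one set-based statement, as a stored
# procedure would: keep the existing row when the PK already exists.
con.execute("""
    INSERT INTO target (service_id, branch_id)
    SELECT s.service_id, s.branch_id
    FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM target t
                      WHERE t.service_id = s.service_id)
""")

rows = con.execute(
    "SELECT service_id, branch_id FROM target ORDER BY service_id").fetchall()
print(rows)  # key 1 keeps its original branch_id of 100
```

Because the duplicate check runs after the whole batch is staged, it sees everything at once, which sidesteps the visibility problem a row-by-row lookup has during the load itself.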

 
LVL 1

Author Comment

by:cell-gfx
That is probably a good idea, but it still doesn't resolve the fact that a piece of DTS functionality is just not working properly. I am not worried about the duplicates being ignored or tracked, I just want to skip them. Creating stored procedures to do this seems a little excessive.

As an aside, I have also followed the Books Online advice of allocating the lookup result to a variable and then testing this using isempty, but I still get the same result.
 
LVL 1

Author Comment

by:cell-gfx
I think I have figured out what might be happening : the duplicate keys are all in the same file that I am loading in.
If I load the file and manually remove the duplicates and then re-run the load, the duplicates are ignored.
This would imply that the query in the lookup can't check records in the target database until they are committed.
Therefore, while the file is loading initially and the lookup checks for a duplicate, it will never find one.
Does anyone know if this is the case?
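That hypothesis can be illustrated with a small Python sketch (the keys are made up). A lookup that only sees committed rows misses duplicates arriving within the same batch; remembering the keys already processed during the current load closes the gap. In DTS terms this would mean keeping a dictionary in the ActiveX script's global scope, which is an assumption on my part, not something from the thread:

```python
committed = {1}             # keys already committed to the target table
incoming = [2, 3, 3, 4, 1]  # keys in the file being loaded; 3 repeats

loaded, skipped = [], []
seen_this_load = set()
for key in incoming:
    # Checking only `committed` (all the lookup can see under Fast
    # Load) would let the second 3 through; also checking the keys
    # seen so far in this load catches the within-batch duplicate.
    if key in committed or key in seen_this_load:
        skipped.append(key)       # DTSTransformStat_SkipRow
    else:
        seen_this_load.add(key)
        loaded.append(key)        # DTSTransformStat_OK

print(loaded, skipped)
```

The second 3 and the already-committed 1 end up skipped, matching the behaviour you'd want from the transform.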
 
LVL 1

Author Comment

by:cell-gfx
Oh, and also, would switching off 'Fast Load' help ?
 
LVL 1

Author Comment

by:cell-gfx
I have just tested this and it appears I have answered the main part of my own question.
If I deactivate 'Fast Load' in the TDT options, then the lookup can reference records that are created from the current data load. If 'Fast Load' is selected, then only data that has been committed to the database can be read by the lookup.

My only problem with this is that the process I am using to load the data deals with massive amounts of information on a daily basis, so switching 'Fast Load' off generates a huge overhead in processing time. Is there any way around this?
 
LVL 9

Accepted Solution

by:
SoftEng007 earned 500 total points
Make an index on the table:

CREATE UNIQUE INDEX [IX_XXX_Skip]
ON [dbo].[table_name_here] ([col_name_here])
WITH IGNORE_DUP_KEY, DROP_EXISTING

Then you can just load the data and dups will be skipped!
When done, remove the index if you wish.
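The IGNORE_DUP_KEY behaviour (reject the duplicate row, keep loading the rest) can be demonstrated with SQLite, where a unique index combined with `INSERT OR IGNORE` gives the analogous per-row skip. This is an illustration of the idea, not SQL Server syntax:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The unique index plays the role of IGNORE_DUP_KEY: a duplicate key
# is rejected row-by-row instead of failing the whole load.
con.execute("CREATE TABLE target (service_id INTEGER, branch_id INTEGER)")
con.execute("CREATE UNIQUE INDEX ix_skip ON target (service_id)")

batch = [(1, 100), (2, 200), (1, 999)]  # key 1 appears twice
# INSERT OR IGNORE is SQLite's per-row "skip the duplicate" behaviour.
con.executemany("INSERT OR IGNORE INTO target VALUES (?, ?)", batch)

rows = con.execute("SELECT * FROM target ORDER BY service_id").fetchall()
print(rows)  # the second row with key 1 was silently skipped
```

Note that, as the later comments in this thread show, SQL Server still reports the skipped rows back to the task, so the load can surface as failed even though the data is fine.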
 
LVL 1

Author Comment

by:cell-gfx
Should this index be applied to the primary key field?
 
LVL 9

Expert Comment

by:SoftEng007
yes.
 
LVL 1

Author Comment

by:cell-gfx
Thanks, that's worked beautifully. The only downside to this is that the TDT now shows as having a failed status, and I test for the success/failure of the TDT to tell whether or not to perform the extract again.

Is there any way to stop the TDT failing once it has skipped the duplicate rows?
 
LVL 1

Author Comment

by:cell-gfx
Alternatively, when a TDT fails completely, e.g. an ODBC connection it is using fails, does it return anything in the RowsInError CustomTask property?
 
LVL 9

Expert Comment

by:SoftEng007
To trap this error and control the return, you need to use a multiphase data pump.

See this for help:

http://www.sqldts.com/default.aspx?282
 
LVL 1

Author Comment

by:cell-gfx
Unfortunately, as I had previously read about multi-phase data pumps, the following is true:

"If you have the data pump set to FAST LOAD then even when you encounter a Primary Key Violation the On Insert Success phase is called. It is only when we end the data pump that the failure occurs. If you want to handle a primary key violation then you need to turn off FAST LOAD and then the correct phases are called so you can handle the error."

I can't turn Fast Load off, due to the large amounts of data I am processing. Is there any way to trap this error without using a multi-phase pump?
 
LVL 9

Expert Comment

by:SoftEng007
Sorry, I don't know of any way around this problem without turning Fast Load off.

Well.....
Maybe just load the primary keys into a temp table and check whether any exist in the prod table. If so, exclude those from the load?
 
LVL 1

Author Comment

by:cell-gfx
That was my only other alternative. I've managed to make my primary key unique by adding more fields to it, but I'll bear all these ideas in mind in the future! Many thanks!
