asked on

Use Trigger To Avoid Duplicates?

I have encountered a situation that causes duplicate records - which, in turn, causes difficulties in billing. So I decided to alter/change the AccountData table with UNIQUE constraints to repel the duplicate records. That solution worked fine until the Error Log filled up.

So I got wind of an idea: someone mentioned using a TRIGGER (FOR INSERT, I beleive) to "flag those duplicate records as duplicates". But it turns out to be easier said than done.

Now I'm at an impasse: I have no idea how to get the trigger to check for duplicates, and flag the incoming record as duplicate when it turns out to BE a duplicate. I figure overloading a column will suffice (to detect the dupes at billing time), but avoiding self-reference has been a real BEAR.

Does anybody have any idea how I could tackle this beast?

Thank you (in advance),

-LongFist

danblake

Why not increase the error log size ?

Having a look here:
http://www.sql-server-performance.com/rd_delete_duplicates.asp

Also:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vdbref/html/dvrefindexkeyspropertypage.asp

By creating an index, you can delete the duplicate rows with the option:
Ignore duplicate rows Deletes duplicate rows from the index as it is being created

You can copy the data to a temp table, with your input/original data source for checking across you range create an index, automatically delete the rows- -- hey presto -- nice quick solution. (Generally the index creation to delete dups is the fastest mechanism to remove dups in a billing run -- I've seen this done at many sites this way)

Trigger would be a bad way -- uses too much internal memory/poor for large record sets.

Gene Moody

ASKER

The situation is actually less transparent than it appears: the recordsets aren't too large, and only seven attributes make the entry duplicate or not. (The rest of that data in the record may drift; it's difficult to say...) In preparation to make things run even faster, the table is (or will soon be) carved back to a single 30-day group of records (less than 2,000) rather than the entire last two years (over 1,000,000!) or records.

Then we have to add another wrinkle: the accounting data is extracted via a view: the Index feature is surely the least expensive method - and by far the most efficient - but the idea behind this procedure is to make it totally self-sufficient, even though it tends to be a little more expensive. (Beleive me, I had already thought of that. It was the simpler solution, but it requires repeat visits on site, thus causing the Customer to beleive that he's being taken advantage of...) So coming in to the office to perform a two-phase "eradication" of duplicate records on a regular basis might seem less efficient to the Customer. (But hey, I'm okay with the additional income...)

So it comes down to the need for a trigger to intercept a record during INSERT, determine (by checking the five-to-seven columns) if it *is* a duplicate (regardless of what the additional columns contain), and <somehow> mark the record as a duplicate, probably by overloading a column, otherwise leaving it alone, and letting it INSERT as it was.

But how to approach it? I almost always get burnt because I'm too ignorant to know how to reference the INSERT data vs. the existing data. And - of course - there are almost no (known) notes to help me figure this out.

Thanks for your information, though - you're incredibly sharp! And I really appreciate that.

Hope this message finds you well,

-LongFist

Gene Moody

ASKER

Waitaseond! When I read:

>>Also:
>>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vdbref/html/dvrefindexkeyspropertypage.asp
>>
>>By creating an index, you can delete the duplicate rows with the option:
>>Ignore duplicate rows Deletes duplicate rows from the index as it is being created
>>
>>You can copy the data to a temp table, with your input/original data source for
>>checking across you >>range create an index, automatically delete the rows-
>>-- hey presto -- nice quick solution.
>>(Generally the index creation to delete dups is the fastest mechanism to remove
>>dups in a billing run -- I've seen this done at many sites this way)

I thought you meant perform the rollover of original data to/from the Temp dta each time - but as I read the second article, I began to SEE THE LIGHT.

Shoot, you may have answered this perfectly. Let me try it on site, and see if the monster gronks. When the twonky stops showing duplicate records, you get the points. (And well-earned points they are, too! Thank you!!!)

Thanks again --- sorry I misread it at first, but I was scanning more than reading to determine if I had a solution or not - you'd think I'd pay more attention, right?

Again, I hope this message finds you well,

-LongFist

danblake

I could have made is slightly clearer...

To test...
You can copy the data to a temp table, with your input/original data source for checking across your range create an index, automatically delete the rows- -- hey presto -- nice quick solution. (Generally the index creation to delete dups is the fastest mechanism to remove dups in a billing run -- I've seen this done at many sites this way)

Sometimes, the final index (duplicate index -- check) if across many rows is faster to use within a temporary table, offsetting the working space to tempdb (a partial check of the entire data-set because you are copying only a subset of the records) and you do not want the duplicate check index on the final RDBMS table.

Gene Moody

ASKER

Okay - I'll admit it - I'm still new to all the new techniques we have available to us since SQL Server 6.0 (and since my employers generally love my code, they almost never let me play with the SQL Server for any length of time...) - so i'm not sure how I'd spread the dupe-check functionality across to the TempDB. Or is that what you are saying?

I'm sorry if I strike you as being an ignorant jerk - I admit that I *am* ignorant. But sometimes the complexity of the simplest tasks escapes me. Mind you, in VB/VC++ and .Net, the world spins a lot differently, and I'm so much more at home there.

Thanks again - I'm eager to apply what I've learned on the site: I think you've really hit on the solution!

Did I mention Thank You? For your time and trouble, that is...

-LongFist

danblake

No worries, looking for a new challenge...

ASKER CERTIFIED SOLUTION

Scott Pletcher

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

danblake

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial