Solved

Use Trigger To Avoid Duplicates?

Posted on 2004-03-22
10
483 Views
Last Modified: 2012-08-14
I have encountered a situation that causes duplicate records - which, in turn, causes difficulties in billing.  So I decided to alter/change the AccountData table with UNIQUE constraints to repel the duplicate records.  That solution worked fine until the Error Log filled up.

So I got wind of an idea: someone mentioned using a TRIGGER (FOR INSERT, I beleive) to "flag those duplicate records as duplicates".  But it turns out to be easier said than done.

Now I'm at an impasse: I have no idea how to get the trigger to check for duplicates, and flag the incoming record as duplicate when it turns out to BE a duplicate.  I figure overloading a column will suffice (to detect the dupes at billing time), but avoiding self-reference has been a real BEAR.

Does anybody have any idea how I could tackle this beast?

Thank you (in advance),

-LongFist
0
Comment
Question by:LongFist
  • 4
  • 3
10 Comments
 
LVL 13

Expert Comment

by:danblake
ID: 10651326
Why not increase the error log size ?

Having a look here:
http://www.sql-server-performance.com/rd_delete_duplicates.asp

Also:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vdbref/html/dvrefindexkeyspropertypage.asp

By creating an index, you can delete the duplicate rows with the option:
Ignore duplicate rows   Deletes duplicate rows from the index as it is being created

You can copy the data to a temp table, with your input/original data source for checking across you range create an index, automatically delete the rows- -- hey presto -- nice quick solution. (Generally the index creation to delete dups is the fastest mechanism to remove dups in a billing run -- I've seen this done at many sites this way)

Trigger would be a bad way -- uses too much internal memory/poor for large record sets.
0
 
LVL 1

Author Comment

by:LongFist
ID: 10652380
The situation is actually less transparent than it appears: the recordsets aren't too large, and only seven attributes make the entry duplicate or not.  (The rest of that data in the record may drift; it's difficult to say...)  In preparation to make things run even faster, the table is (or will soon be) carved back to a single 30-day group of records (less than 2,000) rather than the entire last two years (over 1,000,000!) or records.

Then we have to add another wrinkle: the accounting data is extracted via a view: the Index feature is surely the least expensive method - and by far the most efficient - but the idea behind this procedure is to make it totally self-sufficient, even though it tends to be a little more expensive.  (Beleive me, I had already thought of that.  It was the simpler solution, but it requires repeat visits on site, thus causing the Customer to beleive that he's being taken advantage of...)  So coming in to the office to perform a two-phase "eradication" of duplicate records on a regular basis might seem less efficient to the Customer.  (But hey, I'm okay with the additional income...)

So it comes down to the need for a trigger to intercept a record during INSERT, determine (by checking the five-to-seven columns) if it *is* a duplicate (regardless of what the additional columns contain), and <somehow> mark the record as a duplicate, probably by overloading a column, otherwise leaving it alone, and letting it INSERT as it was.

But how to approach it?  I almost always get burnt because I'm too ignorant to know how to reference the INSERT data vs. the existing data.  And - of course - there are almost no (known) notes to help me figure this out.

Thanks for your information, though - you're incredibly sharp!  And I really appreciate that.

Hope this message finds you well,

-LongFist
0
 
LVL 1

Author Comment

by:LongFist
ID: 10652428
Waitaseond!  When I read:

>>Also:
>>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vdbref/html/dvrefindexkeyspropertypage.asp
>>
>>By creating an index, you can delete the duplicate rows with the option:
>>Ignore duplicate rows   Deletes duplicate rows from the index as it is being created
>>
>>You can copy the data to a temp table, with your input/original data source for
>>checking across you >>range create an index, automatically delete the rows-
>>-- hey presto -- nice quick solution.
>>(Generally the index creation to delete dups is the fastest mechanism to remove
>>dups in a billing run -- I've seen this done at many sites this way)

I thought you meant perform the rollover of original data to/from the Temp dta each time - but as I read the second article, I began to SEE THE LIGHT.

Shoot, you may have answered this perfectly.  Let me try it on site, and see if the monster gronks.  When the twonky stops showing duplicate records, you get the points.  (And well-earned points they are, too!  Thank you!!!)

Thanks again --- sorry I misread it at first, but I was scanning more than reading to determine if I had a solution or not - you'd think I'd pay more attention, right?

Again, I hope this message finds you well,

-LongFist
0
 
LVL 13

Expert Comment

by:danblake
ID: 10652832
I could have made is slightly clearer...

To test...
You can copy the data to a temp table, with your input/original data source for checking across your range create an index, automatically delete the rows- -- hey presto -- nice quick solution. (Generally the index creation to delete dups is the fastest mechanism to remove dups in a billing run -- I've seen this done at many sites this way)

Sometimes, the final index (duplicate index -- check) if across many rows is faster to use within a temporary table, offsetting the working space to tempdb (a partial check of the entire data-set because you are copying only a subset of the records) and you do not want the duplicate check index on the final RDBMS table.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 1

Author Comment

by:LongFist
ID: 10652947
Okay - I'll admit it - I'm still new to all the new techniques we have available to us since SQL Server 6.0 (and since my employers generally love my code, they almost never let me play with the SQL Server for any length of time...) - so i'm not sure how I'd spread the dupe-check functionality across to the TempDB.  Or is that what you are saying?

I'm sorry if I strike you as being an ignorant jerk - I admit that I *am* ignorant.  But sometimes the complexity of the simplest tasks escapes me.  Mind you, in VB/VC++ and .Net, the world spins a lot differently, and I'm so much more at home there.

Thanks again - I'm eager to apply what I've learned on the site: I think you've really hit on the solution!

Did I mention Thank You?  For your time and trouble, that is...

-LongFist
0
 
LVL 13

Expert Comment

by:danblake
ID: 10653396
No worries, looking for a new challenge...
0
 
LVL 69

Accepted Solution

by:
ScottPletcher earned 125 total points
ID: 10662052
First, add a boolean column to the table to indicate a duplicate row:

ALTER TABLE AccountData
ADD DuplicateEntry BIT DEFAULT 0
GO


Then, add a trigger something like below.  Naturally you will need to replace col1 ... col7 with the names of your seven (or however many) determinant columns.

You will need an index on the controlling columns if the table is large and should probably add one anyway, just in case, although it will be a large index since it will have so many columns.


CREATE TRIGGER AccountData_Trig01_ChkDups
ON AccountData
INSTEAD OF INSERT
AS
INSERT INTO AccountData
SELECT col1, col2, col3, col4, col5, col6, col7, ...allOtherColumnsExceptBit...,
      CASE WHEN EXISTS(
            SELECT 1
            FROM AccountData AD2
            WHERE AD2.col1 = AccountData.col1
            AND AD2.col2 = AccountData.col2
            AND AD2.col3 = AccountData.col3
            AND AD2.col4 = AccountData.col4
            AND AD2.col5 = AccountData.col5
            AND AD2.col6 = AccountData.col6
            AND AD2.col7 = AccountData.col7
      ) THEN 1 ELSE 0 END
FROM inserted


Note: this code assumes SQL 2K.  If on SQL 7.0, you will have to use an UPDATE in an AFTER INSERT trigger, since INSTEAD OF is not available in 7.0.
0
 
LVL 13

Assisted Solution

by:danblake
danblake earned 125 total points
ID: 10665295
The reason why you may wish to use TempDb is if you are trying to validate data-import (staging-table) prior to importing to the final table.  It is easier, to create an index on a temporary table whilst keeping a database live if the covering index cannot be kept on the final destination table.

When MS release the ability to create online indexe rebuilds/builds and drops this need may be negated.
0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now