Solved

MS SQL SERVER prevent duplicates being inserted

Posted on 2015-02-10
10
75 Views
Last Modified: 2015-02-19
I have a temporary table that is used to grab data from an outside source. It has a clustered index on 3 fields that is used to prevent duplicates.

The next step is copying this data to an active table. Most of the rows in that table use the fields from the clustered index, but not all of them do, so I can't duplicate the clustered index on the live table.

I have scripts that auto-populate the temp table several times a day and also copy the data over.

TEMP_TABLE  (clustered index on key1, key2 and key3, Ignore Duplicate Values = true)
key1
key2
key3
field1
field2

LIVE_TABLE
ID  Identity_column
field1
field2
field3
key1
key2
key3

What is the best option to copy the data from the temp table to the live table and avoid duplicates on key1, key2 and key3, given that some rows will have the key fields null because they are not needed?

When the key fields are not needed, the data is entered as a one-off. When the key fields are used, the data is brought in via a SQL script.
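
For reference, here is a rough sketch of the two tables as described. The column types are just placeholders to show the shape; the real definitions differ:

CREATE TABLE dbo.TEMP_TABLE (
    key1   INT          NOT NULL,  -- column types are placeholders
    key2   VARCHAR(50)  NOT NULL,
    key3   VARCHAR(50)  NOT NULL,
    field1 VARCHAR(200) NULL,
    field2 VARCHAR(200) NULL
);

-- "Ignore Duplicate Values" in SSMS corresponds to IGNORE_DUP_KEY on a
-- unique index: duplicate key1+key2+key3 rows from the outside source are
-- skipped with a warning instead of failing the insert.
CREATE UNIQUE CLUSTERED INDEX IX_TEMP_TABLE_keys
    ON dbo.TEMP_TABLE (key1, key2, key3)
    WITH (IGNORE_DUP_KEY = ON);

CREATE TABLE dbo.LIVE_TABLE (
    ID     INT IDENTITY(1,1) PRIMARY KEY,
    field1 VARCHAR(200) NULL,
    field2 VARCHAR(200) NULL,
    field3 VARCHAR(200) NULL,
    key1   INT          NULL,   -- nullable: some rows never use the keys,
    key2   VARCHAR(50)  NULL,   -- so the unique index from the temp table
    key3   VARCHAR(50)  NULL    -- can't simply be copied here
);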
0
Comment
Question by:Scott Fell,  EE MVE
10 Comments
 
LVL 65

Expert Comment

by:Jim Horn
ID: 40601523
I'm not following your table schemas, but assuming a primary key of id and other columns goo, foo, and boo, the INSERT below will only insert rows from TEMP_TABLE that are not currently in LIVE_TABLE.
Modify to meet your needs.
INSERT INTO LIVE_TABLE (id, goo, foo, boo) 
SELECT tt.id, tt.goo, tt.foo, tt.boo
FROM TEMP_TABLE tt
-- LEFT means include all rows in TEMP_TABLE...
LEFT JOIN LIVE_TABLE lt ON tt.id = lt.id
   -- ... ONLY that are not already in the live table
WHERE lt.id IS NULL



By the way, if this is an SSIS solution you can also use a Lookup Transform between source and destination, connected to the live table, which would mean a staging TEMP_TABLE is no longer required. Although there are lots of reasons why keeping one would be a good idea anyway.
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40601625
Thanks Jim. The issue is that goo, foo and boo combined make up the unique row.

goo = abc
foo = def
boo = xyz

goo = abc
foo = def
boo = qrs

goo = abc
foo = abc
boo = xyz

the only thing that would be unique is abcdefxyz.

INSERT INTO LIVE_TABLE (id, goo, foo, boo) 
SELECT tt.id, tt.goo, tt.foo, tt.boo
FROM TEMP_TABLE tt
-- LEFT means include all rows in TEMP_TABLE...
LEFT JOIN LIVE_TABLE lt ON tt.goo = lt.goo AND tt.foo = lt.foo AND tt.boo = lt.boo
   -- ... ONLY that are not already in the live table
WHERE lt.id IS NULL AND  lt.foo IS NULL AND lt.boo IS NULL


0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 40602337
my head hurts...sorry... but what does this mean? (When is a unique key not needed? I don't follow.)

>>"When the key fields are not needed, data is entered as a one off.  "

My STRONG suggestion is that you provide an example set of data and the expected result.

----------
Following on from Jim's comment above, I would suggest removing [ID] from the logic and relying only on the 3 elements of uniqueness
[goo],[foo],[boo]

or the equivalent
[key1], [key2], [key3]

---------------

Another question:

Do you need to UPDATE LIVE_TABLE.[field1] and/or LIVE_TABLE.[field2] from the equivalent fields in TEMP_TABLE?

If the answer is YES, then you could use MERGE. It can perform inserts and updates in one operation.

Something like this:

MERGE dbo.LIVE_TABLE AS target
USING (SELECT * FROM TEMP_TABLE) AS source (key1,key2,key3,field1,field2)
        ON (target.key1 = source.key1 AND target.key2 = source.key2 AND target.key3 = source.key3)
WHEN MATCHED THEN 
    UPDATE SET target.field1 = source.field1, target.field2 = source.field2
WHEN NOT MATCHED THEN	
    INSERT (key1,key2,key3,field1,field2)
    VALUES (source.key1,source.key2,source.key3,source.field1,source.field2)
;
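
And if the rows without key values should never be touched by this import, the same MERGE could be restricted to the keyed rows only. A sketch of that variation:

-- Same MERGE, but the source is limited to staged rows that carry all three
-- keys, so any one-off rows with NULL keys are left alone entirely.
MERGE dbo.LIVE_TABLE AS target
USING (SELECT key1,key2,key3,field1,field2
       FROM TEMP_TABLE
       WHERE key1 IS NOT NULL AND key2 IS NOT NULL AND key3 IS NOT NULL
      ) AS source (key1,key2,key3,field1,field2)
        ON (target.key1 = source.key1 AND target.key2 = source.key2 AND target.key3 = source.key3)
WHEN MATCHED THEN
    UPDATE SET target.field1 = source.field1, target.field2 = source.field2
WHEN NOT MATCHED THEN
    INSERT (key1,key2,key3,field1,field2)
    VALUES (source.key1,source.key2,source.key3,source.field1,source.field2)
;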


0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40603788
This is what the data looks like in the temp table. There is an ID identity field, but for this purpose it is not used for anything and is not imported to the live table.

The probability that key3 on its own is unique is pretty good. However, we can't control that data and can't assume it is unique. The only thing that can be considered unique is the combination of key1+key2+key3. Key3 is supposed to be unique within key2, and there will be multiple rows of key2 related to key1.

ID	key1	key2	key3	field1
6834	2	abc	box1	some data 1
6835	2	abc	box2	some data 2
6836	2	abc	box3	some data 3
6837	2	abc	box4	some data 4
6838	2	abc	box5	some data 5
6839	2	abc	box6	some data 6
6840	2	def	circle1	some data 7
6841	2	def	circle2	some data 8
6842	2	def	circle3	some data 9
6843	2	def	circle4	some data 10
6844	2	def	circle5	some data 11
6845	2	def	circle6	some data 12
6846	2	def	circle7	some data 13
6847	3	aef	triangle1	some data 14
6848	3	aef	triangle2	some data 15
6849	3	aef	triangle3	some data 16
6850	3	aef	triangle4	some data 17
6851	3	zxy	cylinder1	some data 18
6852	3	zxy	cylinder2	some data 19
6853	3	zxy	cylinder3	some data 20
6854	3	zxy	cylinder4	some data 21
6855	3	zxy	cylinder5	some data 22
6856	3	zxy	cylinder6	some data 23
6857	3	zxy	cylinder7	some data 24
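
As a quick check of that assumption, a query along these lines against the temp table shows whether key3 ever repeats on its own (the key1+key2+key3 combination can't repeat there because of the clustered index):

-- Any row returned means key3 is reused under a different key1/key2,
-- i.e. key3 alone can't be treated as unique.
SELECT key3, COUNT(*) AS occurrences
FROM dbo.TEMP_TABLE
GROUP BY key3
HAVING COUNT(*) > 1;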



There are 2 types of data that are inserted/updated to the live table.  One type does not utilize the key1, key2 or key3 fields, and therefore 20% of the rows will have those fields blank or null.  In this case, we are importing these fields, and I bring up the 20% of the data not containing data because I'm not sure of the best way to bring this data in and make sure it is unique.

I thought I could create a view with just the data where these fields are filled in, and then use the same process I use for the import to the temp table, where the clustered index on the 3 key fields skips duplicates.

Each import could be a few hundred to a few thousand rows of data for now.
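
Roughly the shape of what I have in mind (just a sketch; the view name is made up), although here I'm skipping duplicates with a NOT EXISTS check rather than an index:

-- View over the live table containing only the rows that carry the keys.
CREATE VIEW dbo.v_LIVE_TABLE_keyed
AS
SELECT ID, key1, key2, key3, field1, field2
FROM dbo.LIVE_TABLE
WHERE key1 IS NOT NULL
  AND key2 IS NOT NULL
  AND key3 IS NOT NULL;
GO

-- Copy only the staged rows whose key combination isn't already in the view.
INSERT INTO dbo.LIVE_TABLE (key1, key2, key3, field1, field2)
SELECT tt.key1, tt.key2, tt.key3, tt.field1, tt.field2
FROM dbo.TEMP_TABLE AS tt
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.v_LIVE_TABLE_keyed AS v
                  WHERE v.key1 = tt.key1
                    AND v.key2 = tt.key2
                    AND v.key3 = tt.key3);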
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40604493
>>"There are 2 types of data that are inserted/updated to the live table."
ok

>>"One type does not utilize the key1, key2 or key3 fields, and therefore 20% of the rows will have those fields blank or null."
so:
type 1 has key1, key2, key3
type 2 does not have values for key1, key2, key3

>>"In this case, we are importing these fields"
for type 2, you are importing the non-null fields (i.e. NOT key1, key2, key3)

>>"and I bring up the 20% of the data not containing data because I'm not sure of the best way to bring this data in and make sure it is unique."
I do not understand this bit. What does "bring up the data" mean?

When you import those type 2 rows, do key1, key2, key3 simply remain NULL forever?

(I assume there is some auto-incrementing unique identifier on the "LIVE_TABLE".)
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40604510
Both tables have an ID field that is auto-incrementing. Type 1 data is what is being imported in bulk or as a one-off. Type 2 data is entered only as a one-off.

During the import, it is very possible that the same combination of key1+key2+key3 has already been added to the live table, and I do not want to add it again if it is there. I could just do a lookup for each row, but I was wondering if there is something more efficient that I did not think of.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40604514
Have you considered/tried MERGE?
0
 
LVL 52

Author Comment

by:Scott Fell, EE MVE
ID: 40620549
Thank you both!  I am going with the MERGE on this one.
0
 
LVL 52

Author Closing Comment

by:Scott Fell, EE MVE
ID: 40620552
Thank you Jim and Paul. The MERGE worked easily for this.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40620605
Pleased to hear it. Cheers, Paul
0
