Scott Fell
asked on
MS SQL SERVER: prevent duplicates from being inserted
I have a temporary table with a clustered index on three fields, used to prevent duplicates when grabbing data from an outside source.
The next step is copying this data to an active table. Most rows in the live table use the fields from that clustered index, but not all of them do, so I can't duplicate the clustered index on the live table.
I have scripts that auto-populate the temp table several times a day, as well as copy the data over.
TEMP_TABLE (clustered index on key1, key2, key3; Ignore Duplicate Values = true)
key1
key2
key3
field1
field2
LIVE_TABLE
ID Identity_column
field1
field2
field3
key1
key2
key3
What is the best option to copy the data from the temp table to the live table and avoid duplicates on key1, key2, and key3, given that some rows will have NULL key fields because those fields are not needed?
When the key fields are not needed, the data is entered as a one-off. When the key fields are used, the data is brought in via a SQL script.
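For reference, a minimal sketch of the two tables as described; the column types are assumptions (the question does not give them), and "Ignore Duplicate Values = true" corresponds to the IGNORE_DUP_KEY index option in T-SQL:

-- Column types below are assumptions for illustration only.
CREATE TABLE TEMP_TABLE (
    key1   int          NOT NULL,
    key2   varchar(50)  NOT NULL,
    key3   varchar(50)  NOT NULL,
    field1 varchar(100) NULL,
    field2 varchar(100) NULL
);

-- IGNORE_DUP_KEY = ON silently discards duplicate key rows on insert
-- instead of raising an error.
CREATE UNIQUE CLUSTERED INDEX IX_TEMP_TABLE_keys
    ON TEMP_TABLE (key1, key2, key3)
    WITH (IGNORE_DUP_KEY = ON);

CREATE TABLE LIVE_TABLE (
    ID     int IDENTITY(1,1) PRIMARY KEY,
    field1 varchar(100) NULL,
    field2 varchar(100) NULL,
    field3 varchar(100) NULL,
    key1   int          NULL,   -- NULL on the rows that do not use the keys
    key2   varchar(50)  NULL,
    key3   varchar(50)  NULL
);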
ASKER
Thanks Jim. The issue is that goo, foo, and boo combined make up a unique row.
goo = abc, foo = def, boo = xyz
goo = abc, foo = def, boo = qrs
goo = abc, foo = abc, boo = xyz
The only thing that would be unique is the combination abc+def+xyz.
INSERT INTO LIVE_TABLE (goo, foo, boo)
SELECT tt.goo, tt.foo, tt.boo
FROM TEMP_TABLE tt
-- LEFT means include all rows in TEMP_TABLE...
LEFT JOIN LIVE_TABLE lt ON tt.goo = lt.goo AND tt.foo = lt.foo AND tt.boo = lt.boo
-- ...and keep ONLY those that are not already in the live table.
-- lt.id is NULL exactly when the join found no match; id is left out of
-- the INSERT list because LIVE_TABLE generates its own identity values.
WHERE lt.id IS NULL
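An equivalent way to express the same anti-join, if it reads more clearly, is NOT EXISTS; a sketch using the same illustrative goo/foo/boo columns:

INSERT INTO LIVE_TABLE (goo, foo, boo)
SELECT tt.goo, tt.foo, tt.boo
FROM TEMP_TABLE tt
-- Insert only rows whose key combination is not already in the live table
WHERE NOT EXISTS (
    SELECT 1
    FROM LIVE_TABLE lt
    WHERE lt.goo = tt.goo
      AND lt.foo = tt.foo
      AND lt.boo = tt.boo
);

The SQL Server optimizer typically produces the same anti-semi-join plan for both forms.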
ASKER
This is what the data looks like from the temp table (sample below). There is an ID identity field, but for this purpose it is not used for anything, nor is it imported to the live table.
The probability that key3 on its own is unique is pretty good. However, we can't control that data and can't assume it is unique. The only thing that can be considered unique is the combination of key1+key2+key3. Key3 is supposed to be unique within key2, and there will be multiple rows of key2 related to each key1.
There are 2 types of data that are inserted/updated into the live table. One type does not use the key1, key2, or key3 fields, and therefore 20% of the rows will have those fields blank or NULL. In this case we are importing these fields, and I bring up the 20% of rows without key data because I'm not sure of the best way to bring this data in and make sure it is unique.
I thought I could create a view with just the rows where these fields are filled in, and use the same process I use for the temp table import: a clustered index on the 3 key fields that skips duplicates (an alternative is sketched after the sample data below).
Each import could be a few hundred to a few thousand rows of data for now.
ID    key1  key2  key3       field1
6834  2     abc   box1       some data 1
6835  2     abc   box2       some data 2
6836  2     abc   box3       some data 3
6837  2     abc   box4       some data 4
6838  2     abc   box5       some data 5
6839  2     abc   box6       some data 6
6840  2     def   circle1    some data 7
6841  2     def   circle2    some data 8
6842  2     def   circle3    some data 9
6843  2     def   circle4    some data 10
6844  2     def   circle5    some data 11
6845  2     def   circle6    some data 12
6846  2     def   circle7    some data 13
6847  3     aef   triangle1  some data 14
6848  3     aef   triangle2  some data 15
6849  3     aef   triangle3  some data 16
6850  3     aef   triangle4  some data 17
6851  3     zxy   cylinder1  some data 18
6852  3     zxy   cylinder2  some data 19
6853  3     zxy   cylinder3  some data 20
6854  3     zxy   cylinder4  some data 21
6855  3     zxy   cylinder5  some data 22
6856  3     zxy   cylinder6  some data 23
6857  3     zxy   cylinder7  some data 24
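One way to get the same "skip duplicates" behavior on the live table without a view (a plain view cannot enforce uniqueness, and an indexed view comes with heavy restrictions) is a filtered unique index; a minimal sketch, assuming the schema above:

-- Enforce uniqueness of key1+key2+key3 only for rows where the keys
-- are present; the one-off rows with NULL keys are left out of the
-- index entirely and so are unaffected.
CREATE UNIQUE NONCLUSTERED INDEX IX_LIVE_TABLE_keys
    ON LIVE_TABLE (key1, key2, key3)
    WHERE key1 IS NOT NULL AND key2 IS NOT NULL AND key3 IS NOT NULL;

Adding WITH (IGNORE_DUP_KEY = ON) to that index would turn a duplicate insert into a warning rather than an error, mirroring the temp table's behavior.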
There are 2 types of data that are inserted/updated to the live table.
ok
One type does not utilize the key1, key2 or key3 fields and therefor, 20% of the rows will have those fields blank or null.
so:
type 1 has key1, key2, key3
type 2 does not have values for key1, key2, key3
In this case, we are importing these fields
for type 2, you are importing the non-null fields (i.e. NOT key1, key2, key3)
and I bring up the 20% of the data not containing data because I'm not sure the best way to bring this data in and make sure it is unique.
I do not understand this bit. What does "bring up the data" mean?
When you import those type 2 rows, do key1, key2, key3 simply remain NULL forever?
(I assume there is some auto-incrementing unique identifier on the LIVE_TABLE.)
ASKER
Both tables have an auto-incrementing ID field. Type 1 data is imported in bulk or entered as a one-off. Type 2 data is entered only as a one-off.
During the import, it is very possible that the same combination of key1+key2+key3 has already been added to the live table, and I do not want to add it again if it is there. I could just do a lookup for each row, but I was wondering whether there is something more efficient that I had not thought of.
Have you considered/tried MERGE?
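For reference, a minimal MERGE sketch for this case, assuming the column names from the question; the WITH (HOLDLOCK) hint guards against a race between MERGE's match check and its insert when two imports run at the same time:

MERGE LIVE_TABLE WITH (HOLDLOCK) AS lt
USING (
    -- Only the rows that actually carry the key fields
    SELECT key1, key2, key3, field1, field2
    FROM TEMP_TABLE
) AS tt
    ON  lt.key1 = tt.key1
    AND lt.key2 = tt.key2
    AND lt.key3 = tt.key3
-- Insert only the key combinations not already in the live table
WHEN NOT MATCHED BY TARGET THEN
    INSERT (key1, key2, key3, field1, field2)
    VALUES (tt.key1, tt.key2, tt.key3, tt.field1, tt.field2);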
ASKER
Thank you both! I am going with the Merge on this one.
ASKER
Thank you Jim and Paul. The MERGE worked easily for this one.
Pleased to hear it. Cheers, Paul
By the way, if this is an SSIS solution you could also use a Lookup Transform between source and destination, connected to the live table, which would mean a staging 'TEMP_TABLE' is no longer required, although there are plenty of reasons why keeping one is a good idea anyway.