Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

MSSQL Unique identifier based on row content

Posted on 2011-03-10
10
Medium Priority
?
524 Views
Last Modified: 2012-05-11
Hi

I'm writing a program which contains an import process (of csv data) before the data is committed to the main table I want to ensure that it is not duplicate data.  

I think the best way to do this is to create a unique Identifier based on the row content a bit like an MP5 hash, but I don’t think this is supported in MSSQL (I’m using 2008 R2)

Can any one recommend a way to achieve this?

Thanks

Brasso
0
Comment
Question by:brasso_42
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +1
10 Comments
 
LVL 9

Expert Comment

by:kaminda
ID: 35093386
You can write a .NET Assembly to create the hash and then use it in a CLR UDF. So you can use that function to create the unique identifier while inserting data.
0
 
LVL 1

Author Comment

by:brasso_42
ID: 35093393
Hi

Do you have an example of how to do that?

Thanks

Brasso
0
 
LVL 9

Expert Comment

by:mayank_joshi
ID: 35093418
You can use ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) to Delete Duplicate Rows in SQL Table

e.g.,
http://www.kodyaz.com/articles/delete-duplicate-rows-using-row-number-partition-by-order-by.aspx
0
Nothing ever in the clear!

This technical paper will help you implement VMware’s VM encryption as well as implement Veeam encryption which together will achieve the nothing ever in the clear goal. If a bad guy steals VMs, backups or traffic they get nothing.

 
LVL 1

Author Comment

by:brasso_42
ID: 35093538
Hi

Necessarily want to delete them, I just want to be made aware of them.

Thanks

Brasso
0
 
LVL 9

Expert Comment

by:mayank_joshi
ID: 35093558
so instead of deleting you can insert them into another table of same structure.
0
 
LVL 9

Expert Comment

by:mayank_joshi
ID: 35093577
eg.,
WITH [CTE DUPLICATE] AS
(
SELECT
  RN = ROW_NUMBER() OVER (PARTITION BY CompanyTitle ORDER BY LastContactDate DESC),
  Id, CompanyTitle, ContactName, LastContactDate
FROM Suppliers
)
Insert into Suppliers_Duplicates select *  FROM [CTE DUPLICATE] WHERE RN > 1

Open in new window



here Suppliers and Suppliers_Duplicates should have same structure.
0
 
LVL 9

Expert Comment

by:mayank_joshi
ID: 35093612
Better:-

WITH [CTE DUPLICATE] AS
(
SELECT
  RN = ROW_NUMBER() OVER (PARTITION BY CompanyTitle ORDER BY LastContactDate DESC),
  Id, CompanyTitle, ContactName, LastContactDate
FROM Suppliers
)
Insert into Suppliers_Duplicates ( Id, CompanyTitle, ContactName, LastContactDate )
select  Id, CompanyTitle, ContactName, LastContactDate  FROM [CTE DUPLICATE] WHERE RN > 1
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35094900
The correct way to do this is always, always import into a staging table, validate your data prior to inserting into the final production table(s).  Remember the old axiom: Garbage in, garbage out.

>>I want to ensure that it is not duplicate data. <<
Before we can help you here, we need to know your definition of "duplicate data", is it:
1. Duplicate imported data or
2. Duplicate data when comparing imported data to existing data or
3. Something else.
0
 
LVL 1

Author Comment

by:brasso_42
ID: 35094923
Hi Acperkins

This is exactly what I want to do.   I want to validate on both point one and two.  

Thanks

Brasso

0
 
LVL 75

Accepted Solution

by:
Anthony Perkins earned 2000 total points
ID: 35095003
The way you shoud do it is:
1.  Import into staging table.
2.  Use a query with a GROUP BY clause based on the key columns to determine if any duplicates in the staging table if not
3.  Use a query with an INNER JOIN between the staging table and the production table based on the same key columns to determine if any duplicates in the staging table if not
4.  Import the now validated data into production
0

Featured Post

Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
An alternative to the "For XML" way of pivoting and concatenating result sets into strings, and an easy introduction to "common table expressions" (CTEs). Being someone who is always looking for alternatives to "work your data", I came across this …
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

610 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question