Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Delete duplicate row from a table without a primary key

Posted on 2014-01-25
8
Medium Priority
?
547 Views
Last Modified: 2014-01-27
I have a table which has duplicate rows. The table does not have primary key. I need to keep one row and delete the rest. The table looks like below

OrderNum | Name | Cust         | origID | Qty | OrderDt

123456      | abc     | 354          | 78       | 2     | 01/25/2014
123456      | abc     | 354          | 79       | 0     | 01/25/2014
345678      | abc     | 789          | 67       | 0     | 01/25/2014
345678      | abc     | 789          | 68       | 0     | 01/25/2014

So all the values are same except [ID]. The ID value is unique within the duplicate rows but some other row might have the same ID so it is not unique within the table. Now if the Qty field is 0 for both the duplicate rows I can keep any one of the row and delete the other. But if the Qty field for one of the rows is greater than 0, I need to keep that row and delete the other row that has 0 value. So in the example above my table should keep the following rows:

OrderNum | Name | Cust | origID | Qty | OrderDt

123456      | abc     | 354  | 78       | 2     | 01/25/2014
345678      | abc     | 789  | 67       | 0     | 01/25/2014


I started writing a stored procedure by selecting all the duplicate rows from the table for a specific OrderDt. Then I wrote a cursor to loop through each set of duplicates. I now need to delete the required row as I mentioned above. i have attached my stored procedure. Of course, if you have better idea please advise.
EE-snippet.txt
0
Comment
Question by:Angel02
  • 5
  • 2
8 Comments
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 2000 total points
ID: 39809871
Try this, I made a @TESTRUN variable so you can see what would be deleted first, then when you're ready to run, change it to 0.

This basically follows your description of circumstances quite literally, in a nutshell: for each order/cust/date combination check if there's a record with Qty > 0 then delete the others, else delete everything but the first (based on origId).

By the way, it can be done with a single query but I had a feeling you weren't looking for that.
DECLARE @TESTRUN bit
SET @TESTRUN = 1

DECLARE @dOrder float, @dCust float, @dName varchar(30), @orderDate datetime
DECLARE dup_cursor CURSOR FOR 
select [ORDERNum], CUST, Name, OrderDt from tblOrder
--where OrderDt = @orderDate
group by [ORDERNum], CUST, Name, OrderDt
having count(origid) > 1

OPEN dup_cursor
FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
WHILE @@fetch_status = 0 
BEGIN

--print convert(varchar, @dorder) + ', ' + convert(varchar, @dCust)

-- Delete the required row here
IF 0 = (SELECT COUNT(*) FROM tblOrder WHERE [Qty] > 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)
	-- delete everything but the first
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
ELSE
	-- delete where Qty = 0
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate

FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
END
CLOSE dup_cursor
DEALLOCATE dup_cursor

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39809873
PS: I commented out the check on @orderDate because that wasn't defined. If you want to test for 1 date first, remove the "--" on line 7.
0
 
LVL 4

Expert Comment

by:ravikantninave
ID: 39809940
An easy way to remove duplicate rows from a table in SQL Server  is to use undocumented feature called %%physloc%%.  

DELETE
FROM  OrderMast
WHERE OrderMast.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   OrderMast  b
              GROUP BY b.OrderNum, b.Name , b.Cust );
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810018
@ravikantninave
Nice trick but it does not take the 'Qty > 0' point into account, unless that is always the first record of the set, but that's not how the OP describes it.
0
 

Author Comment

by:Angel02
ID: 39810429
@ Robert Schutt
Thank you so much! That worked like a charm!

One change that I had to make because of my wrong information is that the origID is also same in the duplicate columns. The column that is different is "ID". So I changed a part of the query to:

IF @TESTRUN = 1
            SELECT * FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
      ELSE
            DELETE   FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate


Now I got curious when you said this could be done in one single query. Of course I would prefer that. I just thought it was not possible because the table does not have a primary query. I will be so much glad if you could tell me how this be done in a single query.

Thanks much!
0
 
LVL 35

Assisted Solution

by:Robert Schutt
Robert Schutt earned 2000 total points
ID: 39810445
Well I'll post the query I tested below, but it needs to be adjusted for the change you described.

Also, please note that without indexes, this may be quite slow (but the query optimizer will hopefully take care of the worst, the duplicate count).
SELECT * --DELETE 
FROM tblOrder o1
WHERE (
	(0 = (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.OrigId > (SELECT MIN(o2.OrigId) FROM tblOrder o2 WHERE o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt)) 
) OR (
	(0 < (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.[Qty] = 0)
)

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810451
Some more remarks:
* if CUST and Name always refer to the same customer, you can take out the name check everywhere
* if an index is present on the table (other than primary key), for example on "ordernum, cust, orderdt" then the query above could make good use of that
0
 

Author Comment

by:Angel02
ID: 39814120
Thank you!
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
An alternative to the "For XML" way of pivoting and concatenating result sets into strings, and an easy introduction to "common table expressions" (CTEs). Being someone who is always looking for alternatives to "work your data", I came across this …
This Micro Tutorial will teach you how to add a cinematic look to any film or video out there. There are very few simple steps that you will follow to do so. This will be demonstrated using Adobe Premiere Pro CS6.
We’ve all felt that sense of false security before—locking down external access to a database or component and feeling like we’ve done all we need to do to secure company data. But that feeling is fleeting. Attacks these days can happen in many w…

886 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question