Solved

Delete duplicate row from a table without a primary key

Posted on 2014-01-25
8
538 Views
Last Modified: 2014-01-27
I have a table which has duplicate rows. The table does not have primary key. I need to keep one row and delete the rest. The table looks like below

OrderNum | Name | Cust         | origID | Qty | OrderDt

123456      | abc     | 354          | 78       | 2     | 01/25/2014
123456      | abc     | 354          | 79       | 0     | 01/25/2014
345678      | abc     | 789          | 67       | 0     | 01/25/2014
345678      | abc     | 789          | 68       | 0     | 01/25/2014

So all the values are same except [ID]. The ID value is unique within the duplicate rows but some other row might have the same ID so it is not unique within the table. Now if the Qty field is 0 for both the duplicate rows I can keep any one of the row and delete the other. But if the Qty field for one of the rows is greater than 0, I need to keep that row and delete the other row that has 0 value. So in the example above my table should keep the following rows:

OrderNum | Name | Cust | origID | Qty | OrderDt

123456      | abc     | 354  | 78       | 2     | 01/25/2014
345678      | abc     | 789  | 67       | 0     | 01/25/2014


I started writing a stored procedure by selecting all the duplicate rows from the table for a specific OrderDt. Then I wrote a cursor to loop through each set of duplicates. I now need to delete the required row as I mentioned above. i have attached my stored procedure. Of course, if you have better idea please advise.
EE-snippet.txt
0
Comment
Question by:Angel02
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 2
8 Comments
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 500 total points
ID: 39809871
Try this, I made a @TESTRUN variable so you can see what would be deleted first, then when you're ready to run, change it to 0.

This basically follows your description of circumstances quite literally, in a nutshell: for each order/cust/date combination check if there's a record with Qty > 0 then delete the others, else delete everything but the first (based on origId).

By the way, it can be done with a single query but I had a feeling you weren't looking for that.
DECLARE @TESTRUN bit
SET @TESTRUN = 1

DECLARE @dOrder float, @dCust float, @dName varchar(30), @orderDate datetime
DECLARE dup_cursor CURSOR FOR 
select [ORDERNum], CUST, Name, OrderDt from tblOrder
--where OrderDt = @orderDate
group by [ORDERNum], CUST, Name, OrderDt
having count(origid) > 1

OPEN dup_cursor
FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
WHILE @@fetch_status = 0 
BEGIN

--print convert(varchar, @dorder) + ', ' + convert(varchar, @dCust)

-- Delete the required row here
IF 0 = (SELECT COUNT(*) FROM tblOrder WHERE [Qty] > 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)
	-- delete everything but the first
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
ELSE
	-- delete where Qty = 0
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate

FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
END
CLOSE dup_cursor
DEALLOCATE dup_cursor

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39809873
PS: I commented out the check on @orderDate because that wasn't defined. If you want to test for 1 date first, remove the "--" on line 7.
0
 
LVL 4

Expert Comment

by:ravikantninave
ID: 39809940
An easy way to remove duplicate rows from a table in SQL Server  is to use undocumented feature called %%physloc%%.  

DELETE
FROM  OrderMast
WHERE OrderMast.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   OrderMast  b
              GROUP BY b.OrderNum, b.Name , b.Cust );
0
The Eight Noble Truths of Backup and Recovery

How can IT departments tackle the challenges of a Big Data world? This white paper provides a roadmap to success and helps companies ensure that all their data is safe and secure, no matter if it resides on-premise with physical or virtual machines or in the cloud.

 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810018
@ravikantninave
Nice trick but it does not take the 'Qty > 0' point into account, unless that is always the first record of the set, but that's not how the OP describes it.
0
 

Author Comment

by:Angel02
ID: 39810429
@ Robert Schutt
Thank you so much! That worked like a charm!

One change that I had to make because of my wrong information is that the origID is also same in the duplicate columns. The column that is different is "ID". So I changed a part of the query to:

IF @TESTRUN = 1
            SELECT * FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
      ELSE
            DELETE   FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate


Now I got curious when you said this could be done in one single query. Of course I would prefer that. I just thought it was not possible because the table does not have a primary query. I will be so much glad if you could tell me how this be done in a single query.

Thanks much!
0
 
LVL 35

Assisted Solution

by:Robert Schutt
Robert Schutt earned 500 total points
ID: 39810445
Well I'll post the query I tested below, but it needs to be adjusted for the change you described.

Also, please note that without indexes, this may be quite slow (but the query optimizer will hopefully take care of the worst, the duplicate count).
SELECT * --DELETE 
FROM tblOrder o1
WHERE (
	(0 = (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.OrigId > (SELECT MIN(o2.OrigId) FROM tblOrder o2 WHERE o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt)) 
) OR (
	(0 < (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.[Qty] = 0)
)

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810451
Some more remarks:
* if CUST and Name always refer to the same customer, you can take out the name check everywhere
* if an index is present on the table (other than primary key), for example on "ordernum, cust, orderdt" then the query above could make good use of that
0
 

Author Comment

by:Angel02
ID: 39814120
Thank you!
0

Featured Post

Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: When running hybrid database environments, you often need to query some data from a remote db of any type, while being connected to your MS SQL Server database. Problems start when you try to combine that with some "user input" pass…
I'm trying, I really am. But I've seen so many wrong approaches involving date(time) boundaries I despair about my inability to explain it. I've seen quite a few recently that define a non-leap year as 364 days, or 366 days and the list goes on. …
Finding and deleting duplicate (picture) files can be a time consuming task. My wife and I, our three kids and their families all share one dilemma: Managing our pictures. Between desktops, laptops, phones, tablets, and cameras; over the last decade…
This video shows how to use Hyena, from SystemTools Software, to update 100 user accounts from an external text file. View in 1080p for best video quality.

740 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question