• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 552
  • Last Modified:

Delete duplicate row from a table without a primary key

I have a table which has duplicate rows. The table does not have primary key. I need to keep one row and delete the rest. The table looks like below

OrderNum | Name | Cust         | origID | Qty | OrderDt

123456      | abc     | 354          | 78       | 2     | 01/25/2014
123456      | abc     | 354          | 79       | 0     | 01/25/2014
345678      | abc     | 789          | 67       | 0     | 01/25/2014
345678      | abc     | 789          | 68       | 0     | 01/25/2014

So all the values are same except [ID]. The ID value is unique within the duplicate rows but some other row might have the same ID so it is not unique within the table. Now if the Qty field is 0 for both the duplicate rows I can keep any one of the row and delete the other. But if the Qty field for one of the rows is greater than 0, I need to keep that row and delete the other row that has 0 value. So in the example above my table should keep the following rows:

OrderNum | Name | Cust | origID | Qty | OrderDt

123456      | abc     | 354  | 78       | 2     | 01/25/2014
345678      | abc     | 789  | 67       | 0     | 01/25/2014


I started writing a stored procedure by selecting all the duplicate rows from the table for a specific OrderDt. Then I wrote a cursor to loop through each set of duplicates. I now need to delete the required row as I mentioned above. i have attached my stored procedure. Of course, if you have better idea please advise.
EE-snippet.txt
0
Angel02
Asked:
Angel02
  • 5
  • 2
2 Solutions
 
Robert SchuttSoftware EngineerCommented:
Try this, I made a @TESTRUN variable so you can see what would be deleted first, then when you're ready to run, change it to 0.

This basically follows your description of circumstances quite literally, in a nutshell: for each order/cust/date combination check if there's a record with Qty > 0 then delete the others, else delete everything but the first (based on origId).

By the way, it can be done with a single query but I had a feeling you weren't looking for that.
DECLARE @TESTRUN bit
SET @TESTRUN = 1

DECLARE @dOrder float, @dCust float, @dName varchar(30), @orderDate datetime
DECLARE dup_cursor CURSOR FOR 
select [ORDERNum], CUST, Name, OrderDt from tblOrder
--where OrderDt = @orderDate
group by [ORDERNum], CUST, Name, OrderDt
having count(origid) > 1

OPEN dup_cursor
FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
WHILE @@fetch_status = 0 
BEGIN

--print convert(varchar, @dorder) + ', ' + convert(varchar, @dCust)

-- Delete the required row here
IF 0 = (SELECT COUNT(*) FROM tblOrder WHERE [Qty] > 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)
	-- delete everything but the first
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
ELSE
	-- delete where Qty = 0
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate

FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
END
CLOSE dup_cursor
DEALLOCATE dup_cursor

Open in new window

0
 
Robert SchuttSoftware EngineerCommented:
PS: I commented out the check on @orderDate because that wasn't defined. If you want to test for 1 date first, remove the "--" on line 7.
0
 
ravikantninaveCommented:
An easy way to remove duplicate rows from a table in SQL Server  is to use undocumented feature called %%physloc%%.  

DELETE
FROM  OrderMast
WHERE OrderMast.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   OrderMast  b
              GROUP BY b.OrderNum, b.Name , b.Cust );
0
Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
Robert SchuttSoftware EngineerCommented:
@ravikantninave
Nice trick but it does not take the 'Qty > 0' point into account, unless that is always the first record of the set, but that's not how the OP describes it.
0
 
Angel02Author Commented:
@ Robert Schutt
Thank you so much! That worked like a charm!

One change that I had to make because of my wrong information is that the origID is also same in the duplicate columns. The column that is different is "ID". So I changed a part of the query to:

IF @TESTRUN = 1
            SELECT * FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
      ELSE
            DELETE   FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate


Now I got curious when you said this could be done in one single query. Of course I would prefer that. I just thought it was not possible because the table does not have a primary query. I will be so much glad if you could tell me how this be done in a single query.

Thanks much!
0
 
Robert SchuttSoftware EngineerCommented:
Well I'll post the query I tested below, but it needs to be adjusted for the change you described.

Also, please note that without indexes, this may be quite slow (but the query optimizer will hopefully take care of the worst, the duplicate count).
SELECT * --DELETE 
FROM tblOrder o1
WHERE (
	(0 = (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.OrigId > (SELECT MIN(o2.OrigId) FROM tblOrder o2 WHERE o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt)) 
) OR (
	(0 < (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.[Qty] = 0)
)

Open in new window

0
 
Robert SchuttSoftware EngineerCommented:
Some more remarks:
* if CUST and Name always refer to the same customer, you can take out the name check everywhere
* if an index is present on the table (other than primary key), for example on "ordernum, cust, orderdt" then the query above could make good use of that
0
 
Angel02Author Commented:
Thank you!
0

Featured Post

Learn to develop an Android App

Want to increase your earning potential in 2018? Pad your resume with app building experience. Learn how with this hands-on course.

  • 5
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now