Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Delete duplicate row from a table without a primary key

Posted on 2014-01-25
8
Medium Priority
?
544 Views
Last Modified: 2014-01-27
I have a table which has duplicate rows. The table does not have primary key. I need to keep one row and delete the rest. The table looks like below

OrderNum | Name | Cust         | origID | Qty | OrderDt

123456      | abc     | 354          | 78       | 2     | 01/25/2014
123456      | abc     | 354          | 79       | 0     | 01/25/2014
345678      | abc     | 789          | 67       | 0     | 01/25/2014
345678      | abc     | 789          | 68       | 0     | 01/25/2014

So all the values are same except [ID]. The ID value is unique within the duplicate rows but some other row might have the same ID so it is not unique within the table. Now if the Qty field is 0 for both the duplicate rows I can keep any one of the row and delete the other. But if the Qty field for one of the rows is greater than 0, I need to keep that row and delete the other row that has 0 value. So in the example above my table should keep the following rows:

OrderNum | Name | Cust | origID | Qty | OrderDt

123456      | abc     | 354  | 78       | 2     | 01/25/2014
345678      | abc     | 789  | 67       | 0     | 01/25/2014


I started writing a stored procedure by selecting all the duplicate rows from the table for a specific OrderDt. Then I wrote a cursor to loop through each set of duplicates. I now need to delete the required row as I mentioned above. i have attached my stored procedure. Of course, if you have better idea please advise.
EE-snippet.txt
0
Comment
Question by:Angel02
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 2
8 Comments
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 2000 total points
ID: 39809871
Try this, I made a @TESTRUN variable so you can see what would be deleted first, then when you're ready to run, change it to 0.

This basically follows your description of circumstances quite literally, in a nutshell: for each order/cust/date combination check if there's a record with Qty > 0 then delete the others, else delete everything but the first (based on origId).

By the way, it can be done with a single query but I had a feeling you weren't looking for that.
DECLARE @TESTRUN bit
SET @TESTRUN = 1

DECLARE @dOrder float, @dCust float, @dName varchar(30), @orderDate datetime
DECLARE dup_cursor CURSOR FOR 
select [ORDERNum], CUST, Name, OrderDt from tblOrder
--where OrderDt = @orderDate
group by [ORDERNum], CUST, Name, OrderDt
having count(origid) > 1

OPEN dup_cursor
FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
WHILE @@fetch_status = 0 
BEGIN

--print convert(varchar, @dorder) + ', ' + convert(varchar, @dCust)

-- Delete the required row here
IF 0 = (SELECT COUNT(*) FROM tblOrder WHERE [Qty] > 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)
	-- delete everything but the first
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
ELSE
	-- delete where Qty = 0
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate

FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
END
CLOSE dup_cursor
DEALLOCATE dup_cursor

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39809873
PS: I commented out the check on @orderDate because that wasn't defined. If you want to test for 1 date first, remove the "--" on line 7.
0
 
LVL 4

Expert Comment

by:ravikantninave
ID: 39809940
An easy way to remove duplicate rows from a table in SQL Server  is to use undocumented feature called %%physloc%%.  

DELETE
FROM  OrderMast
WHERE OrderMast.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   OrderMast  b
              GROUP BY b.OrderNum, b.Name , b.Cust );
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810018
@ravikantninave
Nice trick but it does not take the 'Qty > 0' point into account, unless that is always the first record of the set, but that's not how the OP describes it.
0
 

Author Comment

by:Angel02
ID: 39810429
@ Robert Schutt
Thank you so much! That worked like a charm!

One change that I had to make because of my wrong information is that the origID is also same in the duplicate columns. The column that is different is "ID". So I changed a part of the query to:

IF @TESTRUN = 1
            SELECT * FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
      ELSE
            DELETE   FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate


Now I got curious when you said this could be done in one single query. Of course I would prefer that. I just thought it was not possible because the table does not have a primary query. I will be so much glad if you could tell me how this be done in a single query.

Thanks much!
0
 
LVL 35

Assisted Solution

by:Robert Schutt
Robert Schutt earned 2000 total points
ID: 39810445
Well I'll post the query I tested below, but it needs to be adjusted for the change you described.

Also, please note that without indexes, this may be quite slow (but the query optimizer will hopefully take care of the worst, the duplicate count).
SELECT * --DELETE 
FROM tblOrder o1
WHERE (
	(0 = (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.OrigId > (SELECT MIN(o2.OrigId) FROM tblOrder o2 WHERE o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt)) 
) OR (
	(0 < (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.[Qty] = 0)
)

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810451
Some more remarks:
* if CUST and Name always refer to the same customer, you can take out the name check everywhere
* if an index is present on the table (other than primary key), for example on "ordernum, cust, orderdt" then the query above could make good use of that
0
 

Author Comment

by:Angel02
ID: 39814120
Thank you!
0

Featured Post

Survive A High-Traffic Event with Percona

Your application or website rely on your database to deliver information about products and services to your customers. You can’t afford to have your database lose performance, lose availability or become unresponsive – even for just a few minutes.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I'm trying, I really am. But I've seen so many wrong approaches involving date(time) boundaries I despair about my inability to explain it. I've seen quite a few recently that define a non-leap year as 364 days, or 366 days and the list goes on. …
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question