Solved

Delete duplicate row from a table without a primary key

Posted on 2014-01-25
8
539 Views
Last Modified: 2014-01-27
I have a table which has duplicate rows. The table does not have primary key. I need to keep one row and delete the rest. The table looks like below

OrderNum | Name | Cust         | origID | Qty | OrderDt

123456      | abc     | 354          | 78       | 2     | 01/25/2014
123456      | abc     | 354          | 79       | 0     | 01/25/2014
345678      | abc     | 789          | 67       | 0     | 01/25/2014
345678      | abc     | 789          | 68       | 0     | 01/25/2014

So all the values are same except [ID]. The ID value is unique within the duplicate rows but some other row might have the same ID so it is not unique within the table. Now if the Qty field is 0 for both the duplicate rows I can keep any one of the row and delete the other. But if the Qty field for one of the rows is greater than 0, I need to keep that row and delete the other row that has 0 value. So in the example above my table should keep the following rows:

OrderNum | Name | Cust | origID | Qty | OrderDt

123456      | abc     | 354  | 78       | 2     | 01/25/2014
345678      | abc     | 789  | 67       | 0     | 01/25/2014


I started writing a stored procedure by selecting all the duplicate rows from the table for a specific OrderDt. Then I wrote a cursor to loop through each set of duplicates. I now need to delete the required row as I mentioned above. i have attached my stored procedure. Of course, if you have better idea please advise.
EE-snippet.txt
0
Comment
Question by:Angel02
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 2
8 Comments
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 500 total points
ID: 39809871
Try this, I made a @TESTRUN variable so you can see what would be deleted first, then when you're ready to run, change it to 0.

This basically follows your description of circumstances quite literally, in a nutshell: for each order/cust/date combination check if there's a record with Qty > 0 then delete the others, else delete everything but the first (based on origId).

By the way, it can be done with a single query but I had a feeling you weren't looking for that.
DECLARE @TESTRUN bit
SET @TESTRUN = 1

DECLARE @dOrder float, @dCust float, @dName varchar(30), @orderDate datetime
DECLARE dup_cursor CURSOR FOR 
select [ORDERNum], CUST, Name, OrderDt from tblOrder
--where OrderDt = @orderDate
group by [ORDERNum], CUST, Name, OrderDt
having count(origid) > 1

OPEN dup_cursor
FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
WHILE @@fetch_status = 0 
BEGIN

--print convert(varchar, @dorder) + ', ' + convert(varchar, @dCust)

-- Delete the required row here
IF 0 = (SELECT COUNT(*) FROM tblOrder WHERE [Qty] > 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)
	-- delete everything but the first
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE (OrigId > (SELECT MIN(OrigId) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
ELSE
	-- delete where Qty = 0
	IF @TESTRUN = 1
		SELECT * FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
	ELSE
		DELETE   FROM tblOrder WHERE [Qty] = 0 AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate

FETCH NEXT FROM dup_cursor INTO @dorder, @dCust, @dName, @OrderDate
END
CLOSE dup_cursor
DEALLOCATE dup_cursor

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39809873
PS: I commented out the check on @orderDate because that wasn't defined. If you want to test for 1 date first, remove the "--" on line 7.
0
 
LVL 4

Expert Comment

by:ravikantninave
ID: 39809940
An easy way to remove duplicate rows from a table in SQL Server  is to use undocumented feature called %%physloc%%.  

DELETE
FROM  OrderMast
WHERE OrderMast.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   OrderMast  b
              GROUP BY b.OrderNum, b.Name , b.Cust );
0
Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.

 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810018
@ravikantninave
Nice trick but it does not take the 'Qty > 0' point into account, unless that is always the first record of the set, but that's not how the OP describes it.
0
 

Author Comment

by:Angel02
ID: 39810429
@ Robert Schutt
Thank you so much! That worked like a charm!

One change that I had to make because of my wrong information is that the origID is also same in the duplicate columns. The column that is different is "ID". So I changed a part of the query to:

IF @TESTRUN = 1
            SELECT * FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate
      ELSE
            DELETE   FROM tblOrder WHERE (Id > (SELECT MIN(Id) FROM tblOrder WHERE [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate)) AND [ORDERNum] = @dorder AND CUST = @dCust AND Name = @dName AND OrderDt = @OrderDate


Now I got curious when you said this could be done in one single query. Of course I would prefer that. I just thought it was not possible because the table does not have a primary query. I will be so much glad if you could tell me how this be done in a single query.

Thanks much!
0
 
LVL 35

Assisted Solution

by:Robert Schutt
Robert Schutt earned 500 total points
ID: 39810445
Well I'll post the query I tested below, but it needs to be adjusted for the change you described.

Also, please note that without indexes, this may be quite slow (but the query optimizer will hopefully take care of the worst, the duplicate count).
SELECT * --DELETE 
FROM tblOrder o1
WHERE (
	(0 = (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.OrigId > (SELECT MIN(o2.OrigId) FROM tblOrder o2 WHERE o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt)) 
) OR (
	(0 < (SELECT COUNT(*) FROM tblOrder o2 WHERE o2.[Qty] > 0 AND o2.[ORDERNum] = o1.[ORDERNum] AND o2.CUST = o1.CUST AND o2.Name = o1.Name AND o2.OrderDt = o1.OrderDt))
	AND (o1.[Qty] = 0)
)

Open in new window

0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 39810451
Some more remarks:
* if CUST and Name always refer to the same customer, you can take out the name check everywhere
* if an index is present on the table (other than primary key), for example on "ordernum, cust, orderdt" then the query above could make good use of that
0
 

Author Comment

by:Angel02
ID: 39814120
Thank you!
0

Featured Post

Why You Need a DevOps Toolchain

IT needs to deliver services with more agility and velocity. IT must roll out application features and innovations faster to keep up with customer demands, which is where a DevOps toolchain steps in. View the infographic to see why you need a DevOps toolchain.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
Come and listen to Percona CEO Peter Zaitsev discuss what’s new in Percona open source software, including Percona Server for MySQL (https://www.percona.com/software/mysql-database/percona-server) and MongoDB (https://www.percona.com/software/mongo-…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question