Best way to delete mass amount of rows?

Hello Experts

Somehow a table has grown into a monster.  It's over 330k rows and it should probably be about 100k.  I have repeating rows.  Looks like certain distinct rows repeat over 2200 times.  How can I safely & efficiently remove the duplicates?  Should I index the control column first and then delete from mytable where ID in (select top N from my table...) - Something like that?
jay-areAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Steve HoggITCommented:
C4
Just kidding.
Delete from is SO SLOW! If you are deleting more than you are keeping then you might consider doing an insert into a new table and then blow away the old one. How many rows are we talking about, what kind of frequency? One time?
0
jay-areAuthor Commented:
Yeah I wish I could just truncate it and start over but this is historical data that can't be reproduced...

I was reading about the slowness of delete from so I was worried about using that.  I tried to do this already and it sat processing for about 10 minutes before I got nervous and stopped it.

This SHOULD be a 1 time deal.  I'll have to do this based on groups of control numbers.  It looks to me that each group of control numbers have 1 real record I need to keep.  I have no objection to writing the real data out to a seperate table, then wiping the original and putting the real rows back into the old one.  That make sense?
0
funnymanmikeCommented:
what about building a query to import only the data you need into a new table, then just rename the two tables.
0
Newly released Acronis True Image 2019

In announcing the release of the 15th Anniversary Edition of Acronis True Image 2019, the company revealed that its artificial intelligence-based anti-ransomware technology – stopped more than 200,000 ransomware attacks on 150,000 customers last year.

jay-areAuthor Commented:
Yeah that would be fine.  I just need to do whatever it takes to get the real data into the original table.  I have to get the duplicates out one way or another.
0
Steve HoggITCommented:
Since it is just a one-time clean up. Divide and conquer. Pick your grouping for smaller amounts to select and delete and go gettem. Watch your log.
0
jay-areAuthor Commented:
So you say just go for the delete from?  Should I try indexing the table first?
0
Steve HoggITCommented:
Yes, delete from where criteria makes it smaller delete blocks. I would not add indexing for this.
0
jay-areAuthor Commented:
How would I use the delete from and not delete every row.  I only want to delete where the row count for each group is > 1.
0
Steve HoggITCommented:
If you can write a select query, you can add the delete from. Without knowing your table structure, can you post a select that defines what you want to delete?
SELECT * FROM myTable WHERE Col1 = 'YourValue' and ...
DELETE FROM myTable Where Col1 = 'YourValue' and ...
0
jay-areAuthor Commented:
Something along these lines I think....

delete from paycheck where [ids.id] in (select top 2000 * from paycheck group by chknum)

[ids.id] is my unique identifier in that table.  I went as high as 2000 because there are some checks in this table that are duplicated about 2300 times...
0
Steve HoggITCommented:
OK, that make sense. Make sure you select returns the rows you expect to delete before you run it.  Also, it might require a little finesse.
>delete from paycheck where [ids.id] in (select top 2000 * from paycheck group by chknum)
I don't think that SQL select will run, group by error. This might a case of what you want to keep vs what to delete.
select chknum, max([ids.id]) from paycheck group by chknum
Does that select grab what you want to keep? How many rows?
0
Steve HoggITCommented:
If it does, then you do something like this.
STEP 1. Backup up your data before you do anything else. Make a copy of the table, whatever you normally do.
STEP 2. Confirm your delete SELECT stmt
select [ids.id] from paycheck where not exists (select chknum, max([ids.id]) from paycheck group by chknum)
STEP 3. If numbers 1 and 2 are a go, then run it.
delete from paycheck where [ids.id] in (select [ids.id] from paycheck where not exists (select chknum, max([ids.id]) from paycheck group by chknum))
0
jay-areAuthor Commented:
I altered your query a bit to this:

delete from paycheck where [ids.id] in (select [ids.id] from paycheck where not exists (select max([ids.id]) from paycheck group by [ids.id]))

0 rows affected.


0
Steve HoggITCommented:
Sorry, I'm back.
Can you write just a select statement, no delete, that select what you want to delete? It should be this simple:
SELECT * FROM PAYCHECK WHERE ...
This is Step 2. Confirm this first. Then post the select here and show how many rows it returns.
0
jay-areAuthor Commented:
The select query from step 2 doesn't return any results for some reason.  I'm trying to delete all the duplicate chknums or [ids.id]'s.  I guess the ID would be best to use.  So I'd like to keep the max([ids.id]) of each group'd id's and just delete the rest.
0
Steve HoggITCommented:
So in this case, you need a select to choose what you want to keep. Something like this:
select chknum, max([ids.id]) from paycheck group by chknum
This will give you each distinct chknum and the max id. Can you confirm this is what you want to keep?
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
jay-areAuthor Commented:
select chknum, max([ids.id]) as [ids.id] from paycheck group by [ids.id], chknum

Using that gets exactly what I need to keep.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.