• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 208
  • Last Modified:

Best possible solution

If I'm trying to delete duplicates by comparing two columns from two different tables, and each table has too many rows for a simple DELETE query, what would be the best possible solution? Should I create a function like this?


rs.Open "SELECT * FROM table"
rs2.Open "SELECT DISTINCT column1 FROM table"

Do Until rs.EOF
    If rs!column1 = rs2!column1 Then
        rs.Delete    ' delete the current record
    End If
    rs.MoveNext    ' moved outside the If so the loop cannot hang
Loop
Asked by: thenone
1 Solution
 
Leo Eikelman, Director, IT and Business Development, commented:
If you have constraints set up, it might be difficult to just delete rows in a table.

Follow this link for instructions on how to properly remove duplicates:

http://weblogs.sqlteam.com/jeffs/archive/2004/10/07/2190.aspx

Leo
 
thenone (Author) commented:
I'm not using SQL Server; I'm using MySQL.
 
thenone (Author) commented:
Also, I have tried a straight SQL statement, but the tables are too big for my CPU to cope with it.
 
Leo Eikelman, Director, IT and Business Development, commented:
Well, if you can't use SQL functions because of your CPU, then it's going to be quite difficult to get all the entries from one table and see if they exist in another table.

If you want to delete all the duplicates from a single table, in MySQL you can do:

ALTER IGNORE TABLE tableName ADD UNIQUE INDEX(a,b);

I can try to do some more digging, but I'm not sure how you are going to get all the entries from table A and compare them to table B without using SQL SELECT statements.

Leo
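For reference, here is a sketch of what the ALTER IGNORE approach above looks like with placeholder table, column, and index names. Note that the IGNORE modifier only exists in MySQL versions before 5.7; later versions removed it.

```sql
-- Placeholder names throughout. IGNORE silently drops rows that
-- would violate the new unique index, deduplicating (a, b) pairs.
-- Works only on MySQL servers older than 5.7.
ALTER IGNORE TABLE tableName ADD UNIQUE INDEX idx_a_b (a, b);
```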
 
thenone (Author) commented:
I can still use SQL SELECT statements, but when I tried something like "DELETE FROM table1 WHERE table2.column = table1.column", my CPU goes into a tailspin because there are too many records to go through.
 
thenone (Author) commented:
I'm not sure I wrote that exactly correctly, but in either case it still throws my processor into a tailspin.
 
Arthur_Wood commented:
How big are your tables? I find it hard to believe that they could be "too big for my CPU to cope with the SQL statement" (not that I really understand what that statement means in the first place).

How about:

SQL = "DELETE FROM Table2 WHERE Table2.Column1 IN (SELECT Column1 FROM Table1)"

AW
 
Leo Eikelman, Director, IT and Business Development, commented:
Well, in that case you can do paging to select only a set number of records at a time:

SELECT TOP 200 *
FROM YourTableName
WHERE YourCondition

Try this, and then run the query that failed before.

Leo
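In MySQL itself, one way to page through the work is to put the LIMIT on the DELETE rather than on a SELECT, and rerun the statement until it affects zero rows. A sketch, with placeholder table and column names:

```sql
-- Deletes at most 1000 matching rows per run; repeat until the
-- statement reports 0 rows affected. Names are placeholders.
DELETE FROM Table2
WHERE Column1 IN (SELECT Column1 FROM Table1)
LIMIT 1000;
```

Batching like this keeps each transaction small, so the server never has to hold locks or undo information for all 200,000 rows at once.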
 
thenone (Author) commented:
I've tried this and it still overloaded my CPU:

"DELETE FROM TABLE2 where Table2.Column1 in (Select Column1 from Table1)"

One table has almost 200,000 records, and the other is about the same.
 
Leo Eikelman, Director, IT and Business Development, commented:
Did you try my above comment?

Try this to see if it is actually a CPU issue:

"DELETE FROM Table2 WHERE Table2.Column1 IN (
SELECT TOP 200 Column1 FROM Table1)"

Leo
 
thenone (Author) commented:
I will give that a try, but is it possible that there are too many records for the query?
 
thenone (Author) commented:
Also, if that is the case, how can I do all of the records, not just the top 200?
 
Leo EikelmanDirector, IT and Business DevelopmentCommented:
it gets kinda complicated.

so, if i want records 120 to 150, i"l do this:
DELETE FROM Table2 WHERE Table2.cloumn1 IN
(SELECT TOP 150 * FROM Table1 ORDER BY someOrderField) L
LEFT JOIN
(SELECT TOP 119 * FROM Table1 ORDER BY someOrderField) R
ON L.id = R.id
WHERE R.id IS NULL
ORDER BY L.someOrderField

Here you want to make sure that in your first and second select they are ordered by the same variable.

I haven't actually tried this before but it should work.

Leo
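The TOP ... LEFT JOIN trick above is SQL Server syntax; in MySQL the same slice of rows can be taken with LIMIT offset, count. One sketch, with placeholder names, wrapping the page in a derived table because older MySQL does not allow LIMIT directly inside an IN (...) subquery:

```sql
-- Delete matches for rows 120-150 of Table1 (placeholder names).
-- The extra derived table works around MySQL's restriction on
-- LIMIT inside IN (...) subqueries.
DELETE FROM Table2
WHERE Column1 IN (
    SELECT Column1 FROM (
        SELECT Column1 FROM Table1
        ORDER BY someOrderField
        LIMIT 119, 31      -- skip 119 rows, take rows 120 through 150
    ) AS page
);
```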
 
thenoneAuthor Commented:
so you have never ran into a problem where you couldn't delete records with a sql statenment becasue there was too many records to delete and it would cause your computer to lock up?
0
 
Leo EikelmanDirector, IT and Business DevelopmentCommented:
personally no.  Most of the computers I've worked with have 4GB of memory.  But, have you tried my suuggestion up top to see if your computer will process that?

Leo
0
 
thenoneAuthor Commented:
I wonder if it could be my default settings for mysql.My server that I am running it on is a p4 with 512mb of memory it should be able to handle that query.If not couldn't I just loop through each record one at a time?
0
 
PSSUserCommented:
The statement
"DELETE FROM TABLE2 where Table2.Column1 in (Select Column1 from Table1)"
doesn't make effecient use of indexes. A better statement would be:
DELETE FROM TABLE2 WHERE EXISTS (SELECT Column1 FROM Table1 WHERE Table1.Column1=Table2.Column1)
provided you have indexes on Column1 in both tables 1 and 2.
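For example, assuming Column1 is not yet indexed, the indexes could be added like this (the index names are placeholders):

```sql
-- Indexes on the compared column let the EXISTS probe use an index
-- lookup instead of a full table scan for every outer row.
CREATE INDEX idx_table1_column1 ON Table1 (Column1);
CREATE INDEX idx_table2_column1 ON Table2 (Column1);
```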

Otherwise, if you wish to do this in code, the best way would be:

rs.Open "SELECT DISTINCT column1 FROM table2"

Do Until rs.EOF
    Con.Execute "DELETE FROM Table1 WHERE Column1='" & rs("column1") & "'"
    rs.MoveNext
Loop

where Con is an ADO Connection object.
 
thenoneAuthor Commented:
so basically what you are saying is,is the reason why my cpu is over killing is because I do not have indexes set up for those columns?
0
 
PSSUserCommented:
A lack of indexes certainly won't help as the server will have to do table scans.

It may be that you will still have a problem with the tables indexed, but far less likely.

Thanks for the points
0
 
thenone (Author) commented:
I figured it out: I changed my columns from BLOB/TEXT to VARCHAR.
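For anyone hitting the same issue: the fix described above is a column type change, which would look something like this (table, column, and length are placeholders). MySQL can only index a prefix of a BLOB or TEXT column, whereas a VARCHAR can be fully indexed and compared cheaply, which is consistent with the slowdown seen in this thread.

```sql
-- Change the compared columns from TEXT/BLOB to VARCHAR so they can
-- be fully indexed and compared efficiently (placeholder names/size).
ALTER TABLE Table1 MODIFY Column1 VARCHAR(255);
ALTER TABLE Table2 MODIFY Column1 VARCHAR(255);
```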
