Link to home
Start Free TrialLog in
Avatar of harpboy918
harpboy918

asked on

MYSQL / PHP delete duplicates, save most current, re-index?

OK - this seems like an easy one, but I've been trying to find the best approach.

I have a MYSQL dbase, which I'm quering mainly w/ php.  it's contents are as follows

tablename = sales

ID      DATE      CENTR      TOT_SALES
1      9/11/2006      0      1201.07
2      9/11/2006      1      1159.35
3      9/11/2006      6      41.72
4      9/12/2006      0      1293.62
5      9/12/2006      1      1191.56
6      9/12/2006      6      102.06
7      9/13/2006      0      1201.34
8      9/13/2006      1      1107.64
9      9/13/2006      6      93.70
10      9/14/2006      0      2261.69
11      9/14/2006      1      2130.50
12      9/14/2006      6      131.19
13      9/15/2006      0      1511.08
14      9/15/2006      1      1474.28
15      9/15/2006      6      36.80
16      9/16/2006      0      946.61
17      9/16/2006      1      908.36
18      9/16/2006      6      38.25
19      9/17/2006      0      1018.44
20      9/17/2006      1      972.07
21      9/17/2006      6      46.37
22      9/18/2006      0      1117.78
23      9/18/2006      1      1087.51
24      9/18/2006      6      30.27
25      9/19/2006      0      1213.94
26      9/19/2006      1      1081.18
27      9/19/2006      6      132.76
28      9/20/2006      0      1252.27
29      9/20/2006      1      1072.80
30      9/20/2006      6      179.47
31      9/14/2006      0      3961.73
32      9/14/2006      1      2630.54
33      9/14/2006      6      1331.19
34      9/12/2006      0      2042.30
35      9/12/2006      1      1532.20
36      9/12/2006      6      510.10
37      9/21/2006      0      1535.49
38      9/21/2006      1      1423.31
39      9/21/2006      6      112.18


I need to write a script that I wil execute as maybe a stored procedure that runs every day at a certain time in the morning that will delete the any duplicate entry accept for the ones with the greatest ID numbers.

As you can see the sales catagories are broken down into set CENTRS and are all linked to a particular day.  The last three records represent the correct values for the day 9/14/06

Hopefully, upon the scripts execution the data would look like this

ID      DATE      CENTR      TOT_SALES
1      9/11/2006      0      1201.07
2      9/11/2006      1      1159.35
3      9/11/2006      6      41.72
7      9/13/2006      0      1201.34
8      9/13/2006      1      1107.64
9      9/13/2006      6      93.70
13      9/15/2006      0      1511.08
14      9/15/2006      1      1474.28
15      9/15/2006      6      36.80
16      9/16/2006      0      946.61
17      9/16/2006      1      908.36
18      9/16/2006      6      38.25
19      9/17/2006      0      1018.44
20      9/17/2006      1      972.07
21      9/17/2006      6      46.37
22      9/18/2006      0      1117.78
23      9/18/2006      1      1087.51
24      9/18/2006      6      30.27
25      9/19/2006      0      1213.94
26      9/19/2006      1      1081.18
27      9/19/2006      6      132.76
28      9/20/2006      0      1252.27
29      9/20/2006      1      1072.80
30      9/20/2006      6      179.47
31      9/14/2006      0      3961.73
32      9/14/2006      1      2630.54
33      9/14/2006      6      1331.19
34      9/12/2006      0      2042.30
35      9/12/2006      1      1532.20
36      9/12/2006      6      510.10
37      9/21/2006      0      1535.49
38      9/21/2006      1      1423.31
39      9/21/2006      6      112.18


Or even better would be:

ID      DATE      CENTR      TOT_SALES
1      9/11/2006      0      1201.07
2      9/11/2006      1      1159.35
3      9/11/2006      6      41.72
4      9/12/2006      0      2042.30
5      9/12/2006      1      1532.20
6      9/12/2006      6      510.10
7      9/13/2006      0      1201.34
8      9/13/2006      1      1107.64
9      9/13/2006      6      93.70
10      9/14/2006      0      3961.73
11      9/14/2006      1      2630.54
12      9/14/2006      6      1331.19
13      9/15/2006      0      1511.08
14      9/15/2006      1      1474.28
15      9/15/2006      6      36.80
16      9/16/2006      0      946.61
17      9/16/2006      1      908.36
18      9/16/2006      6      38.25
19      9/17/2006      0      1018.44
20      9/17/2006      1      972.07
21      9/17/2006      6      46.37
22      9/18/2006      0      1117.78
23      9/18/2006      1      1087.51
24      9/18/2006      6      30.27
25      9/19/2006      0      1213.94
26      9/19/2006      1      1081.18
27      9/19/2006      6      132.76
28      9/20/2006      0      1252.27
29      9/20/2006      1      1072.80
30      9/20/2006      6      179.47
31      9/21/2006      0      1535.49
32      9/21/2006      1      1423.31
33      9/21/2006      6      112.18


Any thoughts guys; thanks so much!
Avatar of gamebits
gamebits
Flag of Canada image

It's usually done with a temporary table, select distinct from your actual table, insert into the temporary, truncate the actual, re-insert the new data from the temporary.
Avatar of harpboy918
harpboy918

ASKER

Right I've been trying that, and using GROUP BY, any chance for some code snipplets?

GROUP by count(*) etc i mean
When the DISCINCT operator is used, what qualifies a record set as such, i mean, is the one with the greatest PK number? thanks guys
here is the steps which you can follow to achieve the result
First query to select records having count =1 and then these can directly inserted into a separate table
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) =1

Second query to select records having count >1
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) >1
In this results set you will get the entries which have count >1
From the above result set for each center and tmp_date fire the following  query
SELECT id,center,tmp_date
FROM sales
WHERE tmp_date = '2007-03-25'   /// this value will come from the above query results set
AND center =0                         /// this value will come from the above query results set
ORDER BY id DESC
from this query just pick up the first id and then insert this ID into the separate table used above for the first query

I hope i have understood the problem and tried to attempt for a possible soloution
thanks for you reply, I think using SELECT DISTINCT might be a better way; shouldn't tmp_date's value be dynamic? thanks again!
yes i tried using distinct but could not come up with a soloution may be another expert might be able to help out
>>tmp_date's value be dynamic
yes it is dynamic
The queries which i had shown in my previous post is not the complete soloution it is just a sample,
you need to write a PLSQL procedure or a php script and then set a cron to make it run periodically
thansk babunoi
any one else have any suggestions?  thanks!
this is the query I'm using to get the values I need
SELECT DISTINCT date,max(tot_sales) FROM site_sales WHERE cost_centr=0 GROUP BY date, center

what' sthe differnetce between storing these values in a temp table or a table I just create and then delete upon script execution? thanks!
ASKER CERTIFIED SOLUTION
Avatar of babuno5
babuno5
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
dunno about cron; i was thinking about just running a scheduled stored procedure is this the same thing?