harpboy918
asked on
MYSQL / PHP delete duplicates, save most current, re-index?
OK - this seems like an easy one, but I've been trying to find the best approach.
I have a MYSQL dbase, which I'm quering mainly w/ php. it's contents are as follows
tablename = sales
ID DATE CENTR TOT_SALES
1 9/11/2006 0 1201.07
2 9/11/2006 1 1159.35
3 9/11/2006 6 41.72
4 9/12/2006 0 1293.62
5 9/12/2006 1 1191.56
6 9/12/2006 6 102.06
7 9/13/2006 0 1201.34
8 9/13/2006 1 1107.64
9 9/13/2006 6 93.70
10 9/14/2006 0 2261.69
11 9/14/2006 1 2130.50
12 9/14/2006 6 131.19
13 9/15/2006 0 1511.08
14 9/15/2006 1 1474.28
15 9/15/2006 6 36.80
16 9/16/2006 0 946.61
17 9/16/2006 1 908.36
18 9/16/2006 6 38.25
19 9/17/2006 0 1018.44
20 9/17/2006 1 972.07
21 9/17/2006 6 46.37
22 9/18/2006 0 1117.78
23 9/18/2006 1 1087.51
24 9/18/2006 6 30.27
25 9/19/2006 0 1213.94
26 9/19/2006 1 1081.18
27 9/19/2006 6 132.76
28 9/20/2006 0 1252.27
29 9/20/2006 1 1072.80
30 9/20/2006 6 179.47
31 9/14/2006 0 3961.73
32 9/14/2006 1 2630.54
33 9/14/2006 6 1331.19
34 9/12/2006 0 2042.30
35 9/12/2006 1 1532.20
36 9/12/2006 6 510.10
37 9/21/2006 0 1535.49
38 9/21/2006 1 1423.31
39 9/21/2006 6 112.18
I need to write a script that I wil execute as maybe a stored procedure that runs every day at a certain time in the morning that will delete the any duplicate entry accept for the ones with the greatest ID numbers.
As you can see the sales catagories are broken down into set CENTRS and are all linked to a particular day. The last three records represent the correct values for the day 9/14/06
Hopefully, upon the scripts execution the data would look like this
ID DATE CENTR TOT_SALES
1 9/11/2006 0 1201.07
2 9/11/2006 1 1159.35
3 9/11/2006 6 41.72
7 9/13/2006 0 1201.34
8 9/13/2006 1 1107.64
9 9/13/2006 6 93.70
13 9/15/2006 0 1511.08
14 9/15/2006 1 1474.28
15 9/15/2006 6 36.80
16 9/16/2006 0 946.61
17 9/16/2006 1 908.36
18 9/16/2006 6 38.25
19 9/17/2006 0 1018.44
20 9/17/2006 1 972.07
21 9/17/2006 6 46.37
22 9/18/2006 0 1117.78
23 9/18/2006 1 1087.51
24 9/18/2006 6 30.27
25 9/19/2006 0 1213.94
26 9/19/2006 1 1081.18
27 9/19/2006 6 132.76
28 9/20/2006 0 1252.27
29 9/20/2006 1 1072.80
30 9/20/2006 6 179.47
31 9/14/2006 0 3961.73
32 9/14/2006 1 2630.54
33 9/14/2006 6 1331.19
34 9/12/2006 0 2042.30
35 9/12/2006 1 1532.20
36 9/12/2006 6 510.10
37 9/21/2006 0 1535.49
38 9/21/2006 1 1423.31
39 9/21/2006 6 112.18
Or even better would be:
ID DATE CENTR TOT_SALES
1 9/11/2006 0 1201.07
2 9/11/2006 1 1159.35
3 9/11/2006 6 41.72
4 9/12/2006 0 2042.30
5 9/12/2006 1 1532.20
6 9/12/2006 6 510.10
7 9/13/2006 0 1201.34
8 9/13/2006 1 1107.64
9 9/13/2006 6 93.70
10 9/14/2006 0 3961.73
11 9/14/2006 1 2630.54
12 9/14/2006 6 1331.19
13 9/15/2006 0 1511.08
14 9/15/2006 1 1474.28
15 9/15/2006 6 36.80
16 9/16/2006 0 946.61
17 9/16/2006 1 908.36
18 9/16/2006 6 38.25
19 9/17/2006 0 1018.44
20 9/17/2006 1 972.07
21 9/17/2006 6 46.37
22 9/18/2006 0 1117.78
23 9/18/2006 1 1087.51
24 9/18/2006 6 30.27
25 9/19/2006 0 1213.94
26 9/19/2006 1 1081.18
27 9/19/2006 6 132.76
28 9/20/2006 0 1252.27
29 9/20/2006 1 1072.80
30 9/20/2006 6 179.47
31 9/21/2006 0 1535.49
32 9/21/2006 1 1423.31
33 9/21/2006 6 112.18
Any thoughts guys; thanks so much!
I have a MYSQL dbase, which I'm quering mainly w/ php. it's contents are as follows
tablename = sales
ID DATE CENTR TOT_SALES
1 9/11/2006 0 1201.07
2 9/11/2006 1 1159.35
3 9/11/2006 6 41.72
4 9/12/2006 0 1293.62
5 9/12/2006 1 1191.56
6 9/12/2006 6 102.06
7 9/13/2006 0 1201.34
8 9/13/2006 1 1107.64
9 9/13/2006 6 93.70
10 9/14/2006 0 2261.69
11 9/14/2006 1 2130.50
12 9/14/2006 6 131.19
13 9/15/2006 0 1511.08
14 9/15/2006 1 1474.28
15 9/15/2006 6 36.80
16 9/16/2006 0 946.61
17 9/16/2006 1 908.36
18 9/16/2006 6 38.25
19 9/17/2006 0 1018.44
20 9/17/2006 1 972.07
21 9/17/2006 6 46.37
22 9/18/2006 0 1117.78
23 9/18/2006 1 1087.51
24 9/18/2006 6 30.27
25 9/19/2006 0 1213.94
26 9/19/2006 1 1081.18
27 9/19/2006 6 132.76
28 9/20/2006 0 1252.27
29 9/20/2006 1 1072.80
30 9/20/2006 6 179.47
31 9/14/2006 0 3961.73
32 9/14/2006 1 2630.54
33 9/14/2006 6 1331.19
34 9/12/2006 0 2042.30
35 9/12/2006 1 1532.20
36 9/12/2006 6 510.10
37 9/21/2006 0 1535.49
38 9/21/2006 1 1423.31
39 9/21/2006 6 112.18
I need to write a script that I wil execute as maybe a stored procedure that runs every day at a certain time in the morning that will delete the any duplicate entry accept for the ones with the greatest ID numbers.
As you can see the sales catagories are broken down into set CENTRS and are all linked to a particular day. The last three records represent the correct values for the day 9/14/06
Hopefully, upon the scripts execution the data would look like this
ID DATE CENTR TOT_SALES
1 9/11/2006 0 1201.07
2 9/11/2006 1 1159.35
3 9/11/2006 6 41.72
7 9/13/2006 0 1201.34
8 9/13/2006 1 1107.64
9 9/13/2006 6 93.70
13 9/15/2006 0 1511.08
14 9/15/2006 1 1474.28
15 9/15/2006 6 36.80
16 9/16/2006 0 946.61
17 9/16/2006 1 908.36
18 9/16/2006 6 38.25
19 9/17/2006 0 1018.44
20 9/17/2006 1 972.07
21 9/17/2006 6 46.37
22 9/18/2006 0 1117.78
23 9/18/2006 1 1087.51
24 9/18/2006 6 30.27
25 9/19/2006 0 1213.94
26 9/19/2006 1 1081.18
27 9/19/2006 6 132.76
28 9/20/2006 0 1252.27
29 9/20/2006 1 1072.80
30 9/20/2006 6 179.47
31 9/14/2006 0 3961.73
32 9/14/2006 1 2630.54
33 9/14/2006 6 1331.19
34 9/12/2006 0 2042.30
35 9/12/2006 1 1532.20
36 9/12/2006 6 510.10
37 9/21/2006 0 1535.49
38 9/21/2006 1 1423.31
39 9/21/2006 6 112.18
Or even better would be:
ID DATE CENTR TOT_SALES
1 9/11/2006 0 1201.07
2 9/11/2006 1 1159.35
3 9/11/2006 6 41.72
4 9/12/2006 0 2042.30
5 9/12/2006 1 1532.20
6 9/12/2006 6 510.10
7 9/13/2006 0 1201.34
8 9/13/2006 1 1107.64
9 9/13/2006 6 93.70
10 9/14/2006 0 3961.73
11 9/14/2006 1 2630.54
12 9/14/2006 6 1331.19
13 9/15/2006 0 1511.08
14 9/15/2006 1 1474.28
15 9/15/2006 6 36.80
16 9/16/2006 0 946.61
17 9/16/2006 1 908.36
18 9/16/2006 6 38.25
19 9/17/2006 0 1018.44
20 9/17/2006 1 972.07
21 9/17/2006 6 46.37
22 9/18/2006 0 1117.78
23 9/18/2006 1 1087.51
24 9/18/2006 6 30.27
25 9/19/2006 0 1213.94
26 9/19/2006 1 1081.18
27 9/19/2006 6 132.76
28 9/20/2006 0 1252.27
29 9/20/2006 1 1072.80
30 9/20/2006 6 179.47
31 9/21/2006 0 1535.49
32 9/21/2006 1 1423.31
33 9/21/2006 6 112.18
Any thoughts guys; thanks so much!
It's usually done with a temporary table, select distinct from your actual table, insert into the temporary, truncate the actual, re-insert the new data from the temporary.
ASKER
Right I've been trying that, and using GROUP BY, any chance for some code snipplets?
ASKER
GROUP by count(*) etc i mean
ASKER
When the DISCINCT operator is used, what qualifies a record set as such, i mean, is the one with the greatest PK number? thanks guys
here is the steps which you can follow to achieve the result
First query to select records having count =1 and then these can directly inserted into a separate table
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) =1
Second query to select records having count >1
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) >1
In this results set you will get the entries which have count >1
From the above result set for each center and tmp_date fire the following query
SELECT id,center,tmp_date
FROM sales
WHERE tmp_date = '2007-03-25' /// this value will come from the above query results set
AND center =0 /// this value will come from the above query results set
ORDER BY id DESC
from this query just pick up the first id and then insert this ID into the separate table used above for the first query
I hope i have understood the problem and tried to attempt for a possible soloution
First query to select records having count =1 and then these can directly inserted into a separate table
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) =1
Second query to select records having count >1
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) >1
In this results set you will get the entries which have count >1
From the above result set for each center and tmp_date fire the following query
SELECT id,center,tmp_date
FROM sales
WHERE tmp_date = '2007-03-25' /// this value will come from the above query results set
AND center =0 /// this value will come from the above query results set
ORDER BY id DESC
from this query just pick up the first id and then insert this ID into the separate table used above for the first query
I hope i have understood the problem and tried to attempt for a possible soloution
ASKER
thanks for you reply, I think using SELECT DISTINCT might be a better way; shouldn't tmp_date's value be dynamic? thanks again!
yes i tried using distinct but could not come up with a soloution may be another expert might be able to help out
>>tmp_date's value be dynamic
yes it is dynamic
The queries which i had shown in my previous post is not the complete soloution it is just a sample,
you need to write a PLSQL procedure or a php script and then set a cron to make it run periodically
>>tmp_date's value be dynamic
yes it is dynamic
The queries which i had shown in my previous post is not the complete soloution it is just a sample,
you need to write a PLSQL procedure or a php script and then set a cron to make it run periodically
ASKER
thansk babunoi
ASKER
any one else have any suggestions? thanks!
ASKER
this is the query I'm using to get the values I need
SELECT DISTINCT date,max(tot_sales) FROM site_sales WHERE cost_centr=0 GROUP BY date, center
what' sthe differnetce between storing these values in a temp table or a table I just create and then delete upon script execution? thanks!
SELECT DISTINCT date,max(tot_sales) FROM site_sales WHERE cost_centr=0 GROUP BY date, center
what' sthe differnetce between storing these values in a temp table or a table I just create and then delete upon script execution? thanks!
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
dunno about cron; i was thinking about just running a scheduled stored procedure is this the same thing?