Solved

MYSQL / PHP delete duplicates, save most current, re-index?

Posted on 2007-03-23
12
177 Views
Last Modified: 2008-01-09
OK - this seems like an easy one, but I've been trying to find the best approach.

I have a MYSQL dbase, which I'm quering mainly w/ php.  it's contents are as follows

tablename = sales

ID      DATE      CENTR      TOT_SALES
1      9/11/2006      0      1201.07
2      9/11/2006      1      1159.35
3      9/11/2006      6      41.72
4      9/12/2006      0      1293.62
5      9/12/2006      1      1191.56
6      9/12/2006      6      102.06
7      9/13/2006      0      1201.34
8      9/13/2006      1      1107.64
9      9/13/2006      6      93.70
10      9/14/2006      0      2261.69
11      9/14/2006      1      2130.50
12      9/14/2006      6      131.19
13      9/15/2006      0      1511.08
14      9/15/2006      1      1474.28
15      9/15/2006      6      36.80
16      9/16/2006      0      946.61
17      9/16/2006      1      908.36
18      9/16/2006      6      38.25
19      9/17/2006      0      1018.44
20      9/17/2006      1      972.07
21      9/17/2006      6      46.37
22      9/18/2006      0      1117.78
23      9/18/2006      1      1087.51
24      9/18/2006      6      30.27
25      9/19/2006      0      1213.94
26      9/19/2006      1      1081.18
27      9/19/2006      6      132.76
28      9/20/2006      0      1252.27
29      9/20/2006      1      1072.80
30      9/20/2006      6      179.47
31      9/14/2006      0      3961.73
32      9/14/2006      1      2630.54
33      9/14/2006      6      1331.19
34      9/12/2006      0      2042.30
35      9/12/2006      1      1532.20
36      9/12/2006      6      510.10
37      9/21/2006      0      1535.49
38      9/21/2006      1      1423.31
39      9/21/2006      6      112.18


I need to write a script that I wil execute as maybe a stored procedure that runs every day at a certain time in the morning that will delete the any duplicate entry accept for the ones with the greatest ID numbers.

As you can see the sales catagories are broken down into set CENTRS and are all linked to a particular day.  The last three records represent the correct values for the day 9/14/06

Hopefully, upon the scripts execution the data would look like this

ID      DATE      CENTR      TOT_SALES
1      9/11/2006      0      1201.07
2      9/11/2006      1      1159.35
3      9/11/2006      6      41.72
7      9/13/2006      0      1201.34
8      9/13/2006      1      1107.64
9      9/13/2006      6      93.70
13      9/15/2006      0      1511.08
14      9/15/2006      1      1474.28
15      9/15/2006      6      36.80
16      9/16/2006      0      946.61
17      9/16/2006      1      908.36
18      9/16/2006      6      38.25
19      9/17/2006      0      1018.44
20      9/17/2006      1      972.07
21      9/17/2006      6      46.37
22      9/18/2006      0      1117.78
23      9/18/2006      1      1087.51
24      9/18/2006      6      30.27
25      9/19/2006      0      1213.94
26      9/19/2006      1      1081.18
27      9/19/2006      6      132.76
28      9/20/2006      0      1252.27
29      9/20/2006      1      1072.80
30      9/20/2006      6      179.47
31      9/14/2006      0      3961.73
32      9/14/2006      1      2630.54
33      9/14/2006      6      1331.19
34      9/12/2006      0      2042.30
35      9/12/2006      1      1532.20
36      9/12/2006      6      510.10
37      9/21/2006      0      1535.49
38      9/21/2006      1      1423.31
39      9/21/2006      6      112.18


Or even better would be:

ID      DATE      CENTR      TOT_SALES
1      9/11/2006      0      1201.07
2      9/11/2006      1      1159.35
3      9/11/2006      6      41.72
4      9/12/2006      0      2042.30
5      9/12/2006      1      1532.20
6      9/12/2006      6      510.10
7      9/13/2006      0      1201.34
8      9/13/2006      1      1107.64
9      9/13/2006      6      93.70
10      9/14/2006      0      3961.73
11      9/14/2006      1      2630.54
12      9/14/2006      6      1331.19
13      9/15/2006      0      1511.08
14      9/15/2006      1      1474.28
15      9/15/2006      6      36.80
16      9/16/2006      0      946.61
17      9/16/2006      1      908.36
18      9/16/2006      6      38.25
19      9/17/2006      0      1018.44
20      9/17/2006      1      972.07
21      9/17/2006      6      46.37
22      9/18/2006      0      1117.78
23      9/18/2006      1      1087.51
24      9/18/2006      6      30.27
25      9/19/2006      0      1213.94
26      9/19/2006      1      1081.18
27      9/19/2006      6      132.76
28      9/20/2006      0      1252.27
29      9/20/2006      1      1072.80
30      9/20/2006      6      179.47
31      9/21/2006      0      1535.49
32      9/21/2006      1      1423.31
33      9/21/2006      6      112.18


Any thoughts guys; thanks so much!
0
Comment
Question by:harpboy918
  • 8
  • 3
12 Comments
 
LVL 28

Expert Comment

by:gamebits
ID: 18783056
It's usually done with a temporary table, select distinct from your actual table, insert into the temporary, truncate the actual, re-insert the new data from the temporary.
0
 

Author Comment

by:harpboy918
ID: 18783103
Right I've been trying that, and using GROUP BY, any chance for some code snipplets?

0
 

Author Comment

by:harpboy918
ID: 18783113
GROUP by count(*) etc i mean
0
 

Author Comment

by:harpboy918
ID: 18785515
When the DISCINCT operator is used, what qualifies a record set as such, i mean, is the one with the greatest PK number? thanks guys
0
 
LVL 15

Expert Comment

by:babuno5
ID: 18785703
here is the steps which you can follow to achieve the result
First query to select records having count =1 and then these can directly inserted into a separate table
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) =1

Second query to select records having count >1
SELECT count( * ) , center, tmp_date
FROM `sales`
GROUP BY center, tmp_date
HAVING count( * ) >1
In this results set you will get the entries which have count >1
From the above result set for each center and tmp_date fire the following  query
SELECT id,center,tmp_date
FROM sales
WHERE tmp_date = '2007-03-25'   /// this value will come from the above query results set
AND center =0                         /// this value will come from the above query results set
ORDER BY id DESC
from this query just pick up the first id and then insert this ID into the separate table used above for the first query

I hope i have understood the problem and tried to attempt for a possible soloution
0
 

Author Comment

by:harpboy918
ID: 18785822
thanks for you reply, I think using SELECT DISTINCT might be a better way; shouldn't tmp_date's value be dynamic? thanks again!
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 15

Expert Comment

by:babuno5
ID: 18787586
yes i tried using distinct but could not come up with a soloution may be another expert might be able to help out
>>tmp_date's value be dynamic
yes it is dynamic
The queries which i had shown in my previous post is not the complete soloution it is just a sample,
you need to write a PLSQL procedure or a php script and then set a cron to make it run periodically
0
 

Author Comment

by:harpboy918
ID: 18788052
thansk babunoi
0
 

Author Comment

by:harpboy918
ID: 18795346
any one else have any suggestions?  thanks!
0
 

Author Comment

by:harpboy918
ID: 18804377
this is the query I'm using to get the values I need
SELECT DISTINCT date,max(tot_sales) FROM site_sales WHERE cost_centr=0 GROUP BY date, center

what' sthe differnetce between storing these values in a temp table or a table I just create and then delete upon script execution? thanks!
0
 
LVL 15

Accepted Solution

by:
babuno5 earned 500 total points
ID: 18809964
whenever you run the cron you need to delete the previous entries and then the new entries will be inserted
say the results you store in site_sales1 table
so your final query that will enter the dat in site_sales1 will be

INSERT INTO site_table1 (date, tot_sales)
SELECT DISTINCT date,max(tot_sales)
FROM site_sales
GROUP BY date, center
0
 

Author Comment

by:harpboy918
ID: 18812731
dunno about cron; i was thinking about just running a scheduled stored procedure is this the same thing?
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Author Note: Since this E-E article was originally written, years ago, formal testing has come into common use in the world of PHP.  PHPUnit (http://en.wikipedia.org/wiki/PHPUnit) and similar technologies have enjoyed wide adoption, making it possib…
Introduction Since I wrote the original article about Handling Date and Time in PHP and MySQL (http://www.experts-exchange.com/articles/201/Handling-Date-and-Time-in-PHP-and-MySQL.html) several years ago, it seemed like now was a good time to updat…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now