Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Optimize Complex Query

Posted on 2006-11-05
8
Medium Priority
?
384 Views
Last Modified: 2010-05-18
I'll try to simplify this to only the relevant information and hope that I don't inadvertently omit something important.  (-:

I have...
1.  A table of payroll deductions.  Let's say its fields are (amount DECIMAL, person_id CHAR(9), deduction_date DATE).  This table is fairly big (probably around a million rows and growing).
2.  A table of EXPECTED deductions -- common amounts that we'd expect to see.  Let's say its only field is (amount DECIMAL).  This table is tiny (probably no more than 40 rows ever).

I wish to produce a report that...
1.  Includes only UNUSUAL deductions (i.e. amounts that do not appear in expected_deductions)
2.  Includes only items with a particular date.
3.  Is ordered by the TOTAL of all deductions for a particular person on that date.

For example, I might have...
Person A   $790
Person A     $30
Person B   $800
Person C   $500

Notice that Person A's deductions appear first because the TOTAL of 790 and 30 is greater than Person B's 800.  They're still listed as separate lines, however, so just saying "throw in a GROUP BY" isn't gonna cut it.  (-:


I've tried a bunch of different queries, but the best execution time I've gotten so far (around 20 seconds) was for this:

SELECT * FROM deductions
INNER JOIN (SELECT SUM(amount) total, BUID FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id) tblX
    ON (tblX.person_id = deductions.person_id)
LEFT JOIN expected_deductions ed ON (ed.amount = DOE.amount)
WHERE deduction_date = '2006-11-03' AND ed.amount IS NULL
ORDER BY tblX.total DESC, amount DESC;


Any query that obtains the result I want, exploiting any MySQL 5.0 features (including perhaps a new index that I've not yet tried) will do it.  I'll settle, if necessary, for a solution that does some work in PHP too, but at this point I'm being stubborn and I'm determined to get the DBMS to tell me what I want in under 5 seconds.  (-:
0
Comment
Question by:VoteyDisciple
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 14

Expert Comment

by:racek
ID: 17877030
Do you have INDEX on (deduction_date, person_id) ? - I mean ONE index over two columns
0
 
LVL 14

Accepted Solution

by:
racek earned 2000 total points
ID: 17877081
you query cannot works ... what is DOE?
I suppose that something like that with index on (deduction_date, person_id) can works

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT * FROM deductions d  
      JOIN         sums AS tblX                ON (tblX.person_id = d.person_id)
      LEFT JOIN expected_deductions ed ON (ed.amount = tblX.total)
WHERE d.deduction_date = '2006-11-03'
    AND ed.amount IS NULL
ORDER BY tblX.total DESC, d.amount DESC;
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877235
Whoops, DOE is what I'm here calling 'deductions'.  For that matter, BUID is what I'm here calling person_id.

I do have a wide variety of indices on this table including both (person_id, deduction_date) and one that ought to be perfect for that derived query: (person_id, deduction_date, amount).

According to EXPLAIN, MySQL chooses the index on (deduction_date) by itself for this query.  With a USE INDEX hint, however, performance only improves by a fraction of a second.

(What's baffling to me is why EXPLAIN reports that it'll have to scan about 40,000 rows with just the deduction_date, yet will have to scan nearly 300,000 with the larger index.  All the information it needs should be contained in that index; perhaps it is not as useful as I would have guessed.)
0
The Eight Noble Truths of Backup and Recovery

How can IT departments tackle the challenges of a Big Data world? This white paper provides a roadmap to success and helps companies ensure that all their data is safe and secure, no matter if it resides on-premise with physical or virtual machines or in the cloud.

 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877258
I've also tried moving the derived query to a temporary table; I assume your thinking was to get the ability to index over the temporary table?  This does gain some performance, but still not a whole lot.

In total, I've still brought the performance of the entire query down to about 13 seconds from 20.
0
 
LVL 14

Expert Comment

by:racek
ID: 17877264

LEFT JOIN expected_deductions ed ON (ed.amount = DOE.amount)
WHERE deduction_date = '2006-11-03' AND ed.amount IS NULL
ORDER BY tblX.total DESC, amount DESC;

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT * FROM deductions AS d  
                 JOIN  sums         AS tblX   ON  tblX.person_id = d.person_id
              WHERE  d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 14

Expert Comment

by:racek
ID: 17877266
sorry

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT * FROM deductions AS d  
                 JOIN  sums         AS tblX   ON  tblX.person_id = d.person_id
              WHERE  d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 14

Expert Comment

by:racek
ID: 17877285
if optimizer does not uses INDEX on (deduction_date, person_id), try:

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT *
    FROM   deductions AS d,
                sums         AS tblX  
    WHERE  tblX.person_id    = d.person_id
         AND d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877795
The best performance I've been able to get is with this combination:

CREATE TEMPORARY TABLE sums (person_id INT UNSIGNED PRIMARY KEY, total INT) AS (SELECT person_id, SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id);

SELECT *
FROM deductions
INNER JOIN sums ON (deductions.person_id = sums.person_id)
LEFT JOIN expected_deductions ed ON (deductions.amount = ed.amount)
WHERE check_date = '2006-11--03'  AND ed.amount IS NULL
ORDER BY sums.total DESC, deductions.amount DESC

Total time to run this combination (now exploiting the best available index at each step) is around 8.5 seconds.


Adding that derived table hurt much more than it helped in this case, but switching to a temporary table for the list of sums did gain some improvement.  


I think I'm gonna call this adequate.  Twenty seconds (counting PHP processing time) is still a long time to wait for a webpage to load, but I think the "real solution" will be for the client to come up with criteria for this report that narrow down the results a bit more.  (-:
0

Featured Post

Learn Veeam advantages over legacy backup

Every day, more and more legacy backup customers switch to Veeam. Technologies designed for the client-server era cannot restore any IT service running in the hybrid cloud within seconds. Learn top Veeam advantages over legacy backup and get Veeam for the price of your renewal

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This post contains step-by-step instructions for setting up alerting in Percona Monitoring and Management (PMM) using Grafana.
By, Vadim Tkachenko. In this article we’ll look at ClickHouse on its one year anniversary.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question