Solved

Optimize Complex Query

Posted on 2006-11-05
8
381 Views
Last Modified: 2010-05-18
I'll try to simplify this to only the relevant information and hope that I don't inadvertently omit something important.  (-:

I have...
1.  A table of payroll deductions.  Let's say its fields are (amount DECIMAL, person_id CHAR(9), deduction_date DATE).  This table is fairly big (probably around a million rows and growing).
2.  A table of EXPECTED deductions -- common amounts that we'd expect to see.  Let's say its only field is (amount DECIMAL).  This table is tiny (probably no more than 40 rows ever).

I wish to produce a report that...
1.  Includes only UNUSUAL deductions (i.e. amounts that do not appear in expected_deductions)
2.  Includes only items with a particular date.
3.  Is ordered by the TOTAL of all deductions for a particular person on that date.

For example, I might have...
Person A   $790
Person A     $30
Person B   $800
Person C   $500

Notice that Person A's deductions appear first because the TOTAL of 790 and 30 is greater than Person B's 800.  They're still listed as separate lines, however, so just saying "throw in a GROUP BY" isn't gonna cut it.  (-:


I've tried a bunch of different queries, but the best execution time I've gotten so far (around 20 seconds) was for this:

SELECT * FROM deductions
INNER JOIN (SELECT SUM(amount) total, BUID FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id) tblX
    ON (tblX.person_id = deductions.person_id)
LEFT JOIN expected_deductions ed ON (ed.amount = DOE.amount)
WHERE deduction_date = '2006-11-03' AND ed.amount IS NULL
ORDER BY tblX.total DESC, amount DESC;


Any query that obtains the result I want, exploiting any MySQL 5.0 features (including perhaps a new index that I've not yet tried) will do it.  I'll settle, if necessary, for a solution that does some work in PHP too, but at this point I'm being stubborn and I'm determined to get the DBMS to tell me what I want in under 5 seconds.  (-:
0
Comment
Question by:VoteyDisciple
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 14

Expert Comment

by:racek
ID: 17877030
Do you have INDEX on (deduction_date, person_id) ? - I mean ONE index over two columns
0
 
LVL 14

Accepted Solution

by:
racek earned 500 total points
ID: 17877081
you query cannot works ... what is DOE?
I suppose that something like that with index on (deduction_date, person_id) can works

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT * FROM deductions d  
      JOIN         sums AS tblX                ON (tblX.person_id = d.person_id)
      LEFT JOIN expected_deductions ed ON (ed.amount = tblX.total)
WHERE d.deduction_date = '2006-11-03'
    AND ed.amount IS NULL
ORDER BY tblX.total DESC, d.amount DESC;
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877235
Whoops, DOE is what I'm here calling 'deductions'.  For that matter, BUID is what I'm here calling person_id.

I do have a wide variety of indices on this table including both (person_id, deduction_date) and one that ought to be perfect for that derived query: (person_id, deduction_date, amount).

According to EXPLAIN, MySQL chooses the index on (deduction_date) by itself for this query.  With a USE INDEX hint, however, performance only improves by a fraction of a second.

(What's baffling to me is why EXPLAIN reports that it'll have to scan about 40,000 rows with just the deduction_date, yet will have to scan nearly 300,000 with the larger index.  All the information it needs should be contained in that index; perhaps it is not as useful as I would have guessed.)
0
Space-Age Communications Transitions to DevOps

ViaSat, a global provider of satellite and wireless communications, securely connects businesses, governments, and organizations to the Internet. Learn how ViaSat’s Network Solutions Engineer, drove the transition from a traditional network support to a DevOps-centric model.

 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877258
I've also tried moving the derived query to a temporary table; I assume your thinking was to get the ability to index over the temporary table?  This does gain some performance, but still not a whole lot.

In total, I've still brought the performance of the entire query down to about 13 seconds from 20.
0
 
LVL 14

Expert Comment

by:racek
ID: 17877264

LEFT JOIN expected_deductions ed ON (ed.amount = DOE.amount)
WHERE deduction_date = '2006-11-03' AND ed.amount IS NULL
ORDER BY tblX.total DESC, amount DESC;

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT * FROM deductions AS d  
                 JOIN  sums         AS tblX   ON  tblX.person_id = d.person_id
              WHERE  d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 14

Expert Comment

by:racek
ID: 17877266
sorry

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT * FROM deductions AS d  
                 JOIN  sums         AS tblX   ON  tblX.person_id = d.person_id
              WHERE  d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 14

Expert Comment

by:racek
ID: 17877285
if optimizer does not uses INDEX on (deduction_date, person_id), try:

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT *
    FROM   deductions AS d,
                sums         AS tblX  
    WHERE  tblX.person_id    = d.person_id
         AND d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877795
The best performance I've been able to get is with this combination:

CREATE TEMPORARY TABLE sums (person_id INT UNSIGNED PRIMARY KEY, total INT) AS (SELECT person_id, SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id);

SELECT *
FROM deductions
INNER JOIN sums ON (deductions.person_id = sums.person_id)
LEFT JOIN expected_deductions ed ON (deductions.amount = ed.amount)
WHERE check_date = '2006-11--03'  AND ed.amount IS NULL
ORDER BY sums.total DESC, deductions.amount DESC

Total time to run this combination (now exploiting the best available index at each step) is around 8.5 seconds.


Adding that derived table hurt much more than it helped in this case, but switching to a temporary table for the list of sums did gain some improvement.  


I think I'm gonna call this adequate.  Twenty seconds (counting PHP processing time) is still a long time to wait for a webpage to load, but I think the "real solution" will be for the client to come up with criteria for this report that narrow down the results a bit more.  (-:
0

Featured Post

How Do You Stack Up Against Your Peers?

With today’s modern enterprise so dependent on digital infrastructures, the impact of major incidents has increased dramatically. Grab the report now to gain insight into how your organization ranks against your peers and learn best-in-class strategies to resolve incidents.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Foreword This is an old article.  Instead of using the MySQL extension that was used in the original code examples, please choose one of the currently supported database extensions instead.  More information is available here: MySQLi / PDO (http://…
Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question