Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Optimize Complex Query

Posted on 2006-11-05
8
Medium Priority
?
386 Views
Last Modified: 2010-05-18
I'll try to simplify this to only the relevant information and hope that I don't inadvertently omit something important.  (-:

I have...
1.  A table of payroll deductions.  Let's say its fields are (amount DECIMAL, person_id CHAR(9), deduction_date DATE).  This table is fairly big (probably around a million rows and growing).
2.  A table of EXPECTED deductions -- common amounts that we'd expect to see.  Let's say its only field is (amount DECIMAL).  This table is tiny (probably no more than 40 rows ever).

I wish to produce a report that...
1.  Includes only UNUSUAL deductions (i.e. amounts that do not appear in expected_deductions)
2.  Includes only items with a particular date.
3.  Is ordered by the TOTAL of all deductions for a particular person on that date.

For example, I might have...
Person A   $790
Person A     $30
Person B   $800
Person C   $500

Notice that Person A's deductions appear first because the TOTAL of 790 and 30 is greater than Person B's 800.  They're still listed as separate lines, however, so just saying "throw in a GROUP BY" isn't gonna cut it.  (-:


I've tried a bunch of different queries, but the best execution time I've gotten so far (around 20 seconds) was for this:

SELECT * FROM deductions
INNER JOIN (SELECT SUM(amount) total, BUID FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id) tblX
    ON (tblX.person_id = deductions.person_id)
LEFT JOIN expected_deductions ed ON (ed.amount = DOE.amount)
WHERE deduction_date = '2006-11-03' AND ed.amount IS NULL
ORDER BY tblX.total DESC, amount DESC;


Any query that obtains the result I want, exploiting any MySQL 5.0 features (including perhaps a new index that I've not yet tried) will do it.  I'll settle, if necessary, for a solution that does some work in PHP too, but at this point I'm being stubborn and I'm determined to get the DBMS to tell me what I want in under 5 seconds.  (-:
0
Comment
Question by:VoteyDisciple
  • 5
  • 3
8 Comments
 
LVL 14

Expert Comment

by:racek
ID: 17877030
Do you have INDEX on (deduction_date, person_id) ? - I mean ONE index over two columns
0
 
LVL 14

Accepted Solution

by:
racek earned 2000 total points
ID: 17877081
you query cannot works ... what is DOE?
I suppose that something like that with index on (deduction_date, person_id) can works

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT * FROM deductions d  
      JOIN         sums AS tblX                ON (tblX.person_id = d.person_id)
      LEFT JOIN expected_deductions ed ON (ed.amount = tblX.total)
WHERE d.deduction_date = '2006-11-03'
    AND ed.amount IS NULL
ORDER BY tblX.total DESC, d.amount DESC;
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877235
Whoops, DOE is what I'm here calling 'deductions'.  For that matter, BUID is what I'm here calling person_id.

I do have a wide variety of indices on this table including both (person_id, deduction_date) and one that ought to be perfect for that derived query: (person_id, deduction_date, amount).

According to EXPLAIN, MySQL chooses the index on (deduction_date) by itself for this query.  With a USE INDEX hint, however, performance only improves by a fraction of a second.

(What's baffling to me is why EXPLAIN reports that it'll have to scan about 40,000 rows with just the deduction_date, yet will have to scan nearly 300,000 with the larger index.  All the information it needs should be contained in that index; perhaps it is not as useful as I would have guessed.)
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877258
I've also tried moving the derived query to a temporary table; I assume your thinking was to get the ability to index over the temporary table?  This does gain some performance, but still not a whole lot.

In total, I've still brought the performance of the entire query down to about 13 seconds from 20.
0
 
LVL 14

Expert Comment

by:racek
ID: 17877264

LEFT JOIN expected_deductions ed ON (ed.amount = DOE.amount)
WHERE deduction_date = '2006-11-03' AND ed.amount IS NULL
ORDER BY tblX.total DESC, amount DESC;

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT * FROM deductions AS d  
                 JOIN  sums         AS tblX   ON  tblX.person_id = d.person_id
              WHERE  d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 14

Expert Comment

by:racek
ID: 17877266
sorry

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT * FROM deductions AS d  
                 JOIN  sums         AS tblX   ON  tblX.person_id = d.person_id
              WHERE  d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 14

Expert Comment

by:racek
ID: 17877285
if optimizer does not uses INDEX on (deduction_date, person_id), try:

create temporary table sums (person_id int not null primary key, total int not null default 0);
insert into sums select person_id , SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id;
SELECT a.*  FROM
(SELECT *
    FROM   deductions AS d,
                sums         AS tblX  
    WHERE  tblX.person_id    = d.person_id
         AND d.deduction_date = '2006-11-03' ) AS a
LEFT JOIN expected_deductions ed ON ed.amount = a.amount
WHERE ed.amount IS NULL
ORDER BY a.total DESC, a.amount DESC;
0
 
LVL 19

Author Comment

by:VoteyDisciple
ID: 17877795
The best performance I've been able to get is with this combination:

CREATE TEMPORARY TABLE sums (person_id INT UNSIGNED PRIMARY KEY, total INT) AS (SELECT person_id, SUM(amount) total FROM deductions WHERE payroll_date='2006-11-03' GROUP BY person_id);

SELECT *
FROM deductions
INNER JOIN sums ON (deductions.person_id = sums.person_id)
LEFT JOIN expected_deductions ed ON (deductions.amount = ed.amount)
WHERE check_date = '2006-11--03'  AND ed.amount IS NULL
ORDER BY sums.total DESC, deductions.amount DESC

Total time to run this combination (now exploiting the best available index at each step) is around 8.5 seconds.


Adding that derived table hurt much more than it helped in this case, but switching to a temporary table for the list of sums did gain some improvement.  


I think I'm gonna call this adequate.  Twenty seconds (counting PHP processing time) is still a long time to wait for a webpage to load, but I think the "real solution" will be for the client to come up with criteria for this report that narrow down the results a bit more.  (-:
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Does the idea of dealing with bits scare or confuse you? Does it seem like a waste of time in an age where we all have terabytes of storage? If so, you're missing out on one of the core tools in every professional programmer's toolbox. Learn how to …
This article shows the steps required to install WordPress on Azure. Web Apps, Mobile Apps, API Apps, or Functions, in Azure all these run in an App Service plan. WordPress is no exception and requires an App Service Plan and Database to install
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

886 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question