Solved

Combining two distinct counts on single large table into one query.

Posted on 2008-10-12
12
532 Views
Last Modified: 2012-05-05
On a large-ish table in testing of around 4 Million rows.
I have two similar counts
Firstly:
SELECT SUPPLIER, PRODUCT, COUNT(*)
FROM RECORDS WHERE INVDATE=20080810 AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT;

Secondly:
SELECT SUPPLIER, PRODUCT, COUNT(*)
FROM RECORDS WHERE AMENDED='Y' AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT;

Both counts are contained within indexes (but different ones) on the table.

I've used:
SELECT SUPPLIER, PRODUCT
,sum(case when INVDATE=20080810 THEN 1 else 0 end) num_inv
,sum(case when AMENDED='Y'  THEN 1 else 0 end) num_amended
FROM RECORDS WHERE AMENDED='Y' AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT;

This does give me what I want (albiet with lots of empty counts), but it takes several seconds, as from EXPLAIN it would appear no possible keys exist.  Using a union joined the results but as a single total column not as two individual ones.

Is there a more effecient to do a single query for this, and with the ability to extend the nubmer of different counts.

I have considered using a summary table, however with 100,000 INSERTS / UPDATES per day on the table I'm not sure that would be very effecient. (Still very new to MySQL).

Incidently is there much difference between COUNT(*) and SUM(1)?

Thanks in advance.




0
Comment
Question by:Azaniah
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
12 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 22697010
what about this:
SELECT SUPPLIER, PRODUCT
,sum(case when INVDATE=20080810 THEN 1 else 0 end) num_inv
,sum(case when AMENDED='Y'  THEN 1 else 0 end) num_amended
FROM RECORDS WHERE TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT;

Open in new window

0
 

Author Comment

by:Azaniah
ID: 22697037
Angelll unfortunately not, as the date search (num_inv) is the number of invoices received 'as of' or on the date. The amended status is an overall view across all dates.

Essentially the client would need to see a list of supplier+products, invoice received to date, and those that have been amended at any point as a seperate count.

0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 22697054
>The amended status is an overall view across all dates.
I have to disagree!

anyhow, you might add this:
SELECT SUPPLIER, PRODUCT
,sum(case when INVDATE=20080810 THEN 1 else 0 end) num_inv
,sum(case when AMENDED='Y'  THEN 1 else 0 end) num_amended
FROM RECORDS WHERE TYPE='I' AND CLIENT='BOB'
AND ( INVDATE=20080810 OR  AMENDED='Y' )
GROUP BY SUPPLIER, PRODUCT;

Open in new window

0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 

Author Comment

by:Azaniah
ID: 22697081
It's a bit faster 10 seconds vs 15, however the two seperate counts are 0.01secs so clearly no-where as effecient. Explain is at least listing possible keys, if not using any.

The number of possible rows returned for either set will be quite small, 40 max per count, so it might be more sensible to do the two counts and join the records later.

I might be mis-understanding something, but when you say you don't agree, if an client logs and and the two most important counts they need are new invoices (i.e. todays) and the total number of invoices that have been amended (so they can be addressed) I am not sure why you don't agree.

0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 22698400
I don't agree, because the case expression for the SUM() will "filter/count" as you requested, not having the condition in the where clause does not matter for that. in other terms, the query I suggested will return the same data/counts as your initial query.

now, the matter is performance.
what indexes do you have, actually, on the table?

if you don't have any index with CLIENT and TYPE fields being listed first, try to reorder the index columns in the index so that CLIENT is first, and TYPE second (and INVDATE eventually third)...

otherwise, you might use this syntax (assuming MySQL 5.x or higher)


SELECT COALESCE(l1.SUPPLIER, l2.SUPPLIER) SUPPLIER
, COALESCE(l1.PRODUCT, l2.PRODUCT) PRODUCT
, count_invdate
, count_amended
FROM (
SELECT SUPPLIER, PRODUCT, COUNT(*) count_invdate
FROM RECORDS WHERE INVDATE=20080810 AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT
) l1
FULL OUTER JOIN (
SELECT SUPPLIER, PRODUCT, COUNT(*) amended
FROM RECORDS WHERE AMENDED='Y' AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT
) l2
 ON l1.SUPPLIER = l2.SUPPLIER
AND l1.PRODUCT = l2.PRODUCT

Open in new window

0
 

Author Comment

by:Azaniah
ID: 22701189
Angellll,

Looks like the version I am testing with will not allow this JOIN (Community Server 5.0.51b), however I can see what the logic is so I will look at updating the test version of MySQL.

Thanks again for you help.
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 22701286
well, the FULL OUTER JOIN is maybe not even needed in your case, if all product/supplier are present in both subqueries?---
0
 

Author Comment

by:Azaniah
ID: 22701352
They are not. I did try looking at doing a LEFT and RIGHT join and doing a UNION - but my feeble brain refused to cope today.

0
 

Author Comment

by:Azaniah
ID: 22701484
The following seems to work, but I am not sure if it has any impacts that I might be missing

SELECT SUPPLIER, PRODUCT, COUNT(*) count_invdate, SUM(0) count_amended
FROM INVOICES WHERE INVDATE=20080420 AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT
UNION
SELECT SUPPLIER, PRODUCT, SUM(0) count_invdate, COUNT(*) count_amended
FROM INVOICES WHERE AMENDED='Y' AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT

Open in new window

0
 

Author Comment

by:Azaniah
ID: 22701697
Annoyingly that doesn't work correctly. Returns all the correct data but not joined as needed.

0
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 250 total points
ID: 22701839
ok:
SELECT SUPPLIER, PRODUCT, SUM(count_invdate), SUM(count_amended)
FROM (
SELECT SUPPLIER, PRODUCT, COUNT(*) count_invdate, SUM(0) count_amended
FROM INVOICES WHERE INVDATE=20080420 AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT
UNION ALL
SELECT SUPPLIER, PRODUCT, SUM(0) count_invdate, COUNT(*) count_amended
FROM INVOICES WHERE AMENDED='Y' AND TYPE='I' AND CLIENT='BOB'
GROUP BY SUPPLIER, PRODUCT
) l
GROUP BY SUPPLIER, PRODUCT

Open in new window

0
 

Author Comment

by:Azaniah
ID: 22702303
That works correctly and is much faster,certainly appears to have resolved my issue.
Thanks for your help again Angelll
0

Featured Post

Resolve Critical IT Incidents Fast

If your data, services or processes become compromised, your organization can suffer damage in just minutes and how fast you communicate during a major IT incident is everything. Learn how to immediately identify incidents & best practices to resolve them quickly and effectively.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction Since I wrote the original article about Handling Date and Time in PHP and MySQL several years ago, it seemed like now was a good time to update it for object-oriented PHP.  This article does that, replacing as much as possible the pr…
When table data gets too large to manage or queries take too long to execute the solution is often to buy bigger hardware or assign more CPUs and memory resources to the machine to solve the problem. However, the best, cheapest and most effective so…
Nobody understands Phishing better than an anti-spam company. That’s why we are providing Phishing Awareness Training to our customers. According to a report by Verizon, only 3% of targeted users report malicious emails to management. With compan…
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…

738 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question