Solved

MySQL "Order By" clause is making the query very slow...

Posted on 2009-07-03
613 Views
Last Modified: 2012-05-07
Hi Experts,

I have a very simple query that executes in 0.0043 seconds WITHOUT the ORDER BY clause.

When I introduce the ORDER BY clause (as seen below), the query suddenly takes over 5 seconds.

I can't index the ORDER BY column because it is a derived column in a temporary table that MySQL creates on the fly at execution time.

So why is it so slow? And what can I do to speed it up?

With ORDER BY clause: 5.0443 seconds
Without ORDER BY clause: 0.0043 seconds




SELECT
	i.iditems,
	MIN(d.price) AS price,
	MAX(d.price) AS pricefrom,
	COUNT(d.iddeals) AS deals,
	i.name,
	i.summary
FROM
	items AS i
INNER JOIN
	deals AS d
	ON d.iditems = i.iditems
GROUP BY
	d.iditems
ORDER BY
	deals DESC
LIMIT
	10;

Question by:averasolutions
19 Comments
 
LVL 29

Expert Comment

by:fibo
ID: 24776473
Not sure, but I would think the problem does not really come from the ORDER BY itself, but from its combination with LIMIT:
- with LIMIT but no ORDER BY, you probably just get the first 10 records that happen to be there
- with LIMIT and ORDER BY, the server must first sort all N matching records, then take the first 10
This means it handles N records and sorts them in roughly N·log(N) time, while in the other case it handles only 10 records and does no sorting at all.

Check http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html
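fibo's point is visible directly in a query planner. Here is a small sketch using SQLite rather than MySQL (the table here is invented for illustration, but the principle is the same): ordering by an unindexed column forces the planner to build a temporary sort structure, the step MySQL's EXPLAIN reports as "Using filesort".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (iditems INTEGER, price REAL)")

# Without ORDER BY: the planner just scans and stops after LIMIT rows.
plain = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM deals LIMIT 10").fetchall()

# With ORDER BY on an unindexed column: every matching row must be
# collected and sorted before the first 10 can be returned.
ordered = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM deals ORDER BY price LIMIT 10").fetchall()

# The fourth column of each plan row is the human-readable detail.
plain_detail = " ".join(row[3] for row in plain)
ordered_detail = " ".join(row[3] for row in ordered)
```

The ordered plan contains a "USE TEMP B-TREE FOR ORDER BY" step that the plain one lacks; that extra step is where the 5 seconds go.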
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24776512
>I can't index the order by column because it is a dynamic column in a temporary, temporary table that is created on-the-fly by MySQL at the time of execution.

From the query I see, there seems to be nothing dynamic in the tables; I presume it's the query that is dynamic?
In that case, you could still create (and keep) the index on that column.

You will need to check the explain plan, and also look at the MySQL system parameters governing memory usage for key data...
 

Author Comment

by:averasolutions
ID: 24811826
@angelIII,

No - what I mean is that the count is not a column in a specific table that I can simply index; it's created by this line:

[code]
COUNT(d.iddeals) AS deals
[/code]
 
LVL 29

Expert Comment

by:fibo
ID: 24812193
AveraSolutions,

Have you had the opportunity to check my information? Just to rephrase it: as soon as you use ORDER BY, a sort is needed, where none was needed before.

So with a limit of 10 and 1000 elements:
LIMIT 0,10 delivers the first "physical" 10 elements in the table, whatever the values of their attributes.

ORDER BY .. LIMIT 0,10 first needs to sort the 1000 elements (at best some multiple of n·log(n), but potentially n² in the worst case), then delivers the 10 results: the time will be close to that of getting all the results, because the sort is the most time-consuming task.
Note: the algorithms generally used for sorting are very clever... but most of the time they have no easy way to pick out just the "first 10" results. In this case, however, the server may use an algorithm (much slower in other circumstances) that is more effective at delivering only the first (or last) 10 records.
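The "first 10 without a full sort" idea mentioned above does exist: a bounded heap yields the top k of N items in roughly N·log(k) comparisons instead of N·log(N), and later MySQL versions apply the same priority-queue trick to ORDER BY ... LIMIT. A Python sketch of the principle (the data is made up; this is not MySQL internals):

```python
import heapq
import random

random.seed(1)
# (iditems, deals) pairs standing in for the grouped rows:
rows = [(i, random.randrange(100000)) for i in range(100000)]

# Top 10 by deal count using a bounded heap: ~N*log(10) comparisons.
top10 = heapq.nlargest(10, rows, key=lambda r: r[1])

# Same result as sorting everything and slicing, at a fraction of the work.
baseline = sorted(rows, key=lambda r: r[1], reverse=True)[:10]
```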
 

Author Comment

by:averasolutions
ID: 24812331
@fibo

I was actually looking at the link you provided and was going to come back to you later.

So - in short, are you suggesting that it might be a better idea to pull out all million matching rows and use PHP to eliminate the rows that aren't needed?

That would be too memory-intensive, as the items table has over 2 million rows and the deals table has over 4.5 million rows.

Below is the full query that is slow; the one above was a cut-down version that I was stripping down piece by piece to find where the slowness was occurring.

I have also included the EXPLAIN result:
SELECT
	i.iditems AS iditems,
	i.name AS name,
	COUNT(d.iddeals) AS deals,
	MIN(d.price) AS pricefrom,
	IFNULL(r.retailername, '') AS retailername,
	IFNULL(m.manuname, '') AS manuname
FROM
	items i
INNER JOIN
	(deals d CROSS JOIN retailers r)
	ON ( (d.iditems = i.iditems AND d.idcurrencies = 1 AND d.idlanguages = 1) AND (d.idretailers = r.idretailers AND r.active = 1) )
INNER JOIN
	(categories c CROSS JOIN categories nodes)
	ON ( (nodes.lft BETWEEN c.lft AND c.rgt) AND nodes.idcategories = i.idcategories )
LEFT JOIN
	manufacturers m
	ON m.idmanufacturers = i.idmanufacturers
WHERE
	c.idcategories = '1043'
GROUP BY
	i.iditems
ORDER BY
	deals DESC, pricefrom ASC
LIMIT
	100;
 

----------------------
 

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	c	const	PRIMARY,lft	PRIMARY	4	const	1	Using temporary; Using filesort
1	SIMPLE	d	ref	retailer_sku,iditems,idcurrencies,idlanguages,idit-idcur-idlang,idit-idcur-idlang-idret	idlanguages	4	const	49754	Using where
1	SIMPLE	r	eq_ref	PRIMARY,active,idretailers-active	PRIMARY	4	price_db.d.idretailers	1	Using where
1	SIMPLE	i	eq_ref	PRIMARY,idcategories,select,idit-idm,idit-idc-idm	PRIMARY	4	price_db.d.iditems	1	
1	SIMPLE	nodes	eq_ref	PRIMARY,lft	PRIMARY	4	price_db.i.idcategories	1	Using where
1	SIMPLE	m	eq_ref	PRIMARY	PRIMARY	4	price_db.i.idmanufacturers	1	

 
LVL 29

Expert Comment

by:fibo
ID: 24814397
No, there is no way you could efficiently handle those millions of records in PHP.

So it has to be done with SQL...
Hmm... how many records match c.idcategories = '1043'? Is there an index on idcategories?
 
 

Author Comment

by:averasolutions
ID: 24814554
Hi Again,

Within that category (and its child categories) we have 148,530 items, with goodness knows how many deals attached to them.

I have made sure that every time a table is joined, an index exists on the join columns, and that where a table joins to several tables, a composite index covers all the join columns together.

In theory, every column used in the above query has an index.
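The "every join column is covered" claim can be checked query by query in the plan. A minimal SQLite sketch (the table and index names are invented for illustration) showing how the planner confirms a composite index is actually being used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals (iditems INTEGER, idcurrencies INTEGER, price REAL)")

# Composite index over both filter columns, in the spirit of the
# idit-idcur-idlang indexes in the EXPLAIN output above:
conn.execute("CREATE INDEX idit_idcur ON deals (iditems, idcurrencies)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT price FROM deals WHERE iditems = 5 AND idcurrencies = 1"
).fetchall()

# The plan detail names the index it chose, e.g.
# "SEARCH deals USING INDEX idit_idcur (iditems=? AND idcurrencies=?)".
plan_text = " ".join(row[3] for row in plan)
```

If the index name does not appear in the plan, the optimizer rejected it; that, rather than a missing index, is often the real problem.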
 
LVL 29

Accepted Solution

by:
fibo earned 125 total points
ID: 24822509
B-(
So I'm afraid you have reached the limits of what SQL alone can do here.

Some non-SQL solutions you might consider to ease things by exploring other avenues:
- check with your hosting provider what the maximum RAM size you are allowed to use for the SQL server is. Can it be increased? At what price? Are you allowed to run some tests so that you can balance cost against performance?
- depending on your application, you might consider computing these queries in a cron job during night hours, and placing the results in some kind of cache (i.e. precompute your top five queries plus any query that was called the day before)

And if the queries are some type of management report, you might as well run them at night and deliver the results by mail...
All this would lower the load during day hours...

Author Comment

by:averasolutions
ID: 24824646
Hi,

Yeah - I was starting to think that I had pretty much hit the limits of MySQL.

The site is actually an online catalogue, believe it or not. I am probably going to do as you have suggested and make a "de-normalised" table to serve these results.

The problem is that the method is sloppy, and I wanted to stay well away from that option.
 
LVL 29

Expert Comment

by:fibo
ID: 24827514
Well, you have in fact sort of
- left the land of operational systems, which rely heavily on SQL and normalised tables to extract a few records out of millions
- and are nearing the land of decision-support systems, where hypercubes and denormalisation are needed to get acceptable performance when managing (hundreds of) thousands of records out of the same millions

However, if you "are" a catalogue, there are probably other ways to get your answer.

I would suggest you restart from scratch, forget SQL, and assume each catalogue object is one row / one record. Just write sentences in plain English, and work with "normal words". UML people would (more or less) call that "use cases".
Only then try to build your tables.
There are probably some logical shortcuts that would help you.

I am working on a similar problem for a catalogue: I want to display the minimal price in each category. The first shot is to compute this value live, on the fly... fine for a prototype with a few dozen products and no visitors, but it will not scale up well. So I am devising a cron job which will recompute these minima once a day... and then my values will sit in a single table.

Sure, if there is a massive price update and I do nothing, most values will be incorrect... but then I just have to launch the "batch" job.
And the rest of the time, only a few categories, if any, would have a wrong minimum.
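The nightly "recompute the minima" batch job described above can be sketched end to end. SQLite stands in for MySQL here, and all table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (iditems INTEGER, idcategories INTEGER, price REAL);
    INSERT INTO deals VALUES (1, 10, 9.99), (2, 10, 4.50), (3, 20, 15.00);

    -- Denormalised summary table, rebuilt by the nightly batch job:
    CREATE TABLE category_min_price (idcategories INTEGER PRIMARY KEY,
                                     minprice REAL);
""")

def rebuild_minima(conn):
    # The "batch" job: wipe and recompute the per-category minima in one pass.
    conn.executescript("""
        DELETE FROM category_min_price;
        INSERT INTO category_min_price
            SELECT idcategories, MIN(price) FROM deals GROUP BY idcategories;
    """)

rebuild_minima(conn)

# Day-time reads now hit the tiny summary table, never the millions of deals.
rows = conn.execute(
    "SELECT idcategories, minprice FROM category_min_price "
    "ORDER BY idcategories"
).fetchall()
```

After a massive price update, calling `rebuild_minima` again is the "just launch the batch job" step.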
 

Author Comment

by:averasolutions
ID: 24827693
Hi,

Thanks for your assistance in this matter.

The catalogue works fine for our call centre, as they don't mind waiting 2-3 seconds for a result. However, to be honest, we are looking at making it a website, and those kinds of response times are just unrealistic.

I have never had any dealings with Oracle. If I looked at Oracle, would it be a better option for this project?

How difficult is Oracle to learn?
Is it based on standard SQL?

I'm not sure you can answer these questions, but now that we have established that this project is beyond the limits of SQL, maybe a different database solution is a better bet.

Thanks.
 
LVL 29

Expert Comment

by:fibo
ID: 24827939
Be VERY careful with Oracle licence terms on a web-connected machine; specifically, check how the fees scale with traffic (and get written confirmation).

Consider exploring the track of a more powerful server with lots of RAM.

Also run some tests on Postgres.

Re-consider the cache idea.
 
LVL 29

Expert Comment

by:fibo
ID: 24894784
Averasolutions,
Your problem has received answers and suggestions for improvement, given that it is outside the limits of "normal" use. Discovering that your problem has none of the solutions you were hoping for IS AN ANSWER to your problem, even if it is unpleasant.

I believe you cannot cancel / delete the question:
- it is important that people with a similar problem find this discussion, so they will not have to start again from scratch. Maybe some of the avenues suggested would be helpful for them.
- you have received answers and suggestions from AngelIII, who is certainly an authority here as far as MySQL is concerned.
- I think I have also helped explain the source of the problem you were experiencing, as well as suggesting some directions you might explore.

Please reread the whole thread.
Kind regards,
B.
 

Author Comment

by:averasolutions
ID: 24894884
I did not in any way suggest my question was not answered. I am simply saying that this thread has not been very useful to us.

We have found the solution to the problem: it was in the my.cnf memory allocation and in disk access times, which have all been improved, and the query now runs in 0.0226 seconds on average.

I understand that you guys tried your best; however, I was told that what I was trying to achieve was outside the scope of MySQL, when it is not. Therefore I have asked for this to be removed, as I believe it may be a little misleading to anyone who reads it.
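The poster doesn't say which my.cnf settings were changed, but for a query whose plan shows "Using temporary; Using filesort" on MyISAM tables, these are the variables typically tuned. The values below are illustrative, not the poster's actual configuration:

```ini
[mysqld]
# Per-connection buffer used by filesort (the "Using filesort" step);
# illustrative values, to be sized against available RAM and connections.
sort_buffer_size     = 4M
read_rnd_buffer_size = 2M

# Let the implicit temporary table ("Using temporary") stay in memory
# instead of spilling to disk:
tmp_table_size       = 64M
max_heap_table_size  = 64M

# MyISAM index cache; size it to hold the hot indexes.
key_buffer_size      = 256M
```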
 
LVL 29

Expert Comment

by:fibo
ID: 24895234
Sorry, but the solution you found is actually one that was suggested and to which you made no comment:

"- check with your hosting provider which is the maximum ram-size you are allowed to use for sql server. can it be increased? at which price? are you allowed to make some test so that you can balance cost and performance"

was pointing you in the right direction, and you were later reminded

"Consider exploring the track of a more powerful server with lots of RAM."

And before your most recent answer, you did not mention that you had found a solution, nor what it was.

This solution deserves to stay in the knowledge base here.
 

Author Comment

by:averasolutions
ID: 24895961
To be completely honest, I thought I was doing the right thing asking for this to be removed.

I didn't realise I would upset you or offend you by doing so.
 
LVL 29

Expert Comment

by:fibo
ID: 24898081
B-))
That's why moderators are so useful, bringing us calm and peace.
Thanks averasolutions, thanks vee_mod!