MySQL "Order By" clause is making the query very slow...

Posted on 2009-07-03
Last Modified: 2012-05-07
Hi Experts,

I have a very simple query and it executes in 0.0043 seconds WITHOUT the "ORDER BY" clause.

When I introduce the ORDER BY clause (as seen below), the query suddenly takes over 5 seconds.

I can't index the ORDER BY column because it is a dynamic column in a temporary table that MySQL creates on the fly at execution time.

So why is it so slow? And what can I do to speed it up?

With ORDER BY clause: 5.0443 seconds
Without ORDER BY clause: 0.0043 seconds

SELECT
	MIN(d.price) AS price,
	MAX(d.price) AS pricefrom,
	COUNT(d.iddeals) AS deals
FROM
	items AS i
INNER JOIN
	deals AS d
	ON d.iditems = i.iditems
ORDER BY
	deals DESC


Question by:averasolutions
LVL 29

Expert Comment

ID: 24776473
Not sure, but I would think the problem does not really come from the ORDER BY itself, but from its combination with LIMIT:
- with LIMIT but no ORDER BY, you probably get the first 10 records that happen to be there
- with LIMIT and ORDER BY, the server first sorts all N matching records, then takes the 10
This implies that it handles N records and sorts them in roughly N·log(N) time, while in the other case it handles only 10 records and does no sorting.
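The difference can be seen even on a toy table. Below is a minimal sketch using SQLite (bundled with Python; the optimizer details differ from MySQL, but the LIMIT-versus-ORDER-BY-plus-LIMIT behaviour is the same idea), with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE deals (iddeals INTEGER PRIMARY KEY, price REAL)")
con.executemany("INSERT INTO deals (price) VALUES (?)",
                [(p,) for p in (5.0, 1.0, 9.0, 3.0, 7.0)])

# Without ORDER BY: the engine can stop after the first 2 rows it encounters.
first_two = con.execute("SELECT price FROM deals LIMIT 2").fetchall()

# With ORDER BY: every matching row must be examined (and sorted)
# before the LIMIT can be applied.
cheapest_two = con.execute(
    "SELECT price FROM deals ORDER BY price LIMIT 2").fetchall()

print(first_two)     # storage order, whatever happens to come first
print(cheapest_two)  # the two genuinely cheapest prices
```

On a 5-row table both are instant; on millions of rows the second form pays for a full sort unless an index already provides the order.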

LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24776512
>I can't index the order by column because it is a dynamic column in a temporary, temporary table that is created on-the-fly by MySQL at the time of execution.

From the query I see, there seems to be nothing dynamic in the tables; I presume it's the query that is dynamic?
In that case, you could still create (and keep) the index on that column.

You will need to check the explain plan, and also check the MySQL system parameters for key-buffer memory usage...

Author Comment

ID: 24811826

No - what I mean is that the "count" is not a column in a specific table that I can just index; it's created by this line:

COUNT(d.iddeals) AS deals

LVL 29

Expert Comment

ID: 24812193

Have you had the opportunity to check my information? Just to rephrase it: as soon as you use ORDER BY, a sort is needed where none was before.

So with a limit of 10 and 1000 matching rows:
LIMIT 0,10 delivers the first "physical" 10 rows in the table, whatever their attribute values are.

ORDER BY .. LIMIT 0,10 first needs to sort all 1000 rows (at best some multiple of n·log(n), but it can degrade to n²), then delivers the 10 results; the time will be close to that of fetching all the results, because the sort is the more time-consuming task.
Note: the sorting algorithms generally used are very clever... but most of the time they have no easy way to find just the "first 10" results. In this case, however, the server may be using an algorithm that is slower in other circumstances but more effective at delivering only the first (or last) 10 records.
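That "find the first 10 without sorting everything" idea is a standard top-k selection. A small Python sketch with invented data, comparing a full sort against a heap-based selection (O(n log k) instead of O(n log n)):

```python
import heapq
import random

random.seed(42)
prices = [random.randint(1, 10_000) for _ in range(100_000)]

# Full sort, O(n log n): what a naive ORDER BY ... LIMIT 10 amounts to.
top_by_sort = sorted(prices)[:10]

# Heap selection, O(n log k): one pass, keeping only the 10 best seen so far.
top_by_heap = heapq.nsmallest(10, prices)

print(top_by_sort == top_by_heap)  # True
```

MySQL's filesort has a similar priority-queue optimization for small LIMITs in later versions; whether it kicks in depends on the query and server version.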

Author Comment

ID: 24812331

I was actually looking at the link you provided and was going to come back to you later.

So, in short, are you suggesting that it may be a better idea to pull out all million matching rows and use PHP to eliminate the rows that are not needed?

That would be too memory-intensive, as the items table has over 2 million rows and the deals table has over 4.5 million rows.

Below is the full query that is slow; the one above was the cut-down version I was stripping down piece by piece to find where the slowness was occurring:

I have also included the explain result:
SELECT
	i.iditems AS iditems, AS name,
	COUNT(d.iddeals) AS deals,
	MIN(d.price) AS pricefrom,
	IFNULL(r.retailername, '') AS retailername,
	IFNULL(m.manuname, '') AS manuname
FROM
	items i
INNER JOIN
	(deals d CROSS JOIN retailers r)
	ON ( (d.iditems = i.iditems AND d.idcurrencies = 1 AND d.idlanguages = 1) AND (d.idretailers = r.idretailers AND = 1) )
INNER JOIN
	(categories c CROSS JOIN categories nodes)
	ON ( (nodes.lft BETWEEN c.lft AND c.rgt) AND nodes.idcategories = i.idcategories )
LEFT JOIN
	manufacturers m
	ON m.idmanufacturers = i.idmanufacturers
WHERE
	c.idcategories = '1043'
ORDER BY
	deals DESC, pricefrom ASC
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	c	const	PRIMARY,lft	PRIMARY	4	const	1	Using temporary; Using filesort
1	SIMPLE	d	ref	retailer_sku,iditems,idcurrencies,idlanguages,idit-idcur-idlang,idit-idcur-idlang-idret	idlanguages	4	const	49754	Using where
1	SIMPLE	r	eq_ref	PRIMARY,active,idretailers-active	PRIMARY	4	price_db.d.idretailers	1	Using where
1	SIMPLE	i	eq_ref	PRIMARY,idcategories,select,idit-idm,idit-idc-idm	PRIMARY	4	price_db.d.iditems	1	
1	SIMPLE	nodes	eq_ref	PRIMARY,lft	PRIMARY	4	price_db.i.idcategories	1	Using where
1	SIMPLE	m	eq_ref	PRIMARY	PRIMARY	4	price_db.i.idmanufacturers	1	


LVL 29

Expert Comment

ID: 24814397
No, there is no way you could efficiently handle millions of records in PHP.

So it has to be done with SQL...
Hmm... how many records have c.idcategories = '1043'? Is there an index on idcategories?
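One quick way to answer the "is there an index, and is it used?" question is to compare query plans before and after creating the index. A minimal sketch with SQLite (bundled with Python; in MySQL the equivalent check is EXPLAIN), with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (iditems INTEGER PRIMARY KEY, idcategories INTEGER)")
con.executemany("INSERT INTO items (idcategories) VALUES (?)",
                [(i % 50,) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in column 3.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT COUNT(*) FROM items WHERE idcategories = 42"
before = plan(q)  # full table scan: no usable index yet
con.execute("CREATE INDEX idx_cat ON items (idcategories)")
after = plan(q)   # now the plan mentions idx_cat

print(before)
print(after)
```

In MySQL you would look at the `key` column of EXPLAIN output instead, exactly as in the plan posted above.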

Author Comment

ID: 24814554
Hi Again,

Within that category (and its child categories) we have 148,530 items, with goodness knows how many deals attached to them.

I have made sure that every time a table is joined, an index exists on the join columns, and that if a table is joined to many tables, a composite index covers all the join columns cumulatively.

In theory, every column used in the above query has an index.
LVL 29

Accepted Solution

fibo earned 125 total points
ID: 24822509
So I'm afraid you have reached the SQL limits.

Some non-SQL solutions you might consider to ease things while exploring other avenues:
- check with your hosting provider what the maximum RAM you are allowed to use for the SQL server is. Can it be increased? At what price? Are you allowed to run some tests so that you can balance cost and performance?
- depending on your application, you might consider computing these queries in a cron job during night hours, and placing the results in some kind of cache (i.e. precompute your top five queries plus any query that was called the day before)

And if the queries are some type of management report, you might as well run them at night and deliver the results by mail...
All this would lower the load during day hours...

Author Comment

ID: 24824646

Yeah - I was starting to think that I had pretty much hit the limits of MySQL.

The site is actually an online catalogue, believe it or not. I am probably going to do as you have suggested and make a "de-normalised" table to query these results.

The problem is that that method is sloppy, and I wanted to stay well away from that option.
LVL 29

Expert Comment

ID: 24827514
Well, you have in fact sort of
- left the land of operational systems, which rely heavily on SQL and normalized tables to extract a few records out of millions
- and are nearing the land of decision-support systems, where hypercubes and denormalization are needed to get decent performance when managing (hundreds of) thousands of records out of the same millions

However, if you "are" a catalogue, there are probably other ways to get your answer.

I would suggest you restart from scratch, forget SQL, and assume each catalogue object is one row/one record. Just write sentences in plain English, and work with "normal words". UML people would (more or less) call that "use cases".
Only then try to build your tables.
There are probably some logical shortcuts that would help you.

I am working on a similar problem for a catalogue: I want to display the minimal price in each category. The first shot is to compute this value live, on the fly... fine for the prototype with a few dozen products and no visitors, but it will not scale up well. So I am devising a cron job which will recompute these minima once a day... and then my values will be in a single table.

Sure, if there is a massive price update and I do nothing, most values will be incorrect... but then I just have to launch the "batch" job.
And the rest of the time, only a few categories, if any, would have a wrong minimum.
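A nightly recompute-the-minima job of that kind can be sketched in a few lines. This is an illustration with SQLite and invented table/column names, not the poster's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE deals (iddeals INTEGER PRIMARY KEY, idcategories INTEGER, price REAL);
CREATE TABLE category_minprice (idcategories INTEGER PRIMARY KEY, minprice REAL);
""")
con.executemany("INSERT INTO deals (idcategories, price) VALUES (?, ?)",
                [(1, 9.99), (1, 4.50), (2, 20.00), (2, 15.75)])

def refresh_minima():
    # The "batch" job run from cron: rebuild the summary table in one statement.
    con.execute("DELETE FROM category_minprice")
    con.execute("""INSERT INTO category_minprice
                   SELECT idcategories, MIN(price) FROM deals
                   GROUP BY idcategories""")
    con.commit()

refresh_minima()
minima = con.execute(
    "SELECT * FROM category_minprice ORDER BY idcategories").fetchall()
print(minima)
```

Page views then read the tiny summary table instead of aggregating millions of rows on every request; the trade-off is exactly the staleness described above.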

Author Comment

ID: 24827693

Thanks for your assistance in this matter.

The catalogue works fine for our call centre, as they don't mind waiting 2-3 seconds for a result. However, to be honest, we are looking at making it a website, and those kinds of time scales are just unrealistic.

I have never had any dealings with Oracle. If I looked at Oracle, would it be a better option for this project?

How difficult is Oracle to learn?
Is it based on standard SQL?

I'm not sure you will be able to answer these questions, but I think that now we have established that this project is beyond the limits of SQL, maybe a different database solution is a better bet.

LVL 29

Expert Comment

ID: 24827939
Be VERY careful with Oracle licence terms on a web-connected machine; specifically check how they scale up with traffic (and get a written confirmation).

Consider exploring the track of a more powerful server with lots of RAM.

Also run some tests on Postgres.

Re-consider the cache idea.
LVL 29

Expert Comment

ID: 24894784
Your problem has received answers and suggestions for improvement, given that it is outside the limits of "normal" use. Discovering that your problem has none of the solutions you were hoping for IS AN ANSWER to your problem, even if it is unpleasant.

I believe you cannot cancel/delete the question:
- it is important that people with a similar problem find this discussion, so that they will not have to start again from scratch. Maybe some of the tracks suggested here would be helpful to them.
- you have received answers and suggestions from AngelIII, who is certainly an authority here as far as MySQL is concerned.
- I think I have also contributed by explaining the source of the problem you were experiencing, as well as suggesting some directions you might explore.

Please reread the whole thread.
Kind regards,

Author Comment

ID: 24894884
I did not in any way suggest my question was not answered. I am simply saying that this thread has not been very useful to us.

We have found the solution to the problem: it was in the my.cnf memory allocation and in disk access times, which have all been improved, and the query now runs in 0.0226 seconds on average.

I understand that you guys tried your best; however, I was told that what I was trying to achieve was outside the scope of MySQL when it is not. Therefore I have asked for this to be removed, as I believe it may be a little misleading to anyone who reads it.
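For reference, the my.cnf settings usually involved in ORDER BY/filesort and temporary-table performance are the ones below. The variable names are real MySQL settings; the values are purely illustrative and must be sized against the server's actual RAM and connection count (the per-connection buffers multiply):

```ini
[mysqld]
# Per-connection buffer used for filesort (ORDER BY / GROUP BY)
sort_buffer_size        = 4M
# Per-connection buffer for reading rows back in sorted order after a filesort
read_rnd_buffer_size    = 2M
# In-memory temporary tables spill to disk above the smaller of these two
tmp_table_size          = 64M
max_heap_table_size     = 64M
# MyISAM index cache (relevant for MySQL of this era); size to fit the hot indexes
key_buffer_size         = 256M
```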
LVL 29

Expert Comment

ID: 24895234
Sorry, but the solution you found is actually one that was suggested, and to which you made no comment:

"- check with your hosting provider which is the maximum ram-size you are allowed to use for sql server. can it be increased? at which price? are you allowed to make some test so that you can balance cost and performance"

was pointing you in the right direction, and you were later reminded

"Consider exploring the track of a more powerful server with lots of RAM."

And before your most recent comment, you did not mention that you had found a solution, nor what it was.

This solution deserves to stay in the knowledge base here.

Author Comment

ID: 24895961
To be completely honest, I thought I was doing the right thing asking for this to be removed.

I didn't realise I would upset you or offend you by doing so.
LVL 29

Expert Comment

ID: 24898081
That's why moderators are so useful, bringing us calm and peace.
Thanks averasolutions, thanks vee_mod!


Question has a verified solution.
