how can I make this COUNT() query more efficient?

the following query executes fast (0.008 secs)

SELECT shops.shop, products.shop_id, COUNT( products.product_id ) AS total_products
FROM products, users, shops
WHERE user_id =1
AND (
products.date >= users.prior_login
)
AND shops.shop_id = products.shop_id
GROUP BY products.shop_id

Open in new window


here is the EXPLAIN plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	users	const	PRIMARY,prior_login,user_id	PRIMARY	4	const	1	Using temporary; Using filesort
1	SIMPLE	shops	ALL	PRIMARY	NULL	NULL	NULL	90	 
1	SIMPLE	products	ref	shop_id,date,shop_id_2,shop_id_3	shop_id	4	bitt3n_minxa.shops.shop_id	5338	Using where

Open in new window


however, execution time increases to over 5 secs when I add an extra condition that retrieves the relevant sex for the product from the sex table (also 475K rows)

SELECT shops.shop, products.shop_id AS shop_id, COUNT( products.product_id ) AS total_products, sex.sex AS sex
FROM products, users, shops
INNER JOIN sex ON
	sex.product_id = products.product_id AND
	sex.sex=0
WHERE user_id =1
AND (
products.date >= users.prior_login
)
AND shops.shop_id = products.shop_id
GROUP BY products.shop_id ORDER BY shop ASC

Open in new window


here is the EXPLAIN plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	users	const	PRIMARY,prior_login,user_id	PRIMARY	4	const	1	Using temporary; Using filesort
1	SIMPLE	sex	index	product_id_2,product_id	product_id_2	5	NULL	363101	Using where; Using index
1	SIMPLE	products	ref	PRIMARY,shop_id,date,shop_id_2,product_id,shop_id_...	product_id	4	bitt3n_minxa.sex.product_id	1	Using where
1	SIMPLE	shops	eq_ref	PRIMARY	PRIMARY	4	bitt3n_minxa.products.shop_id	1	 

Open in new window


I do have an index on product_id in the sex table. How can I make this query faster?
bitt3nAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

bitt3nAuthor Commented:
I noticed that getting the sex of every product in the products table via

SELECT * 
FROM products, sex
WHERE sex.product_id = products.product_id

Open in new window


takes 0.03 secs. therefore it seems that getting the sex for the small subset of the products in the products table satisfying

WHERE user_id =1
AND (
products.date >= users.prior_login
)

Open in new window


should not be costly operation.
0
bitt3nAuthor Commented:
I found a query that gets the results I want in 0.014 secs

SELECT sex, shops.shop, p1.shop_id, COUNT( p1.product_id ) AS total_products
FROM sex, shops INNER JOIN
(SELECT product_id, shop_id FROM
users, products WHERE
user_id =1
AND (
products.date >= users.prior_login
)) AS p1 ON p1.product_id=sex.product_id
WHERE 
shops.shop_id = p1.shop_id
GROUP BY p1.shop_id

Open in new window


here is the EXPLAIN plan

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	944	Using temporary; Using filesort
1	PRIMARY	shops	eq_ref	PRIMARY	PRIMARY	4	p1.shop_id	1	 
1	PRIMARY	sex	ref	product_id_2,product_id	product_id	4	p1.product_id	1	 
2	DERIVED	users	const	PRIMARY,prior_login,user_id	PRIMARY	4	 	1	 
2	DERIVED	products	range	date	date	8	NULL	2144	Using where

Open in new window


I am wondering why this runs so much faster, and whether there is a simpler format that will be as fast but without the subquery.
0
arnoldCommented:
What data type is products.date is it a datetime/timestamp? integer (unix timestamp) seconds elapsed since epoch (1/1/1970 GMT)?
What colums exist in each table?  Could you post the create table for each one involved?
0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

bitt3nAuthor Commented:
sure here are the SHOW CREATE TABLE for each relevant table. products.date is DATETIME

CREATE TABLE `products` (\n  `product_id` int(10) NOT NULL auto_increment,\n  `shop_id` int(10) NOT NULL default '0',\n  `title` varchar(120) NOT NULL default '',\n  `brand` varchar(80) NOT NULL default '',\n  `price` float NOT NULL default '0',\n  `image` varchar(255) NOT NULL default '',\n  `image_height` smallint(6) default NULL,\n  `image_width` smallint(6) default NULL,\n  `link` varchar(255) NOT NULL default '',\n  `date` datetime NOT NULL default '0000-00-00 00:00:00',\n  PRIMARY KEY  (`product_id`),\n  UNIQUE KEY `shop_id` (`shop_id`,`link`),\n  UNIQUE KEY `title_2` (`title`,`image`),\n  KEY `brand` (`brand`),\n  KEY `title` (`title`),\n  KEY `date` (`date`),\n  KEY `shop_id_2` (`shop_id`),\n  KEY `product_id` (`product_id`,`date`),\n  KEY `shop_id_3` (`shop_id`,`date`)\n) ENGINE=MyISAM AUTO_INCREMENT=553314 DEFAULT CHARSET=latin1

Open in new window


CREATE TABLE `shops` (\n  `shop_id` int(10) NOT NULL auto_increment,\n  `shop` varchar(80) NOT NULL default '',\n  `last_look` datetime NOT NULL default '0000-00-00 00:00:00',\n  PRIMARY KEY  (`shop_id`)\n) ENGINE=MyISAM AUTO_INCREMENT=142 DEFAULT CHARSET=latin1

Open in new window


CREATE TABLE `users` (\n  `user_id` int(12) NOT NULL auto_increment,\n  `username` varchar(16) NOT NULL default '',\n  `password` varchar(40) NOT NULL default '',\n  `registration` datetime NOT NULL default '0000-00-00 00:00:00',\n  `last_login` datetime NOT NULL default '0000-00-00 00:00:00',\n  `prior_login` datetime NOT NULL default '0000-00-00 00:00:00',\n  `active_flair_id` int(11) NOT NULL default '1',\n  PRIMARY KEY  (`user_id`),\n  UNIQUE KEY `username` (`username`),\n  KEY `prior_login` (`prior_login`),\n  KEY `user_id` (`user_id`,`prior_login`)\n) ENGINE=MyISAM AUTO_INCREMENT=35 DEFAULT CHARSET=latin1

Open in new window


CREATE TABLE `sex` (\n  `product_id` int(10) NOT NULL default '0',\n  `sex` tinyint(1) NOT NULL default '0' COMMENT '0=f 1=m 2=unisex',\n  `date` datetime NOT NULL default '0000-00-00 00:00:00',\n  `hit_counter` int(11) NOT NULL default '0',\n  UNIQUE KEY `product_id_2` (`product_id`,`sex`),\n  KEY `product_id` (`product_id`)\n) ENGINE=MyISAM DEFAULT CHARSET=latin1

Open in new window

0
skullnobrainsCommented:
SELECT shops.shop, products.shop_id AS shop_id, COUNT( products.product_id ) AS total_products, sex.sex AS sex
FROM products
INNER JOIN shops
ON shops.shop_id = products.shop_id
INNER JOIN sex
ON sex.product_id = products.product_id AND
WHERE
products.date >= ( select users.prior_login from users where user_id=1 )
AND       sex.sex=0
GROUP BY products.shop_id ORDER BY shop ASC

i guess this will be enough to make mysql do the proper optimisations since it will not be mislead into trying to join the users table with something else

you may consider to index sex,product_id or sex alone from table sex. this wll probably not be worthwile as mysql will rather make a range in the products table and a join and then filter by sex, rather than the reverse. if your date column from the sex table holds the same data as the date in the products table, you may ba able to do much better by changing columns and indexing but the gain will be little.

ps : i don't really understand what is the use of the sex table compared to having 3 counters (one per sex since you only have 3 of them) in the products table, but then my understanding of your database schema is purely conjectural.

pps : as a general rule i'd like to give you 2 advices :
- write your quelries with ON clauses that always and only correspond to the join clause : this prevents lots of human mixups even though mysql does not care wether you put stuff in the ON or where clause. it also prevents from trying to join tables which are not supposed to be joined and accidentally fool mysql in the process. when you often want to do this and you do not have a simple workaround with a simple subquery, you need to rethink the database schema.
- never write a subquery that possibly produce a lot of results (i'd say the good number is 1, and an acceptable number is anything you can count) unless you really know what you are doing : this will prevent queries that work now from exploding in the future, or possibly after a migration

happy coding
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
skullnobrainsCommented:
oups i left a dumb AND at the end of the line before the WHERE
i assume you can coorect by yourself, but since i noticed...
0
bitt3nAuthor Commented:
your query runs in 0.086 secs (consistently ~0.01 sec slower than my old query running on the database as it stands now, not a material difference but interesting) the explain plan is

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	products	range	PRIMARY,shop_id,date,shop_id_2,product_id,shop_id_...	date	8	NULL	8306	Using where; Using temporary; Using filesort
1	PRIMARY	shops	eq_ref	PRIMARY	PRIMARY	4	bitt3n_minxa.products.shop_id	1	 
1	PRIMARY	sex	ref	product_id_2,product_id	product_id	4	bitt3n_minxa.products.product_id	1	Using where
2	SUBQUERY	users	const	PRIMARY,user_id	PRIMARY	4	 	1	 

Open in new window


My understanding is that you forced the order of the JOINS by putting the condition to be executed first in the WHERE clause. Is this correct?

I'm curious whether there is a reason to put 'sex.sex=0' in the WHERE clause instead of the sex JOIN (which executes just as fast).

That is a good question regarding why I even have a sex table at all, and I'm not sure I made the correct choice. My database contains products that are either male-specific, female-specific or unisex. My options are to have two extra columns in the products table for male and female (with unisex being both of these TRUE), one column with values 0-2 for the male, female, and unisex, or to have a separate sex table.

The sex table has columns for product_id and sex, as well as columns recording the hit count for each sex (ie, number of times someone viewing men's products clicked on the item, and number of times someone viewing women's products clicked on the item. I cannot use just one hit counter since I need both male and female counts for unisex products.) I also record the DATETIME a given product was added to the both the men's and women's product sections respectively (which may not be the same value).

Since almost no products are unisex, I figured a separate table was better than 3-4 extra columns in the products table with so many irrelevant entries (for example, male date and hit counter for female product). Removing the sex table would certainly simplify my queries. Would it be smart to get rid of it?

Thinking about this again now it seems like I could just use two columns for the hit counters, and if one of the sexes is irrelevant, use NULL for that counter, which means I could get rid of the sex table by adding five columns to the products table (these being sex, male_hit_counter, female_hit_counter, date_added_to_male_section, date_added_to_female_section. The sex table itself has 4 columns (product_id,sex,hit_counter,date), with no irrelevant entries (since if a product isn't relevant to a given sex, I don't add the row for that sex).
0
bitt3nAuthor Commented:
I made the topic of the sex table a separate question since it is not really related to the original question here:

http://www.experts-exchange.com/Database/MySQL/Q_27523507.html
0
skullnobrainsCommented:
<quote>
your query runs in 0.086 secs (consistently ~0.01 sec slower than my old query running on the database as it stands now, not a material difference but interesting) the explain plan is
</quote>

i believe you still would be better off with the one i gave

the reason is mainly scalability because, it is cleaner (and easier to understand next year)

on one side, you examin more rows in the products table (but that could also be less with different indexation or changes related to the sex table), but then you only read indexes with ref or eq_ref which is as fast as it can be given the fact that the ref can never yield many rows, and you even mostly use primary keys which represents a valuable performance gain when using innodb

on the other mysql has to maintain 3 different temporary resultsets in memory and cross-reference them. whenever the dataset becomes greater and/or the server is loaded, it is very likely to become MUCH slower

<quote>
My understanding is that you forced the order of the JOINS by putting the condition to be executed first in the WHERE clause. Is this correct?
</quote>

actually, not really but that's a matter of appreciation : there is no such thing as a join in this case as i understand it. mysql executes the SUBQUERY first (the explain plan is misleading in this case), and then uses the result in a regular where, and in a totally different query.

the reason why i make this diffference is that there are no common columns between these tables, and a join clause is not or rather should never use anything but equality or the contrary as a comparison. a join must be seen as the product of 2 tables (ie containing as many rows as the product of the 2 and as many columns as the sum), but obviously when you have proper indexes mysql will not create such a table in memory.

if you were performing the same query on many users at once, the performance would probably be lamish, but then you would clearly have a problem with the database schema

<quote>
I'm curious whether there is a reason to put 'sex.sex=0' in the WHERE clause instead of the sex JOIN (which executes just as fast).
</quote>

makes no difference at all. actually mysql takes all the ON clauses and ANDs them with the where clause first thing before analysing the query.

i consider it good practice though to stick all and only the clauses related to the join in the ON clauses.

it is just a good way to make the query clearer, and detect things that are not coherent either with the database scheme or with the query itself, but it is not required in any way.

<quote>
My database contains products that are either male-specific, female-specific or unisex.
</quote>

i suppose you would be better of with 3 extra columns then
type : Male | Female | Unisex
and the counters

<quote>
Would it be smart to get rid of it
</quote>

i believe it is smart to use two tables when you can say something like
"for every element in table X, i have 0/1 to N elements in table Y"

if the relation is 1 to 1, you do not need an extra table
if the relation is 1 to something that is finite, it mostly depends on how you think the database will evolve
if neither is true, you probably need a 3rd table in between

<quote>
if one of the sexes is irrelevant, use NULL for that counter
</quote>

if you do so remember that 0 and NULL are different and maintaining it will let you perform averages easily because the null entries will be ignored altogether (ie the average of 6 and NULL is 6 but the average of 6 and 0 is 3)

----

i have no more time, so happy coding to you !

if you have more stuff you want to ask about, feel free in this thread even if unrelated

0
bitt3nAuthor Commented:
thanks!!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
MySQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.