Link to home
Start Free TrialLog in
Avatar of bitt3n
bitt3n

asked on

how can I make this COUNT() query more efficient?

the following query executes fast (0.008 secs)

SELECT shops.shop, products.shop_id, COUNT( products.product_id ) AS total_products
FROM products, users, shops
WHERE user_id =1
AND (
products.date >= users.prior_login
)
AND shops.shop_id = products.shop_id
GROUP BY products.shop_id

Open in new window


here is the EXPLAIN plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	users	const	PRIMARY,prior_login,user_id	PRIMARY	4	const	1	Using temporary; Using filesort
1	SIMPLE	shops	ALL	PRIMARY	NULL	NULL	NULL	90	 
1	SIMPLE	products	ref	shop_id,date,shop_id_2,shop_id_3	shop_id	4	bitt3n_minxa.shops.shop_id	5338	Using where

Open in new window


however, execution time increases to over 5 secs when I add an extra condition that retrieves the relevant sex for the product from the sex table (also 475K rows)

SELECT shops.shop, products.shop_id AS shop_id, COUNT( products.product_id ) AS total_products, sex.sex AS sex
FROM products, users, shops
INNER JOIN sex ON
	sex.product_id = products.product_id AND
	sex.sex=0
WHERE user_id =1
AND (
products.date >= users.prior_login
)
AND shops.shop_id = products.shop_id
GROUP BY products.shop_id ORDER BY shop ASC

Open in new window


here is the EXPLAIN plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	users	const	PRIMARY,prior_login,user_id	PRIMARY	4	const	1	Using temporary; Using filesort
1	SIMPLE	sex	index	product_id_2,product_id	product_id_2	5	NULL	363101	Using where; Using index
1	SIMPLE	products	ref	PRIMARY,shop_id,date,shop_id_2,product_id,shop_id_...	product_id	4	bitt3n_minxa.sex.product_id	1	Using where
1	SIMPLE	shops	eq_ref	PRIMARY	PRIMARY	4	bitt3n_minxa.products.shop_id	1	 

Open in new window


I do have an index on product_id in the sex table. How can I make this query faster?
Avatar of bitt3n
bitt3n

ASKER

I noticed that getting the sex of every product in the products table via

SELECT * 
FROM products, sex
WHERE sex.product_id = products.product_id

Open in new window


takes 0.03 secs. therefore it seems that getting the sex for the small subset of the products in the products table satisfying

WHERE user_id =1
AND (
products.date >= users.prior_login
)

Open in new window


should not be costly operation.
Avatar of bitt3n

ASKER

I found a query that gets the results I want in 0.014 secs

SELECT sex, shops.shop, p1.shop_id, COUNT( p1.product_id ) AS total_products
FROM sex, shops INNER JOIN
(SELECT product_id, shop_id FROM
users, products WHERE
user_id =1
AND (
products.date >= users.prior_login
)) AS p1 ON p1.product_id=sex.product_id
WHERE 
shops.shop_id = p1.shop_id
GROUP BY p1.shop_id

Open in new window


here is the EXPLAIN plan

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	<derived2>	ALL	NULL	NULL	NULL	NULL	944	Using temporary; Using filesort
1	PRIMARY	shops	eq_ref	PRIMARY	PRIMARY	4	p1.shop_id	1	 
1	PRIMARY	sex	ref	product_id_2,product_id	product_id	4	p1.product_id	1	 
2	DERIVED	users	const	PRIMARY,prior_login,user_id	PRIMARY	4	 	1	 
2	DERIVED	products	range	date	date	8	NULL	2144	Using where

Open in new window


I am wondering why this runs so much faster, and whether there is a simpler format that will be as fast but without the subquery.
Avatar of arnold
What data type is products.date is it a datetime/timestamp? integer (unix timestamp) seconds elapsed since epoch (1/1/1970 GMT)?
What colums exist in each table?  Could you post the create table for each one involved?
Avatar of bitt3n

ASKER

sure here are the SHOW CREATE TABLE for each relevant table. products.date is DATETIME

CREATE TABLE `products` (\n  `product_id` int(10) NOT NULL auto_increment,\n  `shop_id` int(10) NOT NULL default '0',\n  `title` varchar(120) NOT NULL default '',\n  `brand` varchar(80) NOT NULL default '',\n  `price` float NOT NULL default '0',\n  `image` varchar(255) NOT NULL default '',\n  `image_height` smallint(6) default NULL,\n  `image_width` smallint(6) default NULL,\n  `link` varchar(255) NOT NULL default '',\n  `date` datetime NOT NULL default '0000-00-00 00:00:00',\n  PRIMARY KEY  (`product_id`),\n  UNIQUE KEY `shop_id` (`shop_id`,`link`),\n  UNIQUE KEY `title_2` (`title`,`image`),\n  KEY `brand` (`brand`),\n  KEY `title` (`title`),\n  KEY `date` (`date`),\n  KEY `shop_id_2` (`shop_id`),\n  KEY `product_id` (`product_id`,`date`),\n  KEY `shop_id_3` (`shop_id`,`date`)\n) ENGINE=MyISAM AUTO_INCREMENT=553314 DEFAULT CHARSET=latin1

Open in new window


CREATE TABLE `shops` (\n  `shop_id` int(10) NOT NULL auto_increment,\n  `shop` varchar(80) NOT NULL default '',\n  `last_look` datetime NOT NULL default '0000-00-00 00:00:00',\n  PRIMARY KEY  (`shop_id`)\n) ENGINE=MyISAM AUTO_INCREMENT=142 DEFAULT CHARSET=latin1

Open in new window


CREATE TABLE `users` (\n  `user_id` int(12) NOT NULL auto_increment,\n  `username` varchar(16) NOT NULL default '',\n  `password` varchar(40) NOT NULL default '',\n  `registration` datetime NOT NULL default '0000-00-00 00:00:00',\n  `last_login` datetime NOT NULL default '0000-00-00 00:00:00',\n  `prior_login` datetime NOT NULL default '0000-00-00 00:00:00',\n  `active_flair_id` int(11) NOT NULL default '1',\n  PRIMARY KEY  (`user_id`),\n  UNIQUE KEY `username` (`username`),\n  KEY `prior_login` (`prior_login`),\n  KEY `user_id` (`user_id`,`prior_login`)\n) ENGINE=MyISAM AUTO_INCREMENT=35 DEFAULT CHARSET=latin1

Open in new window


CREATE TABLE `sex` (\n  `product_id` int(10) NOT NULL default '0',\n  `sex` tinyint(1) NOT NULL default '0' COMMENT '0=f 1=m 2=unisex',\n  `date` datetime NOT NULL default '0000-00-00 00:00:00',\n  `hit_counter` int(11) NOT NULL default '0',\n  UNIQUE KEY `product_id_2` (`product_id`,`sex`),\n  KEY `product_id` (`product_id`)\n) ENGINE=MyISAM DEFAULT CHARSET=latin1

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of skullnobrains
skullnobrains

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
oups i left a dumb AND at the end of the line before the WHERE
i assume you can coorect by yourself, but since i noticed...
Avatar of bitt3n

ASKER

your query runs in 0.086 secs (consistently ~0.01 sec slower than my old query running on the database as it stands now, not a material difference but interesting) the explain plan is

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	products	range	PRIMARY,shop_id,date,shop_id_2,product_id,shop_id_...	date	8	NULL	8306	Using where; Using temporary; Using filesort
1	PRIMARY	shops	eq_ref	PRIMARY	PRIMARY	4	bitt3n_minxa.products.shop_id	1	 
1	PRIMARY	sex	ref	product_id_2,product_id	product_id	4	bitt3n_minxa.products.product_id	1	Using where
2	SUBQUERY	users	const	PRIMARY,user_id	PRIMARY	4	 	1	 

Open in new window


My understanding is that you forced the order of the JOINS by putting the condition to be executed first in the WHERE clause. Is this correct?

I'm curious whether there is a reason to put 'sex.sex=0' in the WHERE clause instead of the sex JOIN (which executes just as fast).

That is a good question regarding why I even have a sex table at all, and I'm not sure I made the correct choice. My database contains products that are either male-specific, female-specific or unisex. My options are to have two extra columns in the products table for male and female (with unisex being both of these TRUE), one column with values 0-2 for the male, female, and unisex, or to have a separate sex table.

The sex table has columns for product_id and sex, as well as columns recording the hit count for each sex (ie, number of times someone viewing men's products clicked on the item, and number of times someone viewing women's products clicked on the item. I cannot use just one hit counter since I need both male and female counts for unisex products.) I also record the DATETIME a given product was added to the both the men's and women's product sections respectively (which may not be the same value).

Since almost no products are unisex, I figured a separate table was better than 3-4 extra columns in the products table with so many irrelevant entries (for example, male date and hit counter for female product). Removing the sex table would certainly simplify my queries. Would it be smart to get rid of it?

Thinking about this again now it seems like I could just use two columns for the hit counters, and if one of the sexes is irrelevant, use NULL for that counter, which means I could get rid of the sex table by adding five columns to the products table (these being sex, male_hit_counter, female_hit_counter, date_added_to_male_section, date_added_to_female_section. The sex table itself has 4 columns (product_id,sex,hit_counter,date), with no irrelevant entries (since if a product isn't relevant to a given sex, I don't add the row for that sex).
Avatar of bitt3n

ASKER

I made the topic of the sex table a separate question since it is not really related to the original question here:

https://www.experts-exchange.com/questions/27523507/did-I-screw-up-this-database-structure.html
<quote>
your query runs in 0.086 secs (consistently ~0.01 sec slower than my old query running on the database as it stands now, not a material difference but interesting) the explain plan is
</quote>

i believe you still would be better off with the one i gave

the reason is mainly scalability because, it is cleaner (and easier to understand next year)

on one side, you examin more rows in the products table (but that could also be less with different indexation or changes related to the sex table), but then you only read indexes with ref or eq_ref which is as fast as it can be given the fact that the ref can never yield many rows, and you even mostly use primary keys which represents a valuable performance gain when using innodb

on the other mysql has to maintain 3 different temporary resultsets in memory and cross-reference them. whenever the dataset becomes greater and/or the server is loaded, it is very likely to become MUCH slower

<quote>
My understanding is that you forced the order of the JOINS by putting the condition to be executed first in the WHERE clause. Is this correct?
</quote>

actually, not really but that's a matter of appreciation : there is no such thing as a join in this case as i understand it. mysql executes the SUBQUERY first (the explain plan is misleading in this case), and then uses the result in a regular where, and in a totally different query.

the reason why i make this diffference is that there are no common columns between these tables, and a join clause is not or rather should never use anything but equality or the contrary as a comparison. a join must be seen as the product of 2 tables (ie containing as many rows as the product of the 2 and as many columns as the sum), but obviously when you have proper indexes mysql will not create such a table in memory.

if you were performing the same query on many users at once, the performance would probably be lamish, but then you would clearly have a problem with the database schema

<quote>
I'm curious whether there is a reason to put 'sex.sex=0' in the WHERE clause instead of the sex JOIN (which executes just as fast).
</quote>

makes no difference at all. actually mysql takes all the ON clauses and ANDs them with the where clause first thing before analysing the query.

i consider it good practice though to stick all and only the clauses related to the join in the ON clauses.

it is just a good way to make the query clearer, and detect things that are not coherent either with the database scheme or with the query itself, but it is not required in any way.

<quote>
My database contains products that are either male-specific, female-specific or unisex.
</quote>

i suppose you would be better of with 3 extra columns then
type : Male | Female | Unisex
and the counters

<quote>
Would it be smart to get rid of it
</quote>

i believe it is smart to use two tables when you can say something like
"for every element in table X, i have 0/1 to N elements in table Y"

if the relation is 1 to 1, you do not need an extra table
if the relation is 1 to something that is finite, it mostly depends on how you think the database will evolve
if neither is true, you probably need a 3rd table in between

<quote>
if one of the sexes is irrelevant, use NULL for that counter
</quote>

if you do so remember that 0 and NULL are different and maintaining it will let you perform averages easily because the null entries will be ignored altogether (ie the average of 6 and NULL is 6 but the average of 6 and 0 is 3)

----

i have no more time, so happy coding to you !

if you have more stuff you want to ask about, feel free in this thread even if unrelated

Avatar of bitt3n

ASKER

thanks!!