dolythgoe
asked on
SQL query on a simple table to return realted items
Hi all,
I'm trying to do a query on the following table:
It's a many 2 many table - about 6 million rows - fid range is about 650.
Now, the query I'm hoping to do is to lookup an fid (say 20) and return all fid's of which had link_id's associated with the fid 20.
Think of it as 'Ok, now you've picked filter option 20, here's the rest of the filter options available to you from the remaining pool'
Experimental, if there's a smarter way to do this, I'm all ears :)
I'm trying to do a query on the following table:
CREATE TABLE `filter_index` (
`fid` SMALLINT(5) UNSIGNED NOT NULL,
`link_id` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`fid`,`link_id`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1
It's a many 2 many table - about 6 million rows - fid range is about 650.
Now, the query I'm hoping to do is to lookup an fid (say 20) and return all fid's of which had link_id's associated with the fid 20.
Think of it as 'Ok, now you've picked filter option 20, here's the rest of the filter options available to you from the remaining pool'
Experimental, if there's a smarter way to do this, I'm all ears :)
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ok, so that query was really fast but now I have the annoyance of the GROUP BY or DISTINCT command to make sure it only gives me the unique fid's. Don't need the link_id's
I tried a few things:
8 seconds (without GROUP BY = 0.05)
Which the query profiler in SQL yog says most of the time is spent 'Sending data' <- I'm still trying to understand what that actually means!
EXPLAIN
Now, adding an index to link_id...
Re-run the query with GROUP BY
:D
0.045 seconds! MUCH BETTER!
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE fi2 ref PRIMARY,NewIndex1 PRIMARY 2 const 13322 Using index; Using temporary; Using filesort
1 SIMPLE fi1 ref NewIndex1 NewIndex1 4 rampants_dbp.fi2.link_id 3
Much better - uses filesort and temporary but that's cool as it's in memory.
From here I want to entertain the small matter of 'AND' with different fids.
So this is for an AJAX filter 'grey out' function - so I can ask mysql to give me all appliceable filter options that are still available if a number of filters are picked. That doesn't stop there though, the fid's are in groups so (1,2,3,4,5,6) are colours for example and (7,8,9,10,11,12) are sizes <- like ecommerce product selector.
I'll give you the points but if you could share some brainpower on this I would be most grateful!
SO there are 20 groups, 650 filter_ids and 1-2 million link_id's
Each group has a number of filter_id's which have a number of link_ids.
If filter_id's are picked within a group GROUP 1 (2, 4, 5) selected and GROUP 2 (8,9,10) selected, I want to be able to query so it looks at like:
SHOW ME ALL AVALIABLE fids FROM links subset WHEN fid IS (1 OR 2 OR 3) AND (3 OR 4 OR 5)
An example that uses this type of thing is asos.com - have a play with their product filter and you'll see what I mean - it greys out other options as and when you pick filter options. Great functionality but at how much computing cost I wonder..
I can't thank you enough for helping me out!
I tried a few things:
SELECT SQL_NO_CACHE fi1.fid
FROM `filter_index` fi1 JOIN `filter_index` fi2 ON (fi1.link_id=fi2.link_id)
WHERE fi2.fid=20
GROUP BY fi1.fid
8 seconds (without GROUP BY = 0.05)
Which the query profiler in SQL yog says most of the time is spent 'Sending data' <- I'm still trying to understand what that actually means!
EXPLAIN
Now, adding an index to link_id...
Re-run the query with GROUP BY
:D
0.045 seconds! MUCH BETTER!
EXPLAIN SELECT SQL_NO_CACHE fi1.fid
FROM `filter_index` fi1 JOIN `filter_index` fi2 ON (fi1.link_id=fi2.link_id)
WHERE fi2.fid=20
GROUP BY fi1.fid
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE fi2 ref PRIMARY,NewIndex1 PRIMARY 2 const 13322 Using index; Using temporary; Using filesort
1 SIMPLE fi1 ref NewIndex1 NewIndex1 4 rampants_dbp.fi2.link_id 3
Much better - uses filesort and temporary but that's cool as it's in memory.
From here I want to entertain the small matter of 'AND' with different fids.
So this is for an AJAX filter 'grey out' function - so I can ask mysql to give me all appliceable filter options that are still available if a number of filters are picked. That doesn't stop there though, the fid's are in groups so (1,2,3,4,5,6) are colours for example and (7,8,9,10,11,12) are sizes <- like ecommerce product selector.
I'll give you the points but if you could share some brainpower on this I would be most grateful!
SO there are 20 groups, 650 filter_ids and 1-2 million link_id's
Each group has a number of filter_id's which have a number of link_ids.
If filter_id's are picked within a group GROUP 1 (2, 4, 5) selected and GROUP 2 (8,9,10) selected, I want to be able to query so it looks at like:
SHOW ME ALL AVALIABLE fids FROM links subset WHEN fid IS (1 OR 2 OR 3) AND (3 OR 4 OR 5)
An example that uses this type of thing is asos.com - have a play with their product filter and you'll see what I mean - it greys out other options as and when you pick filter options. Great functionality but at how much computing cost I wonder..
I can't thank you enough for helping me out!
Very good catch on the JOIN.
ASKER
Ahha so I've put together some answers from previous posts and came up with this (think Johann picked up on this one):
Where I'm joining for every group and can use IN() to do the ORs and the JOINs to do the ANDs..
This seems to work pretty fast - is that the holy grail I seek do you think?
SELECT fi1.fid
FROM `filter_index` fi1
JOIN `filter_index` fi2 ON (fi1.link_id=fi2.link_id)
JOIN `filter_index` fi3 ON (fi1.link_id=fi3.link_id)
JOIN `filter_index` fi4 ON (fi1.link_id=fi4.link_id)
JOIN `filter_index` fi5 ON (fi1.link_id=fi5.link_id)
WHERE fi2.fid IN (50, 25, 30)
AND fi3.fid IN (38)
AND fi4.fid IN (5)
AND fi5.fid IN (4)
GROUP BY fi1.fid
Where I'm joining for every group and can use IN() to do the ORs and the JOINs to do the ANDs..
This seems to work pretty fast - is that the holy grail I seek do you think?
Seems so given your requirement and millions of rows. Very nice work, johanntagle!
Thanks mwvisa1!
dolythgoe - yes most of the time joins do better than subqueries provided the join columns are also indexed.
dolythgoe - yes most of the time joins do better than subqueries provided the join columns are also indexed.
oh, and glad I could help! =)
ASKER
Cool thanks for that - I did an explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY fi1 index PRIMARY 6 4040187 Using where; Using index
2 DEPENDENT SUBQUERY fi2 eq_ref PRIMARY PRIMARY 6 const,rampants_dbp.fi1.lin
So it's doing a full table scan, do you think it's possible to improve on that with forcing an index?
Thanks again for your help :)