I'm really trying to squeeze speed and optimise where possible out of the search index I'm building.
I have a php worker script which looks for tags in a textdump and sorts them into a searchable index based on word_id's and link_id's.
I've setup the table link this:
CREATE TABLE IF NOT EXISTS `linkindex_tags` (
`link_id` int(10) unsigned NOT NULL,
`word_id` mediumint(10) unsigned NOT NULL,
PRIMARY KEY (`link_id`,`word_id`),
KEY `word_id` (`word_id`),
KEY `link_id` (`link_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Now, I'm using the below SQL query which takes a series of word_id's and orders them based on number of matches in the above table .
WHERE word_id <= 4129
GROUP BY link_id
SUM( word_id IN ('56, 65,4129') )
DESC LIMIT 15
I've added some indexes and assigned a fair bit of resource from the server.
Here's the results of this query using 2 word_id's (using SQL_NO_CACHE) on a table with 7.9 million rows - this table will likely get to 30+ million.
(15 total, Query took 3.0226 sec)
Server: CentOs 64bit - 8GB RAM Intel(R) Xeon(R) CPU X3470 @ 2.93GHz (x8)
Now, ideally need this down to 0.5 or less and it's looking like it might fall off a cliff if more and more rows are added.
Does anyone have any experience in this type of index and how to make it as rapid as possible. It's hard to think how I would split the tables up into any logical partioning as the word_id's can be 1 to 99,000 and any one of them be related to the link id.
Help is much appreciated to help me take it that extra mile :)
Edited - here's the EXPLAIN:
Generation Time: Sep 08, 2011 at 10:27 PM
Generated by: phpMyAdmin 3.4.4 / MySQL 5.1.56
SQL query: EXPLAIN SELECT SQL_NO_CACHE link_id FROM linkindex_tags WHERE word_id IN ('107','27') GROUP BY link_id ORDER BY SUM( word_id IN ('107','27') ) DESC LIMIT 15;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE linkindex_tags index word_id PRIMARY 7 NULL 7958382 Using where; Using index; Using temporary; Using filesort