[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Speeding up a single join

Posted on 2011-09-16
5
Medium Priority
?
289 Views
Last Modified: 2012-05-12
Hi all,

I never seem to have any luck with JOINS, they just end up trashing the query speed so looking for some advice on getting this one well under the 2.3 seconds it is currently working at.

Here's the 2 tables and query I'm running:

CREATE TABLE IF NOT EXISTS `linkindex` (
  `link_id` int(10) unsigned NOT NULL,
  `word_id` mediumint(10) unsigned NOT NULL,
  `cat_id` tinyint(3) unsigned NOT NULL,
  `pos` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`word_id`,`link_id`,`cat_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `links_date_duration` (
  `link_id` int(10) unsigned NOT NULL,
  `link_duration` mediumint(9) NOT NULL,
  `link_date` date NOT NULL,
  PRIMARY KEY (`link_id`),
  KEY `link_duration` (`link_duration`),
  KEY `link_date` (`link_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

EXPLAIN SELECT li.link_id, ldd.link_duration, SUM( cat_id ) AS score
FROM linkindex li
INNER JOIN links_date_duration ldd ON li.link_id = ldd.link_id
WHERE word_id
IN ( 103, 46, 27, 261 )
GROUP BY link_id
HAVING score >10
ORDER BY link_duration DESC

Open in new window


id       select_type       table       type       possible_keys       key       key_len       ref       rows       Extra
1       SIMPLE       li       range       PRIMARY       PRIMARY       3       NULL      605031       Using where; Using index; Using temporary; Using filesort
1       SIMPLE       ldd       eq_ref       PRIMARY       PRIMARY       4       db_db.li.link_id       1       


Any suggestions most welcome!

The query finds matches on word_ids and then orders by dates.
0
Comment
Question by:dolythgoe
  • 2
  • 2
5 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 36547989
first, you should put table aliases for all used colums in your query, makes it much more readable....

SELECT li.link_id, ldd.link_duration, SUM( li.cat_id ) AS score
FROM linkindex li
INNER JOIN links_date_duration ldd 
  ON li.link_id = ldd.link_id
WHERE li.word_id IN ( 103, 46, 27, 261 )
GROUP BY li.link_id
HAVING score > 10
ORDER BY ldd.link_duration DESC

Open in new window


to optimize, you might want to fetch the link_duration value after the group by, aka
SELECT sq.link_id
     , ldd.link_duration
     , sq.score
  FROM ( SELECT li.link_id, SUM( li.cat_id ) AS score
           FROM linkindex li
          WHERE li.word_id IN ( 103, 46, 27, 261 )
          GROUP BY li.link_id
         HAVING score > 10
       )
  JOIN links_date_duration ldd 
    ON li.link_id = ldd.link_id
ORDER BY ldd.link_duration DESC

Open in new window

0
 
LVL 18

Expert Comment

by:Garry Glendown
ID: 36547996
Not really a suggestion as to the optimization of this select, but from many years of application programming experience (then Informix and nowhere near nowadays' system performance) - just because you CAN do everything in one select doesn't mean you should ... often, the overall performance for a certain job can be improved by using not the AI of the SQL engine, but the NI (natural intelligence) of a programmer ... especially if you can reduce the amount of data queried in one step and then only go through a small subset of the secondary data ...
Can you give a rough estimate of number of data sets in the original tables and the results of the query?
0
 

Author Comment

by:dolythgoe
ID: 36548071
Thanks Guys,

a3 - I recieved this error running that query:

 #1248 - Every derived table must have its own alias

Gary, I agree and have avoided joins on the relevancy query with good results.

Data in linkindex is 20 million rows and the links_date_duration has 1.6 million.

Here's the output in 2.4

link_id       link_duration       score
921498       8649       12
891347       5959       15
938710       5449       12
783048       5127       11
988388       5067       12
926092       5052       15
1143211       4792       12
969120       4444       15
674648       3992       11
870280       3826       15
1399579       3703       12
999183       3625       12
979063       3598       15
1208358       3536       12
900730       3515       12
983292       3480       12
0
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 2000 total points
ID: 36548077
sorry:
SELECT sq.link_id
     , ldd.link_duration
     , sq.score
  FROM ( SELECT li.link_id, SUM( li.cat_id ) AS score
           FROM linkindex li
          WHERE li.word_id IN ( 103, 46, 27, 261 )
          GROUP BY li.link_id
         HAVING score > 10
       ) sq
  JOIN links_date_duration ldd 
    ON li.link_id = ldd.link_id
ORDER BY ldd.link_duration DESC

Open in new window

0
 

Author Closing Comment

by:dolythgoe
ID: 36548141
Good work! 0.7 seconds is much better thanks!
0

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses
Course of the Month17 days, 21 hours left to enroll

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question