Solved

Speeding up a single join

Posted on 2011-09-16
5
279 Views
Last Modified: 2012-05-12
Hi all,

I never seem to have any luck with JOINS, they just end up trashing the query speed so looking for some advice on getting this one well under the 2.3 seconds it is currently working at.

Here's the 2 tables and query I'm running:

CREATE TABLE IF NOT EXISTS `linkindex` (
  `link_id` int(10) unsigned NOT NULL,
  `word_id` mediumint(10) unsigned NOT NULL,
  `cat_id` tinyint(3) unsigned NOT NULL,
  `pos` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`word_id`,`link_id`,`cat_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `links_date_duration` (
  `link_id` int(10) unsigned NOT NULL,
  `link_duration` mediumint(9) NOT NULL,
  `link_date` date NOT NULL,
  PRIMARY KEY (`link_id`),
  KEY `link_duration` (`link_duration`),
  KEY `link_date` (`link_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

EXPLAIN SELECT li.link_id, ldd.link_duration, SUM( cat_id ) AS score
FROM linkindex li
INNER JOIN links_date_duration ldd ON li.link_id = ldd.link_id
WHERE word_id
IN ( 103, 46, 27, 261 )
GROUP BY link_id
HAVING score >10
ORDER BY link_duration DESC

Open in new window


id       select_type       table       type       possible_keys       key       key_len       ref       rows       Extra
1       SIMPLE       li       range       PRIMARY       PRIMARY       3       NULL      605031       Using where; Using index; Using temporary; Using filesort
1       SIMPLE       ldd       eq_ref       PRIMARY       PRIMARY       4       db_db.li.link_id       1       


Any suggestions most welcome!

The query finds matches on word_ids and then orders by dates.
0
Comment
Question by:dolythgoe
  • 2
  • 2
5 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 36547989
first, you should put table aliases for all used colums in your query, makes it much more readable....

SELECT li.link_id, ldd.link_duration, SUM( li.cat_id ) AS score
FROM linkindex li
INNER JOIN links_date_duration ldd 
  ON li.link_id = ldd.link_id
WHERE li.word_id IN ( 103, 46, 27, 261 )
GROUP BY li.link_id
HAVING score > 10
ORDER BY ldd.link_duration DESC

Open in new window


to optimize, you might want to fetch the link_duration value after the group by, aka
SELECT sq.link_id
     , ldd.link_duration
     , sq.score
  FROM ( SELECT li.link_id, SUM( li.cat_id ) AS score
           FROM linkindex li
          WHERE li.word_id IN ( 103, 46, 27, 261 )
          GROUP BY li.link_id
         HAVING score > 10
       )
  JOIN links_date_duration ldd 
    ON li.link_id = ldd.link_id
ORDER BY ldd.link_duration DESC

Open in new window

0
 
LVL 17

Expert Comment

by:Garry-G
ID: 36547996
Not really a suggestion as to the optimization of this select, but from many years of application programming experience (then Informix and nowhere near nowadays' system performance) - just because you CAN do everything in one select doesn't mean you should ... often, the overall performance for a certain job can be improved by using not the AI of the SQL engine, but the NI (natural intelligence) of a programmer ... especially if you can reduce the amount of data queried in one step and then only go through a small subset of the secondary data ...
Can you give a rough estimate of number of data sets in the original tables and the results of the query?
0
 

Author Comment

by:dolythgoe
ID: 36548071
Thanks Guys,

a3 - I recieved this error running that query:

 #1248 - Every derived table must have its own alias

Gary, I agree and have avoided joins on the relevancy query with good results.

Data in linkindex is 20 million rows and the links_date_duration has 1.6 million.

Here's the output in 2.4

link_id       link_duration       score
921498       8649       12
891347       5959       15
938710       5449       12
783048       5127       11
988388       5067       12
926092       5052       15
1143211       4792       12
969120       4444       15
674648       3992       11
870280       3826       15
1399579       3703       12
999183       3625       12
979063       3598       15
1208358       3536       12
900730       3515       12
983292       3480       12
0
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 36548077
sorry:
SELECT sq.link_id
     , ldd.link_duration
     , sq.score
  FROM ( SELECT li.link_id, SUM( li.cat_id ) AS score
           FROM linkindex li
          WHERE li.word_id IN ( 103, 46, 27, 261 )
          GROUP BY li.link_id
         HAVING score > 10
       ) sq
  JOIN links_date_duration ldd 
    ON li.link_id = ldd.link_id
ORDER BY ldd.link_duration DESC

Open in new window

0
 

Author Closing Comment

by:dolythgoe
ID: 36548141
Good work! 0.7 seconds is much better thanks!
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Introduction In this article, I will by showing a nice little trick for MySQL similar to that of my previous EE Article for SQLite (http://www.sqlite.org/), A SQLite Tidbit: Quick Numbers Table Generation (http://www.experts-exchange.com/A_3570.htm…
Foreword This is an old article.  Instead of using the MySQL extension that was used in the original code examples, please choose one of the currently supported database extensions instead.  More information is available here: MySQLi / PDO (http://…
Illustrator's Shape Builder tool will let you combine shapes visually and interactively. This video shows the Mac version, but the tool works the same way in Windows. To follow along with this video, you can draw your own shapes or download the file…
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now