Solved

Speeding up a single join

Posted on 2011-09-16
5
281 Views
Last Modified: 2012-05-12
Hi all,

I never seem to have any luck with JOINS, they just end up trashing the query speed so looking for some advice on getting this one well under the 2.3 seconds it is currently working at.

Here's the 2 tables and query I'm running:

CREATE TABLE IF NOT EXISTS `linkindex` (
  `link_id` int(10) unsigned NOT NULL,
  `word_id` mediumint(10) unsigned NOT NULL,
  `cat_id` tinyint(3) unsigned NOT NULL,
  `pos` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`word_id`,`link_id`,`cat_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `links_date_duration` (
  `link_id` int(10) unsigned NOT NULL,
  `link_duration` mediumint(9) NOT NULL,
  `link_date` date NOT NULL,
  PRIMARY KEY (`link_id`),
  KEY `link_duration` (`link_duration`),
  KEY `link_date` (`link_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

EXPLAIN SELECT li.link_id, ldd.link_duration, SUM( cat_id ) AS score
FROM linkindex li
INNER JOIN links_date_duration ldd ON li.link_id = ldd.link_id
WHERE word_id
IN ( 103, 46, 27, 261 )
GROUP BY link_id
HAVING score >10
ORDER BY link_duration DESC

Open in new window


id       select_type       table       type       possible_keys       key       key_len       ref       rows       Extra
1       SIMPLE       li       range       PRIMARY       PRIMARY       3       NULL      605031       Using where; Using index; Using temporary; Using filesort
1       SIMPLE       ldd       eq_ref       PRIMARY       PRIMARY       4       db_db.li.link_id       1       


Any suggestions most welcome!

The query finds matches on word_ids and then orders by dates.
0
Comment
Question by:dolythgoe
  • 2
  • 2
5 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 36547989
first, you should put table aliases for all used colums in your query, makes it much more readable....

SELECT li.link_id, ldd.link_duration, SUM( li.cat_id ) AS score
FROM linkindex li
INNER JOIN links_date_duration ldd 
  ON li.link_id = ldd.link_id
WHERE li.word_id IN ( 103, 46, 27, 261 )
GROUP BY li.link_id
HAVING score > 10
ORDER BY ldd.link_duration DESC

Open in new window


to optimize, you might want to fetch the link_duration value after the group by, aka
SELECT sq.link_id
     , ldd.link_duration
     , sq.score
  FROM ( SELECT li.link_id, SUM( li.cat_id ) AS score
           FROM linkindex li
          WHERE li.word_id IN ( 103, 46, 27, 261 )
          GROUP BY li.link_id
         HAVING score > 10
       )
  JOIN links_date_duration ldd 
    ON li.link_id = ldd.link_id
ORDER BY ldd.link_duration DESC

Open in new window

0
 
LVL 17

Expert Comment

by:Garry-G
ID: 36547996
Not really a suggestion as to the optimization of this select, but from many years of application programming experience (then Informix and nowhere near nowadays' system performance) - just because you CAN do everything in one select doesn't mean you should ... often, the overall performance for a certain job can be improved by using not the AI of the SQL engine, but the NI (natural intelligence) of a programmer ... especially if you can reduce the amount of data queried in one step and then only go through a small subset of the secondary data ...
Can you give a rough estimate of number of data sets in the original tables and the results of the query?
0
 

Author Comment

by:dolythgoe
ID: 36548071
Thanks Guys,

a3 - I recieved this error running that query:

 #1248 - Every derived table must have its own alias

Gary, I agree and have avoided joins on the relevancy query with good results.

Data in linkindex is 20 million rows and the links_date_duration has 1.6 million.

Here's the output in 2.4

link_id       link_duration       score
921498       8649       12
891347       5959       15
938710       5449       12
783048       5127       11
988388       5067       12
926092       5052       15
1143211       4792       12
969120       4444       15
674648       3992       11
870280       3826       15
1399579       3703       12
999183       3625       12
979063       3598       15
1208358       3536       12
900730       3515       12
983292       3480       12
0
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 36548077
sorry:
SELECT sq.link_id
     , ldd.link_duration
     , sq.score
  FROM ( SELECT li.link_id, SUM( li.cat_id ) AS score
           FROM linkindex li
          WHERE li.word_id IN ( 103, 46, 27, 261 )
          GROUP BY li.link_id
         HAVING score > 10
       ) sq
  JOIN links_date_duration ldd 
    ON li.link_id = ldd.link_id
ORDER BY ldd.link_duration DESC

Open in new window

0
 

Author Closing Comment

by:dolythgoe
ID: 36548141
Good work! 0.7 seconds is much better thanks!
0

Featured Post

Live: Real-Time Solutions, Start Here

Receive instant 1:1 support from technology experts, using our real-time conversation and whiteboard interface. Your first 5 minutes are always free.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

All XML, All the Time; More Fun MySQL Tidbits – Dynamically Generate XML via Stored Procedure in MySQL Extensible Markup Language (XML) and database systems, a marriage we are seeing more and more of.  So the topics of parsing and manipulating XM…
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
This Micro Tutorial will teach you how to censor certain areas of your screen. The example in this video will show a little boy's face being blurred. This will be demonstrated using Adobe Premiere Pro CS6.
Nobody understands Phishing better than an anti-spam company. That’s why we are providing Phishing Awareness Training to our customers. According to a report by Verizon, only 3% of targeted users report malicious emails to management. With compan…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question