Solved

Optimisation of a select statement on 1 table using GROUP & ORDER

Posted on 2011-09-25
9
271 Views
Last Modified: 2012-06-22
Hi all,

I'm looking to get more performance out of a single query which I've built and tested with the help of guys on here over the past weeks.

So first the table and the indexes:

/*Column Information*/
----------------------

FIELD    TYPE                    COLLATION  NULL    KEY     DEFAULT  Extra   PRIVILEGES                       COMMENT
-------  ----------------------  ---------  ------  ------  -------  ------  -------------------------------  -------
link_id  INT(10) UNSIGNED        (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
word_id  MEDIUMINT(10) UNSIGNED  (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
cat_id   TINYINT(3) UNSIGNED     (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
pos      TINYINT(3) UNSIGNED     (NULL)     NO      MUL     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         

/*Index Information*/
---------------------

TABLE      Non_unique  Key_name  Seq_in_index  Column_name  COLLATION  Cardinality  Sub_part  Packed  NULL    Index_type  COMMENT
---------  ----------  --------  ------------  -----------  ---------  -----------  --------  ------  ------  ----------  -------
linkindex           0  PRIMARY              1  word_id      A               (NULL)    (NULL)  (NULL)          BTREE              
linkindex           0  PRIMARY              2  link_id      A               (NULL)    (NULL)  (NULL)          BTREE              
linkindex           0  PRIMARY              3  cat_id       A             18512312    (NULL)  (NULL)          BTREE              
linkindex           1  pos                  1  pos          A                  110    (NULL)  (NULL)          BTREE              

/*DDL Information*/
-------------------

CREATE TABLE `linkindex` (
  `link_id` INT(10) UNSIGNED NOT NULL,
  `word_id` MEDIUMINT(10) UNSIGNED NOT NULL,
  `cat_id` TINYINT(3) UNSIGNED NOT NULL,
  `pos` TINYINT(3) UNSIGNED NOT NULL,
  PRIMARY KEY (`word_id`,`link_id`,`cat_id`),
  KEY `pos` (`pos`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1

Open in new window


And a typical query used against this table:

SELECT link_id, SUM(cat_id) AS score 
FROM linkindex 
WHERE word_id IN (107, 27) 
GROUP BY link_id 
ORDER BY score DESC LIMIT 0,16;

Open in new window


Explain results from the query:

id      select_type      table      type      possible_keys      key      key_len      ref      rows      Extra
1       SIMPLE       linkindex       range       PRIMARY       PRIMARY       3       (NULL)       700079       Using where; Using index; Using temporary; Using filesort

And my.cnf file - (server has 1 core - 2 threads, 1GB memory)

max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8

#This is for filesort operations
tmp_table_size = 48M
#Same thing as tmp_table_size to the HEAP type tables
max_heap_table_size = 48M
read_buffer_size = 256K
#Needs to be fairly large as lots of GROUP BY/ORDER BY queries
sort_buffer_size = 256K
read_rnd_buffer_size = 256K
join_buffer_size = 256K
#Can't be higher than the available memory (max prob 75% memory)
max_tmp_tables = 24
#Used for sorting indexes
myisam_sort_buffer_size = 2M
key_buffer_size=64M

Open in new window


I added comments to each of the vars as I learnt about them.

Now, the biggest time chunk is 'Copying to tmp table' - I believe this is in memory otherwise it would say 'to disk' and be much slower. I increased tmp_table_size to avoid this.

The table in question has 20 million rows but all fairly small integer fields.

I read from mysql it is possible to do a query using GROUP BY on 1 table without invoking tmp tables:

Reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html

I'm not sure this can be applied but any help in improving the speed would be great.

Speeds are not so bad at the mo but they could be improved as common word_ids sometimes take 2 secs + which isn't the best.

Cheers
David

0
Comment
Question by:dolythgoe
  • 5
  • 2
  • 2
9 Comments
 
LVL 37

Assisted Solution

by:Neil Russell
Neil Russell earned 200 total points
ID: 36595534
Your BEST optimization would be to upgrade the server!
0
 

Author Comment

by:dolythgoe
ID: 36595634
That's kind of an obvious one that frames everything but I need to be sure I've done everything first regardless of server. More specifically, avoiding tmp tables if possbile with the query stated.
0
 

Author Comment

by:dolythgoe
ID: 36595643
And which bit of the server - open more cores?
0
Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

 
LVL 39

Expert Comment

by:Aaron Tomosky
ID: 36596126
I think it must be paging to disk as 1gb of ram cannot possibly be enough. The os takes almost half that and 20 million rows cant be tiny
0
 

Author Comment

by:dolythgoe
ID: 36596178
Well the table of 20 million rows is not actually as big as you might think as it's int,medium int, tiny int and tiny int.  The lookup at most is 1 million rows.

It's a db slave so just has ubuntu and mysql installed.

What I've requested is an additional 2GB ram and another 3 cores to increase to 8 threads. The threads negate the need for as much memory since it can create and destroy temp tables faster in memory reducing the concurrency window that can bloat the memory. That is, if mysql v5.56 makes good use of those threads which it appears to do so.

So from the initial question - does it appear that no more query optimisation can be done and it's down to raw power? Happy with that, if that's the answer just wanted to confirm with you guys before paying for more juice!
0
 
LVL 39

Accepted Solution

by:
Aaron Tomosky earned 300 total points
ID: 36596203
Unsigned ints, with indexes on all fields.  Can't think of any way to optimize that. Query is simple with only a sum ona column. Unless you use this same query often and can just make a view off a summary table and update that table when data changes....looks clean to me.
0
 

Author Comment

by:dolythgoe
ID: 36596303
Thanks,

I just ran MySQL Tuner https://github.com/rackerhacker/MySQLTuner-perl

Which spat this out:

-------- Performance Metrics -------------------------------------------------
[--] Up for: 5m 57s (226 q [0.633 qps], 109 conn, TX: 104K, RX: 15K)
[--] Reads / Writes: 100% / 0%
[--] Total buffers: 128.0M global + 6.4M per thread (151 max threads)
[!!] Maximum possible memory usage: 1.1G (110% of installed RAM)
[OK] Slow queries: 0% (0/226)
[OK] Highest usage of available connections: 0% (1/151)
[!!] Key buffer size / total MyISAM indexes: 64.0M/685.3M
[!!] Key buffer hit rate: 94.0% (469K cached / 28K reads)
[!!] Query cache efficiency: 0.0% (0 cached / 99 selects)
[OK] Query cache prunes per day: 0
[!!] Sorts requiring temporary tables: 50% (2 temp sorts / 4 sorts)
[OK] Temporary tables created on disk: 22% (48 on disk / 212 total)
[OK] Thread cache hit rate: 99% (1 created / 109 connections)
[OK] Table cache hit rate: 23% (35 open / 147 opened)
[OK] Open file limit used: 7% (74/1K)
[OK] Table locks acquired immediately: 100% (62 immediate / 62 locks)

-------- Recommendations -----------------------------------------------------
General recommendations:
    MySQL started within last 24 hours - recommendations may be inaccurate
    Reduce your overall MySQL memory footprint for system stability
    Enable the slow query log to troubleshoot bad queries
Variables to adjust:
  *** MySQL's maximum memory usage is dangerously high ***
  *** Add RAM before increasing MySQL buffer variables ***
    key_buffer_size (> 685.3M)
    query_cache_limit (> 1M, or use smaller result sets)
    sort_buffer_size (> 2M)
    read_rnd_buffer_size (> 2M)

Open in new window


Looks like it needs more memory! A handy tool.

0
 
LVL 37

Expert Comment

by:Neil Russell
ID: 36598178
so like I said the optimization you need is a server upgrade. Your query is as simple as it gets nearly
0
 

Author Closing Comment

by:dolythgoe
ID: 36598736
Thanks for confirming this guys.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction In this installment of my SQL tidbits, I will be looking at parsing Extensible Markup Language (XML) directly passed as string parameters to MySQL 5.1.5 or higher. These would be instances where LOAD_FILE (http://dev.mysql.com/doc/refm…
More Fun with XML and MySQL – Parsing Delimited String with a Single SQL Statement Are you ready for another of my SQL tidbits?  Hopefully so, as in this adventure, I will be covering a topic that comes up a lot which is parsing a comma (or other…
This video shows how to use Hyena, from SystemTools Software, to bulk import 100 user accounts from an external text file. View in 1080p for best video quality.
Nobody understands Phishing better than an anti-spam company. That’s why we are providing Phishing Awareness Training to our customers. According to a report by Verizon, only 3% of targeted users report malicious emails to management. With compan…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question