Solved

Optimisation of a select statement on 1 table using GROUP & ORDER

Posted on 2011-09-25 | 270 Views | Last Modified: 2012-06-22
Hi all,

I'm looking to get more performance out of a single query which I've built and tested with the help of guys on here over the past weeks.

So first the table and the indexes:

/*Column Information*/
----------------------

FIELD    TYPE                    COLLATION  NULL    KEY     DEFAULT  Extra   PRIVILEGES                       COMMENT
-------  ----------------------  ---------  ------  ------  -------  ------  -------------------------------  -------
link_id  INT(10) UNSIGNED        (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
word_id  MEDIUMINT(10) UNSIGNED  (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
cat_id   TINYINT(3) UNSIGNED     (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
pos      TINYINT(3) UNSIGNED     (NULL)     NO      MUL     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         

/*Index Information*/
---------------------

TABLE      Non_unique  Key_name  Seq_in_index  Column_name  COLLATION  Cardinality  Sub_part  Packed  NULL    Index_type  COMMENT
---------  ----------  --------  ------------  -----------  ---------  -----------  --------  ------  ------  ----------  -------
linkindex           0  PRIMARY              1  word_id      A               (NULL)    (NULL)  (NULL)          BTREE              
linkindex           0  PRIMARY              2  link_id      A               (NULL)    (NULL)  (NULL)          BTREE              
linkindex           0  PRIMARY              3  cat_id       A             18512312    (NULL)  (NULL)          BTREE              
linkindex           1  pos                  1  pos          A                  110    (NULL)  (NULL)          BTREE              

/*DDL Information*/
-------------------

CREATE TABLE `linkindex` (
  `link_id` INT(10) UNSIGNED NOT NULL,
  `word_id` MEDIUMINT(10) UNSIGNED NOT NULL,
  `cat_id` TINYINT(3) UNSIGNED NOT NULL,
  `pos` TINYINT(3) UNSIGNED NOT NULL,
  PRIMARY KEY (`word_id`,`link_id`,`cat_id`),
  KEY `pos` (`pos`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1



And a typical query used against this table:

SELECT link_id, SUM(cat_id) AS score 
FROM linkindex 
WHERE word_id IN (107, 27) 
GROUP BY link_id 
ORDER BY score DESC LIMIT 0,16;



Explain results from the query:

id  select_type  table      type   possible_keys  key      key_len  ref     rows    Extra
1   SIMPLE       linkindex  range  PRIMARY        PRIMARY  3        (NULL)  700079  Using where; Using index; Using temporary; Using filesort

And the my.cnf file (the server has 1 core / 2 threads and 1 GB memory):

max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8

#Maximum size of in-memory internal temporary tables (GROUP BY etc.)
tmp_table_size = 48M
#Caps MEMORY (HEAP) tables; the effective in-memory temp table limit is the lower of this and tmp_table_size
max_heap_table_size = 48M
read_buffer_size = 256K
#Per-connection sort buffer for filesort - relevant as there are lots of GROUP BY/ORDER BY queries
sort_buffer_size = 256K
read_rnd_buffer_size = 256K
join_buffer_size = 256K
#Maximum number of temporary tables a single client connection can keep open
max_tmp_tables = 24
#Used when sorting MyISAM indexes (table repair / index creation)
myisam_sort_buffer_size = 2M
key_buffer_size=64M



I added comments to each of the vars as I learnt about them.

Now, the biggest time chunk is 'Copying to tmp table' - I believe this happens in memory (otherwise it would say 'copying to tmp table on disk' and be much slower). I increased tmp_table_size to keep it from spilling to disk.
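For anyone wanting to see the same per-stage breakdown, MySQL's session profiler is one way to get it - a minimal sketch (the query number in the last statement is whatever SHOW PROFILES reports on your server):

SET profiling = 1;

SELECT link_id, SUM(cat_id) AS score
FROM linkindex
WHERE word_id IN (107, 27)
GROUP BY link_id
ORDER BY score DESC LIMIT 0,16;

SHOW PROFILES;               -- lists recent statements with their Query_ID
SHOW PROFILE FOR QUERY 1;    -- per-stage timings, including 'Copying to tmp table'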

The table in question has 20 million rows, but they are all fairly small integer fields.

I read in the MySQL docs that it's possible to run a GROUP BY query on a single table without invoking temporary tables:

Reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html

I'm not sure this can be applied here, but any help in improving the speed would be great.
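As a rough way to check whether that index-based GROUP BY applies here (just a sketch - the optimisation needs the grouped column to follow the constant-equality parts of a usable index):

-- With a single constant word_id, the PRIMARY KEY (word_id, link_id, cat_id)
-- delivers rows in link_id order, so the GROUP BY can ride the index:
EXPLAIN SELECT link_id, SUM(cat_id) AS score
FROM linkindex
WHERE word_id = 107
GROUP BY link_id;

-- With word_id IN (107, 27) the predicate becomes a range over two index
-- prefixes, so the GROUP BY needs a temporary table, and ORDER BY score DESC
-- (sorting on an aggregate) forces a filesort either way:
EXPLAIN SELECT link_id, SUM(cat_id) AS score
FROM linkindex
WHERE word_id IN (107, 27)
GROUP BY link_id
ORDER BY score DESC LIMIT 0,16;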

Speeds are not too bad at the moment, but they could be improved - common word_ids sometimes take 2+ seconds, which isn't ideal.

Cheers
David

Question by: dolythgoe

9 Comments
 
Assisted Solution by: Neil Russell (earned 200 total points)
ID: 36595534
Your BEST optimization would be to upgrade the server!
 

Author Comment by: dolythgoe
ID: 36595634
That's kind of an obvious one that frames everything, but I need to be sure I've done everything I can first, regardless of the server. More specifically: avoiding temp tables, if possible, with the query stated.
 

Author Comment by: dolythgoe
ID: 36595643
And which bit of the server should I upgrade - add more cores?

 
Expert Comment by: Aaron Tomosky
ID: 36596126
I think it must be paging to disk, as 1 GB of RAM can't possibly be enough. The OS takes almost half of that, and 20 million rows can't be tiny.
 

Author Comment by: dolythgoe
ID: 36596178
Well, the table of 20 million rows is not actually as big as you might think, since the columns are just an INT, a MEDIUMINT and two TINYINTs. The lookup touches at most 1 million rows.

It's a DB slave, so it just has Ubuntu and MySQL installed.

What I've requested is an additional 2 GB of RAM and another 3 cores, taking it to 8 threads. The extra threads reduce the need for as much memory, since temp tables can be created and destroyed faster in memory, shrinking the concurrency window that can bloat memory usage - that is, if MySQL v5.56 makes good use of those threads, which it appears to do.

So, back to the initial question - does it appear that no more query optimisation can be done and it's down to raw power? I'm happy with that if that's the answer; I just wanted to confirm with you guys before paying for more juice!
 
Accepted Solution by: Aaron Tomosky (earned 300 total points)
ID: 36596203
Unsigned ints, with indexes on all the fields. I can't think of any way to optimize that - the query is simple, with only a SUM on a column. Unless you run this same query often and can build a summary table (with a view off it), updating that table whenever the data changes... it looks clean to me.
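A minimal sketch of that summary-table idea, assuming the data changes infrequently enough for periodic rebuilds (the table and column names below are illustrative, not from this thread):

-- Illustrative rollup: one row per (word_id, link_id) with cat_id pre-summed.
CREATE TABLE linkindex_summary (
  word_id MEDIUMINT UNSIGNED NOT NULL,
  link_id INT UNSIGNED NOT NULL,
  score   INT UNSIGNED NOT NULL,
  PRIMARY KEY (word_id, link_id)
) ENGINE=MYISAM;

-- Rebuild whenever linkindex changes (or refresh incrementally as data arrives):
TRUNCATE TABLE linkindex_summary;
INSERT INTO linkindex_summary (word_id, link_id, score)
SELECT word_id, link_id, SUM(cat_id)
FROM linkindex
GROUP BY word_id, link_id;

-- The hot query then scans the smaller, pre-aggregated table:
SELECT link_id, SUM(score) AS score
FROM linkindex_summary
WHERE word_id IN (107, 27)
GROUP BY link_id
ORDER BY score DESC LIMIT 0,16;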
 

Author Comment by: dolythgoe
ID: 36596303
Thanks,

I just ran MySQLTuner (https://github.com/rackerhacker/MySQLTuner-perl), which spat this out:

-------- Performance Metrics -------------------------------------------------
[--] Up for: 5m 57s (226 q [0.633 qps], 109 conn, TX: 104K, RX: 15K)
[--] Reads / Writes: 100% / 0%
[--] Total buffers: 128.0M global + 6.4M per thread (151 max threads)
[!!] Maximum possible memory usage: 1.1G (110% of installed RAM)
[OK] Slow queries: 0% (0/226)
[OK] Highest usage of available connections: 0% (1/151)
[!!] Key buffer size / total MyISAM indexes: 64.0M/685.3M
[!!] Key buffer hit rate: 94.0% (469K cached / 28K reads)
[!!] Query cache efficiency: 0.0% (0 cached / 99 selects)
[OK] Query cache prunes per day: 0
[!!] Sorts requiring temporary tables: 50% (2 temp sorts / 4 sorts)
[OK] Temporary tables created on disk: 22% (48 on disk / 212 total)
[OK] Thread cache hit rate: 99% (1 created / 109 connections)
[OK] Table cache hit rate: 23% (35 open / 147 opened)
[OK] Open file limit used: 7% (74/1K)
[OK] Table locks acquired immediately: 100% (62 immediate / 62 locks)

-------- Recommendations -----------------------------------------------------
General recommendations:
    MySQL started within last 24 hours - recommendations may be inaccurate
    Reduce your overall MySQL memory footprint for system stability
    Enable the slow query log to troubleshoot bad queries
Variables to adjust:
  *** MySQL's maximum memory usage is dangerously high ***
  *** Add RAM before increasing MySQL buffer variables ***
    key_buffer_size (> 685.3M)
    query_cache_limit (> 1M, or use smaller result sets)
    sort_buffer_size (> 2M)
    read_rnd_buffer_size (> 2M)



Looks like it needs more memory! A handy tool.
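For what it's worth, the key-buffer figures above can be cross-checked from SQL - a sketch (it assumes you want the total across all MyISAM tables on the server):

-- Total on-disk size of MyISAM indexes (what the key buffer would ideally hold):
SELECT SUM(INDEX_LENGTH) / 1024 / 1024 AS myisam_index_mb
FROM information_schema.TABLES
WHERE ENGINE = 'MyISAM';

-- Current key buffer size and hit/miss counters:
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW STATUS LIKE 'Key_read%';   -- Key_reads / Key_read_requests = miss rate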

 
Expert Comment by: Neil Russell
ID: 36598178
So, like I said, the optimization you need is a server upgrade. Your query is about as simple as it gets.
 

Author Closing Comment by: dolythgoe
ID: 36598736
Thanks for confirming this, guys.
