Solved

Optimisation of a select statement on 1 table using GROUP & ORDER

Posted on 2011-09-25
268 Views
Last Modified: 2012-06-22
Hi all,

I'm looking to get more performance out of a single query which I've built and tested with the help of guys on here over the past weeks.

So first the table and the indexes:

/*Column Information*/
----------------------

FIELD    TYPE                    COLLATION  NULL    KEY     DEFAULT  Extra   PRIVILEGES                       COMMENT
-------  ----------------------  ---------  ------  ------  -------  ------  -------------------------------  -------
link_id  INT(10) UNSIGNED        (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
word_id  MEDIUMINT(10) UNSIGNED  (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
cat_id   TINYINT(3) UNSIGNED     (NULL)     NO      PRI     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         
pos      TINYINT(3) UNSIGNED     (NULL)     NO      MUL     (NULL)           SELECT,INSERT,UPDATE,REFERENCES         

/*Index Information*/
---------------------

TABLE      Non_unique  Key_name  Seq_in_index  Column_name  COLLATION  Cardinality  Sub_part  Packed  NULL    Index_type  COMMENT
---------  ----------  --------  ------------  -----------  ---------  -----------  --------  ------  ------  ----------  -------
linkindex           0  PRIMARY              1  word_id      A               (NULL)    (NULL)  (NULL)          BTREE              
linkindex           0  PRIMARY              2  link_id      A               (NULL)    (NULL)  (NULL)          BTREE              
linkindex           0  PRIMARY              3  cat_id       A             18512312    (NULL)  (NULL)          BTREE              
linkindex           1  pos                  1  pos          A                  110    (NULL)  (NULL)          BTREE              

/*DDL Information*/
-------------------

CREATE TABLE `linkindex` (
  `link_id` INT(10) UNSIGNED NOT NULL,
  `word_id` MEDIUMINT(10) UNSIGNED NOT NULL,
  `cat_id` TINYINT(3) UNSIGNED NOT NULL,
  `pos` TINYINT(3) UNSIGNED NOT NULL,
  PRIMARY KEY (`word_id`,`link_id`,`cat_id`),
  KEY `pos` (`pos`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1



And a typical query used against this table:

SELECT link_id, SUM(cat_id) AS score 
FROM linkindex 
WHERE word_id IN (107, 27) 
GROUP BY link_id 
ORDER BY score DESC LIMIT 0,16;



Explain results from the query:

id  select_type  table      type   possible_keys  key      key_len  ref     rows    Extra
1   SIMPLE       linkindex  range  PRIMARY        PRIMARY  3        (NULL)  700079  Using where; Using index; Using temporary; Using filesort

And the my.cnf file (server has 1 core / 2 threads, 1GB memory):

max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8

#This is for filesort operations
tmp_table_size = 48M
#Same thing as tmp_table_size to the HEAP type tables
max_heap_table_size = 48M
read_buffer_size = 256K
#Needs to be fairly large as lots of GROUP BY/ORDER BY queries
sort_buffer_size = 256K
read_rnd_buffer_size = 256K
join_buffer_size = 256K
#Can't be higher than the available memory (max prob 75% memory)
max_tmp_tables = 24
#Used for sorting indexes
myisam_sort_buffer_size = 2M
key_buffer_size=64M



I added comments to each of the vars as I learnt about them.

Now, the biggest time chunk is 'Copying to tmp table'. I believe this happens in memory, otherwise the state would say 'to disk' and be much slower; I increased tmp_table_size to avoid that.
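Rather than inferring it from the profile state, whether temp tables are spilling to disk can be checked directly from the server status counters (a quick sketch):

```sql
-- Check whether temporary tables are spilling to disk:
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- If Created_tmp_disk_tables climbs alongside Created_tmp_tables,
-- tmp_table_size / max_heap_table_size is being exceeded.
-- (Temp tables containing BLOB/TEXT columns always go to disk,
--  but this table is all integers, so that doesn't apply here.)
```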

The table in question has 20 million rows but all fairly small integer fields.

I read in the MySQL docs that it is possible to run a GROUP BY query on a single table without creating a temporary table:

Reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html
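As far as I understand it, that optimisation (the "loose index scan") only kicks in when the GROUP BY columns form a leftmost prefix of an index and the only aggregates are MIN()/MAX() on the key part that follows the prefix. A sketch of the difference against this table:

```sql
-- Would qualify for a loose index scan ("Using index for group-by"):
-- GROUP BY on the leftmost PK column, MIN/MAX on the next key part.
SELECT word_id, MIN(link_id)
FROM linkindex
GROUP BY word_id;

-- Doesn't qualify: SUM() is not MIN/MAX, and link_id is not a
-- leftmost prefix of the (word_id, link_id, cat_id) PRIMARY KEY.
SELECT link_id, SUM(cat_id) AS score
FROM linkindex
WHERE word_id IN (107, 27)
GROUP BY link_id;
```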

I'm not sure this can be applied but any help in improving the speed would be great.

Speeds are not so bad at the moment, but they could be improved: common word_ids sometimes take 2+ seconds, which isn't the best.

Cheers
David

Question by:dolythgoe
9 Comments
 
LVL 37

Assisted Solution

by:Neil Russell
Neil Russell earned 200 total points
Your BEST optimization would be to upgrade the server!
 

Author Comment

by:dolythgoe
That's kind of an obvious one that frames everything, but I need to be sure I've done everything first regardless of the server. More specifically, avoiding tmp tables if possible with the query stated.
 

Author Comment

by:dolythgoe
And which bit of the server - more cores?
 
LVL 38

Expert Comment

by:Aaron Tomosky
I think it must be paging to disk, as 1GB of RAM can't possibly be enough. The OS takes almost half of that, and 20 million rows can't be tiny.
 

Author Comment

by:dolythgoe
Well, the table of 20 million rows is not actually as big as you might think, as it's an INT, a MEDIUMINT and two TINYINTs. The lookup at most touches 1 million rows.

It's a DB slave, so it just has Ubuntu and MySQL installed.

What I've requested is an additional 2GB of RAM and another 3 cores, to bring it up to 8 threads. The extra threads reduce the need for as much memory, since temp tables can be created and destroyed faster in memory, shrinking the concurrency window that can bloat memory usage. That is, if MySQL v5.56 makes good use of those threads, which it appears to do.

So, from the initial question - does it appear that no more query optimisation can be done and it's down to raw power? Happy with that, if that's the answer; just wanted to confirm with you guys before paying for more juice!
 
LVL 38

Accepted Solution

by:
Aaron Tomosky earned 300 total points
Unsigned ints, with indexes on all the fields. I can't think of any way to optimize that. The query is simple, with only a SUM on a column. Unless you use this same query often and can make a view off a summary table, updating that table when the data changes... it looks clean to me.
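A minimal sketch of that summary-table idea; the table and column names below are hypothetical, and the refresh would have to be re-run (or maintained by the application) whenever linkindex changes:

```sql
-- Hypothetical pre-aggregated table: one row per (word_id, link_id).
CREATE TABLE linkindex_summary (
  word_id MEDIUMINT UNSIGNED NOT NULL,
  link_id INT UNSIGNED NOT NULL,
  score   INT UNSIGNED NOT NULL,
  PRIMARY KEY (word_id, link_id)
) ENGINE=MYISAM;

-- Rebuild the summary from the base table:
INSERT INTO linkindex_summary (word_id, link_id, score)
SELECT word_id, link_id, SUM(cat_id)
FROM linkindex
GROUP BY word_id, link_id;

-- The hot query then reads one pre-summed row per (word, link)
-- instead of re-summing every matching linkindex row:
SELECT link_id, SUM(score) AS score
FROM linkindex_summary
WHERE word_id IN (107, 27)
GROUP BY link_id
ORDER BY score DESC LIMIT 0,16;
```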
 

Author Comment

by:dolythgoe
Thanks,

I just ran MySQL Tuner https://github.com/rackerhacker/MySQLTuner-perl

Which spat this out:

-------- Performance Metrics -------------------------------------------------
[--] Up for: 5m 57s (226 q [0.633 qps], 109 conn, TX: 104K, RX: 15K)
[--] Reads / Writes: 100% / 0%
[--] Total buffers: 128.0M global + 6.4M per thread (151 max threads)
[!!] Maximum possible memory usage: 1.1G (110% of installed RAM)
[OK] Slow queries: 0% (0/226)
[OK] Highest usage of available connections: 0% (1/151)
[!!] Key buffer size / total MyISAM indexes: 64.0M/685.3M
[!!] Key buffer hit rate: 94.0% (469K cached / 28K reads)
[!!] Query cache efficiency: 0.0% (0 cached / 99 selects)
[OK] Query cache prunes per day: 0
[!!] Sorts requiring temporary tables: 50% (2 temp sorts / 4 sorts)
[OK] Temporary tables created on disk: 22% (48 on disk / 212 total)
[OK] Thread cache hit rate: 99% (1 created / 109 connections)
[OK] Table cache hit rate: 23% (35 open / 147 opened)
[OK] Open file limit used: 7% (74/1K)
[OK] Table locks acquired immediately: 100% (62 immediate / 62 locks)

-------- Recommendations -----------------------------------------------------
General recommendations:
    MySQL started within last 24 hours - recommendations may be inaccurate
    Reduce your overall MySQL memory footprint for system stability
    Enable the slow query log to troubleshoot bad queries
Variables to adjust:
  *** MySQL's maximum memory usage is dangerously high ***
  *** Add RAM before increasing MySQL buffer variables ***
    key_buffer_size (> 685.3M)
    query_cache_limit (> 1M, or use smaller result sets)
    sort_buffer_size (> 2M)
    read_rnd_buffer_size (> 2M)



Looks like it needs more memory! A handy tool.

 
LVL 37

Expert Comment

by:Neil Russell
So, like I said, the optimization you need is a server upgrade. Your query is about as simple as it gets.
 

Author Closing Comment

by:dolythgoe
Thanks for confirming this, guys.
