lfmaleon

asked on

Performance problem with our new database server

While doing initial testing for an upcoming database server migration, we were surprised to find that the “new”, much better machine is underperforming compared to the “old” production environment. Specifically, we found that ALTER operations run significantly slower on the new machine: a 1.9 GB table with 25,407,229 records took 2m 15s to complete an ALTER query (ALTER TABLE _tmp_table ENGINE=InnoDB;) on the “old” machine, while the same query took 7m 38s on the “new” one. Tests were performed while both machines were under minimal load.

We appreciate any information or ideas that could explain (and solve) the observed underperformance.
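For anyone reproducing the test: ALTER TABLE ... ENGINE=InnoDB is a full table rebuild. A sketch of re-running it and reading the elapsed time back on the 5.7 server, assuming the default performance_schema statement-history consumer is enabled:

```sql
-- The statement under test, exactly as in the question:
ALTER TABLE _tmp_table ENGINE=InnoDB;

-- On Percona 5.7 the elapsed time can be read back from performance_schema
-- (TIMER_WAIT is in picoseconds; this consumer is on by default):
SELECT SQL_TEXT, TIMER_WAIT/1e12 AS seconds
FROM performance_schema.events_statements_history
ORDER BY TIMER_END DESC
LIMIT 5;
```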

Thanks    

“Old” server specifications:
Dell PowerEdge R920
Processor: Cores (40) - Intel(R) Xeon(R) CPU E7-8891 v2 @ 3.20GHz
Ubuntu 14.04.1 LTS  Percona 5.6.22
RAM: 1511 GB
HDD: 598.9 GB + 1199.1 GB

“New” server specifications:
Dell PowerEdge R640
Processor: Cores (32) - Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz
Ubuntu 18.04.3 LTS  Percona 5.7.28  
RAM: 1510 GB
SSD: 3.5 TB  


/etc/mysql/percona-server.conf.d/mysqld.cnf

skip-name-resolve
back_log = 1500
innodb_adaptive_hash_index = 0
innodb_buffer_pool_instances = 64
innodb_buffer_pool_size = 710G
innodb_flush_method = O_DIRECT
innodb_flush_neighbors = 0
innodb_io_capacity = 15000
innodb_io_capacity_max = 25000
innodb_lru_scan_depth = 8192
innodb_open_files = 2000
innodb_page_cleaners = 8
join_buffer_size = 600M
key_buffer_size = 6000M
max_allowed_packet = 2000M
max_connections = 500
max_heap_table_size = 20000M
open_files_limit = 5000
query_cache_limit = 6G
query_cache_size = 0
query_cache_type = 0
read_rnd_buffer_size = 32M
slave_net_timeout = 3600
sort_buffer_size = 32M
table_definition_cache = 1000
table_open_cache = 8000
table_open_cache_instance = 64
thread_cache_size = 32
thread_stack = 1M
tmp_table_size = 20000M
innodb_log_file_size = 48M
John Tsioumpris

In a similar case, disabling HyperThreading on the new server resolved this...
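A quick way to A/B this on the new server: check whether SMT (HyperThreading) is active and, if so, switch it off at runtime before re-timing the ALTER. This is a sketch for a reasonably recent Linux kernel; the sysfs SMT knob exists from kernel 4.15 onward.

```shell
# Show logical-vs-physical CPU layout; "Thread(s) per core: 2" means HT is on.
lscpu | grep -E 'Thread\(s\) per core|^CPU\(s\)'

# Kernel-level SMT state: 1 = active, 0 = off (needs kernel >= 4.15).
cat /sys/devices/system/cpu/smt/active 2>/dev/null || true

# To disable SMT until the next reboot for a timing comparison (as root):
#   echo off > /sys/devices/system/cpu/smt/control
```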
noci

An SSD is faster than a magnetic disk (zero seek time), and multiple disks might be faster still, due to transfer speeds.
Two disks have two channels ==> twice the bandwidth. (Although there is more to it: whether the storage controller does caching or not; controller caching should only be considered with battery backup.)
If your problem was seek time, then SSD will help a lot; otherwise only a little.

How about CPU load between the old & new systems? If the old system was near full load, then the newer system is less capable:
(40 * 3.2) > (32 * 3.6)

iotop might help verify I/O issues, just like top can for memory & CPU load.
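To make that concrete, a sketch of the two checks (iostat is in the sysstat package, iotop needs root); the invocations are guarded so the snippet degrades gracefully where the tools are not installed:

```shell
# Per-device extended stats, skipping idle devices; watch the await and %util
# columns on the SSD volume while the ALTER runs.
command -v iostat >/dev/null && iostat -xz || echo "iostat not installed (sysstat)"

# Per-process disk I/O: batch mode, accumulated, processes only, three samples.
command -v iotop >/dev/null && sudo -n iotop -boPa -n 3 2>/dev/null \
  || echo "iotop not installed or sudo unavailable"
```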

(added:)
Seeing the amount of RAM and the previous disk size, you should be able to run the whole database from RAM once it is loaded.
You may need to tune memory parameters for that. It will cut back severely on read I/O, so if working correctly you should see almost only write I/Os.
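Along these lines, a my.cnf sketch for keeping the working set resident across restarts. The buffer pool size is already 710G in the posted config; the dump/load settings are the addition (they exist in 5.6 as well, but default to ON only from 5.7):

```ini
# Sketch: persist and preload the buffer pool so the database is served from
# RAM shortly after a restart (ON by default in 5.7, OFF in 5.6).
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
# Dump more of the pool than the 25% default if warm-up time matters:
innodb_buffer_pool_dump_pct = 75
```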
Along the lines of noci's comment.

This much difference is almost certainly related to your disk hardware or disk config.

Keep in mind most SSD drives are generally much slower than mechanical drives at writing + much faster than mechanical drives at reading, unless you're using a large M.2 cache on each SSD drive.

Having only 1.5G of memory will also likely break any sort of caching, because this memory size is far too low for any... even minimal MariaDB/MySQL workflow pattern.

Running an ALTER test which constantly breaks out of caching will be very slow with SSD drives.

My guess is this is a VM, which you've set to use 1.5G of memory. Try this...

1) Bump up your memory to 32G or 64G.

2) Put your ALTER test in a script + repeatedly run the script for 24 hours.

3) Then run mysqltuner + implement all tunings suggested.

4) Repeat #2 + #3, until mysqltuner produces quiet output (no more suggested tunings).

5) At this point run a timing check of your ALTER test.

You may find additional memory fixes the problem.
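Steps 2–4 above can be sketched as a small script. Everything here is a placeholder — the database name, table, credentials, and the mysqltuner location are assumptions, and the loop duration is the 24 hours suggested above:

```shell
#!/usr/bin/env bash
# Sketch of the tune-measure loop; testdb, _tmp_table, and the credentials
# are placeholders, not values from the question.

run_alter_loop() {                       # step 2: hammer the server for 24h
  local end=$((SECONDS + 24 * 3600))
  while [ "$SECONDS" -lt "$end" ]; do
    mysql -u root -p"$MYSQL_PWD" testdb \
      -e 'ALTER TABLE _tmp_table ENGINE=InnoDB;'
  done
}

suggest_tunings() {                      # step 3: collect recommendations
  perl mysqltuner.pl --host 127.0.0.1 --user root --pass "$MYSQL_PWD"
}

# Steps 2-4: repeat until mysqltuner's recommendations section goes quiet.
# run_alter_loop && suggest_tunings
```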

Another consideration will be whether this type of workflow pattern occurs on a regular basis. If it does, then likely InnoDB is the wrong storage engine, as InnoDB is optimized for reads + writes. If you have a write-heavy workflow, where ALTER/INSERT/DELETE/UPDATE occur far more frequently than SELECT, then it is likely best to consider switching to the MyRocks storage engine, which is optimized for heavy write workflows.
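For reference, switching a table to MyRocks on Percona Server is a one-time plugin install plus a rebuild — a sketch, assuming Percona Server 5.7 with the MyRocks packages present:

```sql
-- One-time: enable the storage engine (Percona also ships a helper,
-- ps-admin --enable-rocksdb, that does this and sets required options).
INSTALL PLUGIN ROCKSDB SONAME 'ha_rocksdb.so';

-- Then rebuild the table on the new engine:
ALTER TABLE _tmp_table ENGINE=ROCKSDB;
SHOW CREATE TABLE _tmp_table;  -- confirm ENGINE=ROCKSDB
```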

Another consideration will be how you're doing your ALTER.

For example, dropping any column indexes, doing the ALTER, then adding the indexes back may be much faster than trying to do an ALTER with the index(es) in place. Another consideration will be whether your ALTER is wrapped in a TRANSACTION or not, which will greatly affect speed. Another consideration is whether this VM (if it is a VM) has constant heavy I/O, which suggests you should use bare-metal tech like LXD rather than a VM, which produces slow I/O that is dependent on all machine I/O (host level + all guest VMs).
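A sketch of the drop-then-rebuild idea; idx_example and some_column are hypothetical names, not taken from the question:

```sql
-- Hypothetical index and column names, for illustration only.
ALTER TABLE _tmp_table DROP INDEX idx_example;

ALTER TABLE _tmp_table ENGINE=InnoDB;   -- rebuild without the secondary index

ALTER TABLE _tmp_table ADD INDEX idx_example (some_column);
```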
@David, he is talking about 1510 GB.
1.5 GB is less than even the cheapest smartphone carries nowadays... even back in 2000 the smallest server had 2 GB.
How is the storage configured, i.e. RAID, partitions?
lfmaleon

ASKER

RAID: Online RAID-10, 3575.75 GB SSD
RAID 10 on how many spindles: 4, 6, 8, 10, ...? More "spindles" ==> more bandwidth.
@John, thanks... I did misread this...

And 1.5 TB of RAM seems very large.

If this really is the amount of RAM being used, running mysqltuner will provide optimal tunings which can easily be implemented with a large amount of RAM like this.

So mysqltuner will be a good starting point.