Solved

Hunting I/O Bottlenecks

Posted on 2011-09-20
Last Modified: 2013-12-16
hello there,
How can I check for I/O bottlenecks on my CentOS 5.6 server? I think the high server load is due to hard disk I/O.

Question by:XK8ER
22 Comments

Expert Comment

by:wesly_chen
ID: 36569798
Run "top" and look at the Cpu(s) line for xx%wa.
xx%wa is the percentage of CPU time spent waiting for I/O.

Run lsof to list open files together with the process name and PID, so you can see which processes have which files open at this moment.
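To pull just the %wa figure out of top without reading the whole screen, something like the following could work. It is a minimal sketch that assumes the summary-line layout shown later in this thread ("..., 2.3%wa, ..."); the function name top_iowait is made up for illustration, and other top versions print this line differently.

```shell
# Print the I/O-wait percentage from a top summary line read on stdin.
# Assumes the "Cpu(s): ... 2.3%wa, ..." layout seen in this thread.
top_iowait() {
    awk '/^Cpu\(s\):/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%wa/) {
                sub(/%wa.*/, "", $i)   # strip the "%wa," suffix, keep the number
                print $i
                exit
            }
    }'
}

# Usage (batch mode, one iteration):
#   top -bn1 | top_iowait
```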

Expert Comment

by:wesly_chen
ID: 36569889
You can check the URL for more details
http://www.performancewiki.com/diskio-tuning.html

Here is another one, with a Perl script.
http://www.zarafa.com/wiki/index.php/Monitoring_Disk_IO_per_process

Author Comment

by:XK8ER
ID: 36569909
on machine1 with high load this is what I get..

top - 16:36:44 up 13 days,  2:41,  1 user,  load average: 4.14, 4.84, 4.82
Tasks: 205 total,   1 running, 203 sleeping,   1 stopped,   0 zombie
Cpu(s): 76.8%us,  2.2%sy,  0.0%ni, 18.3%id,  2.3%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:   8309564k total,  6824200k used,  1485364k free,   377432k buffers
Swap:  8193128k total,   214040k used,  7979088k free,  3251892k cached

on machine2 with normal load this is what I get..
top - 12:31:54 up 17 days, 22:49,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 158 total,   1 running, 156 sleeping,   1 stopped,   0 zombie
Cpu(s):  0.2%us,  0.0%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3106040k total,  2864700k used,   241340k free,   292188k buffers
Swap:  5144568k total,      432k used,  5144136k free,  1783428k cached

Expert Comment

by:wesly_chen
ID: 36570085
> Cpu(s): 76.8%us,  2.2%sy,  0.0%ni, 18.3%id,  2.3%wa,  0.1%hi,  0.2%si,  0.0%st
2.3%wa  seems ok.
How about the output of
vmstat  5  5

and
sar  -d  | tail -5  

sar comes from the "sysstat" package, so you need that installed first (vmstat is part of procps and should already be there).
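When reading the vmstat output, keep in mind that the first sample reports averages since boot, so only the later rows reflect the current interval. A small sketch that averages the "wa" column over those later rows; the function name vmstat_avg_wa is hypothetical, and the column is located from the header row rather than hard-coded.

```shell
# Average the "wa" column of vmstat output read on stdin, skipping the
# first sample (averages since boot, not the current interval).
vmstat_avg_wa() {
    awk '
        # Header row: remember which column is "wa".
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        # Data rows start with a number; ignore the first one.
        col && $1 ~ /^[0-9]+$/ {
            if (++rows > 1) { sum += $col; n++ }
        }
        END { if (n) printf "%.2f\n", sum / n }
    '
}

# Usage:
#   vmstat 5 5 | vmstat_avg_wa
```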

Author Comment

by:XK8ER
ID: 36570099
this is what I get..

[(04:46 PM)][(root@alpha)] [(~)] $ vmstat  5  5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0 213568 1277244 387316 3455172    1    1    61    32    7    4 44  8 42  6  0
 3  0 213568 1252488 387324 3455812    0    0    51  2876 1533 2367 88  4  8  1  0
 3  0 213568 1250884 387328 3456236    0    0    61   368 1272 1745 84  2 13  1  0
 4  0 213568 1270580 387328 3456700    0    0    41  1146 1307 1664 46  2 50  2  0
 3  0 213568 1264152 387348 3457400    0    0    52  1607 1474 2244 55  3 37  5  0
[(05:02 PM)][(root@alpha)] [(~)] $ sar  -d  | tail -5
Requested activities not available in file
[(05:02 PM)][(root@alpha)] [(~)] $

Expert Comment

by:wesly_chen
ID: 36570118
The "wa" column in the "vmstat 5 5" output shows I/O wait of only 1-5% in the interval samples (the first row is the average since boot).
That seems OK.

How about
lsof  | head -10

Expert Comment

by:wesly_chen
ID: 36570125
Make sure you have lsof installed (yum install lsof).

Author Comment

by:XK8ER
ID: 36570139
[(05:09 PM)][(root@alpha)] [(~)] $ lsof  | head -10
COMMAND     PID      USER   FD      TYPE     DEVICE        SIZE       NODE NAME
init          1      root  cwd       DIR        9,1        4096          2 /
init          1      root  rtd       DIR        9,1        4096          2 /
init          1      root  txt       REG        9,1       38652   33456279 /sbin/init
init          1      root  mem       REG        9,1      129900   44892437 /lib/ld-2.5.so
init          1      root  mem       REG        9,1     1693812   44893011 /lib/libc-2.5.so
init          1      root  mem       REG        9,1       20668   44893047 /lib/libdl-2.5.so
init          1      root  mem       REG        9,1      245376   44893074 /lib/libsepol.so.1
init          1      root  mem       REG        9,1       93508   44893075 /lib/libselinux.so.1
init          1      root   10u     FIFO       0,17                   1303 /dev/initctl  

Expert Comment

by:wesly_chen
ID: 36570151
Now count the open files per process on device 9,1 (the root filesystem in your output above):
lsof |grep "9,1" | awk '{print $1" "$2}'| uniq -c | sort -nr
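A slightly stricter variant of that pipeline, as a sketch: it matches the device only in the DEVICE column instead of anywhere on the line, and sorts before uniq -c, since uniq only merges adjacent duplicates. It assumes the lsof column layout shown above, where DEVICE is the 6th field; the function name open_files_by_process is made up for illustration.

```shell
# Count open files per process on one device, from lsof output on stdin.
# Assumes the layout COMMAND PID USER FD TYPE DEVICE ... (DEVICE = field 6).
open_files_by_process() {
    dev="$1"
    awk -v dev="$dev" '$6 == dev { print $1, $2 }' |
        sort |          # group identical "command pid" pairs together
        uniq -c |       # count each group
        sort -nr        # biggest count first
}

# Usage:
#   lsof | open_files_by_process 9,1
```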

Author Comment

by:XK8ER
ID: 36570198
ok here
[(05:10 PM)][(root@alpha)] [(~)] $ lsof |grep "9,1" | awk '{print $1" "$2}'| uniq -c | sort -nr
    388 mysqld 22051
    160 httpd 4455
    160 httpd 4451
    160 httpd 4435
    160 httpd 31875
    159 httpd 4434
    159 httpd 4433
    159 httpd 3339
    159 httpd 2985
    159 httpd 2566
    159 httpd 2082
    159 httpd 1328
    159 httpd 1285
    159 httpd 1035
    158 httpd 6073
    158 httpd 6064
    158 httpd 6023
    158 httpd 5450
    158 httpd 5449
    158 httpd 5161
    158 httpd 5144
    158 httpd 4456
    158 httpd 4432
    158 httpd 4428
    158 httpd 4067
    158 httpd 3847
    158 httpd 3806
    158 httpd 2976
    158 httpd 27868
    158 httpd 27856
    158 httpd 1511
    158 httpd 1327
    125 python 6896
     91 yum-updat 4137
     79 php 6203
     77 php 6187
     57 searchd 5877
     53 spamd 6423
     53 spamd 6397
     53 spamd 3586
     49 eplwebdav 2378
     49 eplwebdav 2377
     49 eplwebdav 2376
     49 eplwebdav 2375
     49 eplwebdav 2374
     49 eplwebdav 2345
     48 sshd 6452
     48 sshd 6450
     48 sshd 6448
     48 sshd 6446
     48 sshd 6444
     48 sshd 31494
     48 eplhttpd 6916
     47 eplhttpd 6914
     45 python 3915
     41 MailScann 3293
     41 MailScann 29914
     41 MailScann 22868
     41 MailScann 22818
     41 MailScann 16626
     41 cupsd 3360
     38 sendmail 22152
     37 MailScann 3626
     34 sshd 6453
     34 sshd 6451
     34 sshd 6449
     34 sshd 6447
     34 sshd 6445
     34 sshd 3346
     32 sendmail 7632
     32 sendmail 3678
     32 sendmail 22162
     29 rpc.idmap 3027
     29 postmaste 3887
     29 postmaste 3884
     29 named 6644
     28 saslauthd 4027
     28 saslauthd 4026
     28 saslauthd 4025
     28 saslauthd 4024
     28 saslauthd 4023
     28 postmaste 3889
     28 postmaste 3888
     28 postmaste 3859
     28 crond 6180
     28 crond 1543
     24 bandwidth 3929
     19 automount 3227
     18 hald 4068
     16 clamd 3390
     15 syslogd 3249
     15 dbus-daem 3050
     15 crond 3943
     15 avahi-dae 4055
     15 avahi-dae 4054
     14 xinetd 3377
     14 exclog 27862
     14 atd 4006
     13 xfs 3981
     13 perl 3769
     13 hald-runn 4069
     12 auditd 2927
     11 smartd 4143
     11 gam_serve 4141
     11 bash 31496
     10 ulogd 3702
     10 udevd 576
     10 pcscd 3145
     10 mysqld_sa 21243
     10 lsof 6454
     10 hald-addo 4079
     10 hald-addo 4076
      9 lsof 6459
      9 iscsid 2192
      9 brcm_iscs 2182
      9 awk 6456
      8 update_vi 3755
      8 sh 6202
      8 run-parts 1548
      8 iscsid 2191
      8 irqbalanc 2960
      8 init 1
      8 hcid 3063
      8 grep 6455
      7 sort 6458
      7 sh 6185
      7 hidd 3183
      7 awk 3756
      7 acpid 3159
      6 uniq 6457
      6 sdpd 3069
      6 iostat 25242
      6 gpm 3766
      6 audispd 2929
      5 mingetty 4159
      5 mingetty 4152
      5 mingetty 4149
      5 mingetty 4148
      5 mingetty 4147
      5 mingetty 4146
      5 mdadm 2986
      5 klogd 3263
      2 watchdog/ 7
      2 watchdog/ 4
      2 watchdog/ 13
      2 watchdog/ 10
      2 scsi_eh_5 475
      2 scsi_eh_4 474
      2 scsi_eh_3 473
      2 scsi_eh_2 472
      2 scsi_eh_1 471
      2 scsi_eh_0 470
      2 rpciod/3 3020
      2 rpciod/2 3019
      2 rpciod/1 3018
      2 rpciod/0 3017
      2 rdma_cm 2162
      2 pdflush 22671
      2 pdflush 20738
      2 migration 8
      2 migration 5
      2 migration 2
      2 migration 11
      2 md1_raid1 514
      2 md0_raid1 517
      2 local_sa 2139
      2 kthread 19
      2 kswapd0 241
      2 kstriped 491
      2 ksoftirqd 9
      2 ksoftirqd 6
      2 ksoftirqd 3
      2 ksoftirqd 12
      2 kseriod 163
      2 krfcommd 3098
      2 kpsmoused 409
      2 kondemand 2013
      2 kondemand 2011
      2 kondemand 2010
      2 kondemand 2009
      2 kmpath_ha 1729
      2 kmpathd/3 1728
      2 kmpathd/2 1727
      2 kmpathd/1 1726
      2 kmpathd/0 1725
      2 kjournald 518
      2 kjournald 1760
      2 khungtask 238
      2 khubd 161
      2 khelper 18
      2 kedac 1113
      2 kblockd/3 28
      2 kblockd/2 27
      2 kblockd/1 26
      2 kblockd/0 25
      2 kauditd 543
      2 kacpid 29
      2 iw_cm_wq 2145
      2 iscsi_eh 2041
      2 ib_mcast 2137
      2 ib_inform 2138
      2 ib_cm/3 2155
      2 ib_cm/2 2154
      2 ib_cm/1 2153
      2 ib_cm/0 2152
      2 ib_addr 2120
      2 events/3 17
      2 events/2 16
      2 events/1 15
      2 events/0 14
      2 cqueue/3 158
      2 cqueue/2 157
      2 cqueue/1 156
      2 cqueue/0 155
      2 ata_aux 464
      2 ata/3 463
      2 ata/2 462
      2 ata/1 461
      2 ata/0 460
      2 aio/3 245
      2 aio/2 244
      2 aio/1 243
      2 aio/0 242
[(05:17 PM)][(root@alpha)] [(~)] $

Expert Comment

by:wesly_chen
ID: 36570258
OK, the likely candidates are
mysqld (DB server)
and
httpd (Apache)

They have a lot of open files, which might mean that those two processes do a lot of disk I/O.

MySQL is the most likely culprit.

If you run a lot of MySQL queries, adding more memory is the way to improve performance.

There are some tuning tricks for MySQL (but adding memory is the most effective):
1. Turn off the MySQL query log if it is enabled in /etc/my.cnf (log=....)
   Leave only error logging (log-error=... )
2. Set
innodb_buffer_pool_size=   (about 70% of your physical memory on a dedicated DB server; the more memory you have, the bigger the buffer you can set, and the less disk I/O you get)
3. Mount your filesystem with "noatime,nodiratime" in /etc/fstab

Changes 1 and 2 require a MySQL restart. The third takes effect at the next reboot, or right away with a remount, e.g. "mount -o remount,noatime /".
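The 70% rule of thumb in step 2 is simple arithmetic. The sketch below turns a memory total in kB (as printed by top or /proc/meminfo) into a buffer pool size in MB. The function name suggest_buffer_pool_mb is hypothetical, and the percentage is only the rule of thumb from this post: on a box that also runs Apache and mail it would need to be lower, and a 32-bit mysqld cannot address that much memory in any case.

```shell
# Suggest an innodb_buffer_pool_size in MB as a percentage of physical RAM.
# Arguments: total memory in kB, percentage to give to the buffer pool.
suggest_buffer_pool_mb() {
    total_kb="$1"
    percent="$2"
    # Integer arithmetic: kB * percent / 100 gives kB, / 1024 gives MB.
    echo $(( total_kb * percent / 100 / 1024 ))
}

# Usage:
#   suggest_buffer_pool_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 70
```

With the 8309564k total from the top output earlier in the thread and 70%, this prints 5680.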

Expert Comment

by:wesly_chen
ID: 36570267
For Apache, turn off access logging (the CustomLog directives) in httpd.conf.
Leave only the error log.

Author Comment

by:XK8ER
ID: 36570328
I have innodb_buffer_pool_size set to 2GB because the system is 32-bit and can only use about 3.5GB of the 8GB of RAM installed.
Should I install 64-bit instead?

Expert Comment

by:wesly_chen
ID: 36570345
How big is your DB?
du -sk  /var/lib/mysql/* | sort -nr

If your database is bigger than 4GB, it is better to use a 64-bit OS.
Besides, a 64-bit OS allocates memory more efficiently than a 32-bit one.
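To compare the result against the 4GB threshold, the per-directory kB figures from du can be added up and converted to GB. A minimal sketch; the function name du_total_gb is made up for illustration, and /var/lib/mysql is simply the data directory used in the command above.

```shell
# Sum the first column (kB) of `du -sk` output on stdin and print the
# total in GB with one decimal place.
du_total_gb() {
    awk '{ kb += $1 } END { printf "%.1f\n", kb / 1024 / 1024 }'
}

# Usage:
#   du -sk /var/lib/mysql/* | du_total_gb
```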

Author Comment

by:XK8ER
ID: 36570352
It was about 18GB before, but I switched to the InnoDB Barracuda format using COMPRESSED and it's 8.8GB total now.

Expert Comment

by:wesly_chen
ID: 36570366
>  innodb barracuda format using COMPRESSED
Compression mainly shrinks the data on disk; in memory InnoDB still has to keep uncompressed copies of the pages it is working on. Compression also costs extra CPU, so I would not recommend COMPRESSED for your database.

With 18GB of uncompressed data, it is time to move to a 64-bit OS with 32GB of memory (memory is cheap nowadays).

Author Comment

by:XK8ER
ID: 36570389

Expert Comment

by:wesly_chen
ID: 36570455
It is an interesting article.
However, there are two conditions for the compression to work its magic:
1. The only table the customer has on this server is one huge InnoDB table with a set of TEXT fields.
   Does your InnoDB database look like this?
2. All reads from this table were pretty random (the buffer pool didn’t help).
   Does that match your situation?

Please also read the last comment on that post.

Anyway, database tuning varies with DB usage and type. So if it really helps on your DB, then keep it that way.
However, for an 8.8GB DB, a 64-bit OS with more memory is still highly recommended.

Author Comment

by:XK8ER
ID: 36570569
is this normal on a 32bit OS?

[(06:27 PM)][(root@alpha)] [(~)] $ free -om
             total       used       free     shared    buffers     cached
Mem:          8114       7217        896          0        395       3638
Swap:         8001        208       7792

It shows about 7GB of RAM in use?
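Part of that "used" figure is buffers and page cache, which Linux hands back when applications need the memory. A rough sketch that subtracts them, assuming the old free column layout shown above (total, used, free, shared, buffers, cached); newer procps versions print different columns, and the function name free_app_used_mb is made up for illustration.

```shell
# From old-style `free -m` output on stdin, print the memory used by
# applications in MB: used minus buffers minus cached.
# Assumes columns: total used free shared buffers cached.
free_app_used_mb() {
    awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Usage:
#   free -m | free_app_used_mb
```

For the output above this gives 7217 - 395 - 3638 = 3184 MB actually held by applications.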

Also, I sent a ticket to the datacenter about reinstalling the OS as 64-bit CentOS 5.6.
They said to take a look at this: http://www.centos.org/modules/newbb/viewtopic.php?topic_id=8457
Would that work?

Expert Comment

by:wesly_chen
ID: 36570611
It is probably running a PAE kernel. Run "uname -a" to check.
However, PAE adds an extra step for mapping high memory, which makes memory allocation less efficient.

For a database server with more than 8GB of data, a 64-bit OS with more memory is highly recommended.

Check this benchmark of 32-bit, 32-bit PAE and 64-bit OS on the same hardware.
The 64-bit OS outperforms the 32-bit ones in Apache and all the other areas.
http://www.phoronix.com/scan.php?page=article&item=ubuntu_32_pae&num=1
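The uname check can be scripted. This is a sketch only: it relies on the "PAE" suffix that RHEL/CentOS put in the kernel release string, which is an assumption for other distributions, and the function name kernel_flavor is hypothetical.

```shell
# Classify a kernel from its release string (uname -r) and machine
# type (uname -m). The PAE suffix check follows RHEL/CentOS naming.
kernel_flavor() {
    release="$1"
    machine="$2"
    case "$machine" in
        x86_64) echo "64-bit" ;;
        i?86)
            case "$release" in
                *PAE*) echo "32-bit PAE" ;;
                *)     echo "32-bit" ;;
            esac ;;
        *) echo "unknown" ;;
    esac
}

# Usage:
#   kernel_flavor "$(uname -r)" "$(uname -m)"
```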

Author Comment

by:XK8ER
ID: 36570636
[(06:43 PM)][(root@alpha)] [(~)] $ uname -a
Linux alpha.site.net 2.6.18-238.19.1.el5PAE #1 SMP Fri Jul 15 08:15:44 EDT 2011 i686 i686 i386 GNU/Linux
[(06:43 PM)][(root@alpha)] [(~)] $

I don't think this server supports 32GB; the max is 16GB if I remember correctly.

But I don't want to spend time installing 16GB if I'm going to have the same issues. Would I?

Accepted Solution

by:
wesly_chen earned 500 total points
ID: 36570657
It depends on your database growth rate.
If your database grows by, say, a couple hundred MB per week, you will hit the memory limit about half a year from now.
If the growth rate is low, a couple of MB per week, then 16GB should be good enough for a year unless conditions change.
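That estimate can be sketched as a small calculation. It assumes linear growth and treats the whole memory size as the limit, which is optimistic because the buffer pool never gets all of the RAM; the function name weeks_until_limit and its arguments are made up for illustration.

```shell
# Whole weeks until a data set growing linearly reaches a size limit.
# Arguments are in MB: current size, limit, growth per week.
weeks_until_limit() {
    current_mb="$1"
    limit_mb="$2"
    growth_mb="$3"
    if [ "$current_mb" -ge "$limit_mb" ]; then
        echo 0          # already at or over the limit
    elif [ "$growth_mb" -le 0 ]; then
        echo never      # no growth, the limit is never reached
    else
        echo $(( (limit_mb - current_mb) / growth_mb ))
    fi
}

# Usage (8.8GB now, 16GB limit, 200MB per week):
#   weeks_until_limit 9011 16384 200
```

With those example numbers it prints 36, so a bit over half a year; at 2MB per week the same call gives several thousand weeks.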

If you do not have too many tables, then use
innodb_file_per_table
so you will not have uncontrolled growth of the main InnoDB tablespace, which you cannot reclaim.

Besides, the three tuning tricks I posted in #36570258 are still good for most conditions.