Solved

vmstat procs blocked - how to dig deeper?

Posted on 2008-10-07
Last Modified: 2013-12-06
Hello,

System details:
HP-UX 11.23 on ia64

There appears to be a resource bottleneck on a server. When I run vmstat, I get the following output:
vmstat 5 5                                                                                          
                                                                         
         procs           memory                   page                              faults       cpu    
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id  
    5    19     0  5503761  33861046  308   80     2    0     0    0     2  38750 416246 17021  16  7 77
    6    19     0  4093880  33860506  229   55     0    0     0    0     0  27147 228747 11679  17  4 79
    6    19     0  4093880  33859994  255   73     0    0     0    0     0  23451 218478 10536  17  4 79
    5    20     0  4137557  33859938  137   35     0    0     0    0     0  22023 202479  9512  18  3 80
    5    20     0  4137557  33860349  168   60     0    0     0    0     0  22964 563017  9528  16  6 78

From what I understand, b = blocked, which means the process is waiting on a resource. As the output suggests, this is not memory related, so it must be I/O (disk operations or network, right?)

Database response times have degraded. How can I dig deeper into this? I've taken a look at iostat but the values don't really tell me much.

Thanks in advance.
Question by:SAP11-11

Expert Comment

by:peter991
ID: 22658360
Here are some notes I made/found when looking into vmstat.

Problem symptoms:
1.) If the number of processes in the run queue (procs r) is consistently greater than the number of CPUs, the system will slow down because there are more runnable processes than available CPUs.
2.) If this number is more than four times the number of available CPUs, the system is short of CPU power and processes will slow down greatly.
3.) If the idle time (cpu id) is consistently 0 and the system time (cpu sy) is double the user time (cpu us), the system is short of CPU resources.

Resolution:
Resolving these kinds of issues involves tuning application procedures to make efficient use of the CPU and, as a last resort, increasing CPU power or adding more CPUs to the system.
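
The first two rules of thumb can be checked mechanically. The following is only a rough sketch: `runq_check` is a hypothetical helper name, the CPU count must be supplied by hand (it is not stated anywhere in this thread), and it simply averages the r column of whatever vmstat output it is given.

```shell
# Hypothetical helper: average the "r" column of vmstat output read on
# stdin and compare it with a CPU count passed as the first argument.
runq_check() {
    awk -v ncpu="$1" '
        # Data rows start with a number; the two header rows do not.
        $1 ~ /^[0-9]+$/ { sum += $1; n++ }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            avg = sum / n
            if (avg > 4 * ncpu)       verdict = "severe CPU shortage"
            else if (avg > ncpu + 0)  verdict = "run queue exceeds CPU count"
            else                      verdict = "run queue within CPU count"
            printf "avg r=%.1f cpus=%d: %s\n", avg, ncpu, verdict
        }'
}

# Usage (the CPU count of 8 is an assumed example value):
#   vmstat 5 5 | runq_check 8
```

The third rule (idle time at 0 with sy double us) is easy to read directly off the last three columns, so it is left out of the sketch.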

Author Comment

by:SAP11-11
ID: 22658398
Thanks for the reply but I understand the 'r' column. I don't think the server is suffering a CPU shortage.
It's the blocked processes that concern me. I've read that this value should rarely exceed 1, and that it indicates processes waiting on another resource before they can complete.

Expert Comment

by:peter991
ID: 22658469
Hi!
It's hard to tell, but your pi, po (paging) and sr (scan rate) values are zero.
(I saw the single 2 on the first line.)

My guess is to focus on the application you are running on your machine.

Author Comment

by:SAP11-11
ID: 22658813
The application is Oracle and the 'log file sync' times are higher than expected (though not by a great deal).

My question is: is it possible to drill down at OS level to ascertain what could be causing the processes to be blocked to such an extent? My thinking is disk I/O, especially considering the log file sync times being up. However, the stats from vmstat look extraordinarily high. I've never seen this many blocked processes before.

This is the most recent output:
         procs           memory                   page                              faults       cpu    
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id  
    3    23     0  5410152  33450719  308   80     2    0     0    0     2  38796 416588 17049  16  7 77
    7    16     0  5386809  33450385  109   28     0    0     0    0     0  38351 858951 15928  12  7 81
    7    16     0  5386809  33449929  287   77     0    0     0    0     0  38375 765130 15618  13  7 80
    8    19     0  5035875  33450725  331   73     0    0     0    0     0  40205 473336 17055  15  7 78
    8    19     0  5035875  33450569  207   51     0    0     0    0     0  36935 345422 15414  16  5 79
    6    22     0  6256459  33450568  120   29     0    0     0    0     0  36713 292311 14967  17  4 79
    6    22     0  6256459  33450438   52   12     0    0     0    0     0  34592 259139 13439  18  4 78
   12    14     0  6079667  33450421  245  199     0    0     0    0     0  32498 237380 12549  23  5 73
   12    14     0  6079667  33450421   80   64     0    0     0    0     0  34561 231179 13518  23  3 74
    6    19     0  5188123  33450588  103   47     0    0     0    0     0  36349 256779 14371  28  4 67
    6    19     0  5188123  33450469   39   15     0    0     0    0     0  35150 267327 14136  28  4 68
   16    11     0  5326562  33450452   78   30     0    0     0    0     0  40089 386057 19325  20  5 75

The values are consistently high.
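
One rough way to put a number on "consistently high" is to summarise the b column over a run of samples. This is a minimal sketch, assuming the column order shown above (r, b, w, ...); `blocked_summary` is a hypothetical helper name, not an HP-UX tool.

```shell
# Hypothetical helper: report min, max and average of the "b" (blocked)
# column from vmstat output read on stdin.
blocked_summary() {
    awk '
        # Keep only data rows: first two fields must both be integers.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            b = $2 + 0
            if (n == 0 || b < min) min = b
            if (n == 0 || b > max) max = b
            sum += b; n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d b_min=%d b_max=%d b_avg=%.1f\n", n, min, max, sum / n
        }'
}

# Usage:
#   vmstat 5 12 | blocked_summary
```

Fed the twelve rows above, this reports b between 11 and 23 with an average of about 17.8.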

Expert Comment

by:peter991
ID: 22658889
Perhaps this is an Oracle question.
Have you looked over your database?
Does it switch a lot?
Depending on your Oracle version, do the values from AWR or Statspack look good?

Accepted Solution

by:SAP11-11
ID: 22718582
We checked with the DBAs and they say that Oracle isn't the problem and that the slightly elevated log file sync times indicate an I/O bottleneck, something that is outside of Oracle's control (assuming all files are spread across the devices optimally).

sar -d gave more info, and I was able to use it to get a better view of what the devices were doing. This gave me the info I needed to push the problem back to the storage experts.
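
For anyone repeating this, a minimal sketch of narrowing sar -d output down to the busiest devices follows. It assumes the HP-UX column order (device, %busy, avque, r+w/s, blks/s, avwait, avserv); `busy_disks` is a hypothetical helper name and the 50% threshold is an arbitrary example, not a value taken from this thread.

```shell
# Hypothetical helper: read `sar -d` output on stdin and print devices whose
# %busy is at or above a threshold (first argument, default 50).
# Assumed columns: device %busy avque r+w/s blks/s avwait avserv
busy_disks() {
    awk -v limit="${1:-50}" '
        # Count fields from the end of the line, because only the first
        # row of each interval carries a timestamp in front of the device.
        NF >= 7 && $(NF-5) ~ /^[0-9.]+$/ && $(NF-5) + 0 >= limit + 0 {
            printf "%s busy=%s%% avwait=%s avserv=%s\n", $(NF-6), $(NF-5), $(NF-1), $NF
        }'
}

# Usage (5-second interval, 5 samples):
#   sar -d 5 5 | busy_disks 50
```

Devices that show up repeatedly, particularly with avwait well above avserv, are the ones worth raising with the storage team.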

Thanks for all your help, anyway.

Expert Comment

by:peter991
ID: 22718699
I'm glad to be of help.

Good luck!