How to read vmstat on Sun Solaris and what are the key items to test

On Sun Solaris I am running the below vmstat and need to know key items to look for in the output. I know I have serious problems and the system is about to crash when 'id' falls below 10. What other key values do I need to be testing?

----- vmstat output
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m0 m1 m4 m1   in   sy   cs us sy id
 0 0 0 22292328 18225184 51 2005 51 5 4 0 0 7  0  1  6 1680  623 1372 32 37 31

Here's a short description of each field:

r     threads in the run queue
b     threads blocked waiting for resources (I/O, paging)
w     swapped out

swap  amount of swap space currently available (KB)
free  size of the free list (KB)

re    page reclaims
mf    minor faults
pi    KB paged in
po    KB paged out
fr    KB freed
de    KB anticipated short-term memory shortfall
sr    pages scanned by the clock algorithm

m*    disk operations per second (one column per disk)

in    interrupts/sec
sy    system calls/sec
cs    context switches/sec

us    % user time
sy    % system/kernel time
id    % idle time
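
If you want to test these fields from a program, here is a minimal Python sketch that maps the column names to values using the header and data line from the question. The function name parse_vmstat and the "_2" suffix for repeated column names are my own choices for illustration, not part of vmstat. Note that "sy" appears twice in the header (system calls under faults, then system time under cpu), so the second one is stored as "sy_2".

```python
def parse_vmstat(header_line, data_line):
    """Map vmstat column names to integer values.

    Repeated names (the second 'sy', a repeated disk name) get a
    numeric suffix, e.g. 'sy_2', so no value is overwritten.
    """
    names = header_line.split()
    values = data_line.split()
    if len(names) != len(values):
        raise ValueError("header and data have different column counts")
    row = {}
    seen = {}
    for name, value in zip(names, values):
        seen[name] = seen.get(name, 0) + 1
        key = name if seen[name] == 1 else f"{name}_{seen[name]}"
        row[key] = int(value)
    return row


# Header and data line copied from the vmstat output in the question.
header = " r b w   swap  free  re  mf pi po fr de sr m0 m1 m4 m1   in   sy   cs us sy id"
sample = " 0 0 0 22292328 18225184 51 2005 51 5 4 0 0 7  0  1  6 1680  623 1372 32 37 31"

row = parse_vmstat(header, sample)
print("idle %:", row["id"])
print("swap available (KB):", row["swap"])
print("syscalls/sec:", row["sy"], "system time %:", row["sy_2"])
```

Splitting on whitespace is enough here because every vmstat column is a plain integer; the length check catches a header that does not belong to the data line.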

Idle time isn't very useful for diagnosing a crash. However, if you're running something that eats up CPU and it is causing a crash, it might be using up lots of memory as well (even then, I don't know how a user application can cause the entire system to crash).
Whatever the cause, it is not likely to be "id" dropping below 10%.

You'll need to give more info about what you're running when the crash happens.
/var/adm/messages is the first place to look after a crash.

rayskelton (Author) commented:
I am looking at this as the developer of the only application on numerous large Solaris systems, which eats a lot of memory and CPU during peak production periods. This is actually a good problem to have, since it means business is good. I can always count on serious outages occurring when id drops below 10, and I have added this check to my monitoring software. I wanted to look at other crucial items within vmstat. I am a developer and not a sys admin, so identifying the exact problem at the system level is not my concern. My concern is to give production support a warning before a problem actually occurs. This gives them time to shut down batch servers and potentially prevent a problem.
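
A check like the one described can be sketched as below. This is only an illustration, not the author's monitoring code: the threshold of 10 comes from the comment above, while the class name IdleWatch and the rule of requiring 3 consecutive low samples are assumptions I added so that a single short spike does not page anyone. Keep in mind that the first line vmstat prints is an average since boot, so it is usually skipped before feeding samples in.

```python
from collections import deque


class IdleWatch:
    """Alert when 'id' stays below a threshold for several samples in a row."""

    def __init__(self, threshold=10, samples=3):
        self.threshold = threshold          # percent idle; 10 is from the thread
        self.recent = deque(maxlen=samples)  # assumed window of 3 samples

    def update(self, idle_pct):
        """Record one 'id' reading; return True if the alert should fire."""
        self.recent.append(idle_pct)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(v < self.threshold for v in self.recent)


watch = IdleWatch(threshold=10, samples=3)
readings = [31, 8, 7, 9, 12]            # made-up 'id' values for illustration
alerts = [watch.update(v) for v in readings]
print(alerts)
```

With these made-up readings the alert fires only on the fourth sample, once three readings in a row are below 10, and clears again when idle recovers to 12.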

Okay, then free swap (the "swap" column) is another item you might want to watch.
Usually, when free swap goes below a certain percentage (around 3%), the system starts to become unstable.
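
vmstat only reports the available swap in KB, so turning it into a percentage needs the total from somewhere else. Here is a small sketch under that assumption: total_kb is a value you supply yourself (for example used plus available as reported by swap -s), and the function names and the 24000000 KB total are hypothetical. The 3% limit is the figure mentioned above.

```python
def swap_free_percent(available_kb, total_kb):
    """Return free swap as a percentage of total swap."""
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    return 100.0 * available_kb / total_kb


def swap_alert(available_kb, total_kb, min_free_pct=3.0):
    """True when free swap has dropped below min_free_pct percent."""
    return swap_free_percent(available_kb, total_kb) < min_free_pct


total_kb = 24000000                      # hypothetical total, not from the thread
print(swap_alert(22292328, total_kb))    # 'swap' value from the question
print(swap_alert(600000, total_kb))      # 2.5% free, below the 3% limit
```

For the value in the question the check stays quiet, and it fires for the second call because 600000 KB is only 2.5% of the assumed total.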

Solaris crashes only if it runs out of storage or hits a system software problem. Even if the CPU is running at 95%, the running processes just run more slowly; it will not crash Solaris. I suspect the crash on your Solaris system is due to running out of virtual storage. Analyse the crash dump and you will find the answer.

My installation has over 30 production Solaris systems and I have never had Solaris crash due to high CPU utilization. We set up monitoring tools to alert Technical Support whenever CPU utilization on a Solaris system is over 90%. Usually I run the top command to find out which process is using the most CPU, and kill the job if I suspect that it is using extremely high CPU and slowing down the system.

vmstat only shows overall performance and it cannot pinpoint the cause of a system hang. Our installation also has over 50 production AIX systems and they never crash because the CPU is high. You need to install monitoring tools such as CA-NSM, BMC Patrol, Candle CCC or EcoTools to automate the monitoring.

Proposed System Health Checking
1. Running out of virtual storage
   Check the usage of the swap file and alert if it is over 80%
2. Filesystem corruption
   Monitor /var/adm/messages (the Solaris system log; /var/adm/ras is the AIX location)
3. Non-recoverable hardware errors such as CPU and memory
   Monitor /var/adm/messages and alert on hardware messages. You can get a list of hardware messages from Solaris
4. Filesystem running out of space
   Monitor /var/adm/messages and alert if the usage of the root, /tmp or /usr filesystem is over 90%
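
Two of the checks in the list above can be sketched in Python as follows. This is a rough illustration only: the function names are mine, the keywords you search the log for are something you would have to choose yourself, and the sample log lines are made up. The percentage is computed as used/total from shutil.disk_usage, which may differ slightly from the capacity figure df prints.

```python
import shutil


def filesystem_usage_percent(path):
    """Return the percentage of the filesystem holding 'path' that is used."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:                 # some pseudo filesystems report 0
        return 0.0
    return 100.0 * usage.used / usage.total


def over_limit(paths, limit_pct=90.0):
    """Return {path: percent used} for every path above limit_pct."""
    result = {}
    for path in paths:
        pct = filesystem_usage_percent(path)
        if pct > limit_pct:
            result[path] = round(pct, 1)
    return result


def matching_lines(lines, keywords):
    """Return the log lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [ln for ln in lines if any(k in ln.lower() for k in lowered)]


# Made-up log lines and keywords, for illustration only.
sample_log = [
    "host1 example: routine message",
    "host1 example: WARNING: something looks wrong",
]
print(matching_lines(sample_log, ["warning", "panic"]))
```

On a real system you would call over_limit(["/", "/tmp", "/usr"]) on a schedule and feed matching_lines the new lines read from /var/adm/messages.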



   My personal experience is that the Sun Ultra 80/Enterprise E420R does have a crash problem under high CPU load.
It turned out that the hardware design of the clock bus between CPU and memory on this motherboard has a bug.
No OS patch can really fix this issue (Solaris 7, 8 and 9 all have the same issue).

   Anyway, monitoring swap usage and the /var partition is important to avoid a crash or hang.
