How do I do a health check for AIX servers? What information needs to be collected?

We have 400 hundred servers in our implementation project, and we need to do a health check for these servers. What information do I need to check and collect for AIX servers?
Are there any standard tools available for that? How should I prepare a report on it?
rammaghenthar asked:
woolmilkporc commented:
Hi again,

There is no standard health-check script in AIX.

In one of our other cases we're talking about 'cfg2html', which is a fine tool for getting
an overview of the vital data of your machines.

Furthermore, health checking cannot be done with a one-off 'snapshot'; it should be a continuous process,
using a monitoring tool such as Nagios:

http://www.nagios.org/

Anyway, to check the most important things, you could regularly run a small script
against all machines in a server list, covering:

prtconf -> overview
errpt -> hardware error log
df -> filesystems
diag -cs -> hardware diagnostics
lppchk -v -> software packages' consistency

It could look like this (see the script below):

Note that you should set up SSH public-key authentication, so that you are not prompted for passwords.
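Before running anything against all 400 servers, it may be worth confirming which hosts actually accept key-based login. A minimal sketch, assuming OpenSSH on the admin machine and a server list with one host name per line; the function name `check_ssh_keys` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print every host from the list that does NOT accept
# key-based SSH login. BatchMode=yes makes ssh fail instead of prompting for
# a password; ConnectTimeout keeps unreachable hosts from stalling the loop.
check_ssh_keys() {
  serverlist=$1
  while read -r host
  do
    [ -z "$host" ] && continue          # skip blank lines in the list
    # -n: do not let ssh read (and swallow) the rest of the server list
    if ! /usr/bin/ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1
    then
      echo "NO KEY ACCESS: $host"
    fi
  done < "$serverlist"
}
```

Calling `check_ssh_keys server-list` prints nothing when every host is reachable with keys; each line it does print is a host to fix before the health-check run.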

And since you're talking about 400 servers, reading all the output of any check script is close to impossible,
so I'd strongly suggest using a monitoring tool (see Nagios above)!

wmp

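#!/bin/ksh
# Run a basic set of AIX health checks on every host in a server list and
# save each host's output to <host>.<YYYY.MM.DD>.custom.check.
# Replace the [/path/to/] placeholders with real paths before use.
serverlist=[/path/to/]server-list
today=$(date +"%Y.%m.%d")

# One host name per line. "ssh -n" keeps ssh from reading the rest of the
# list from stdin; BatchMode=yes makes ssh fail instead of asking for a password.
while read -r host
 do
   /usr/bin/ssh -n -o BatchMode=yes "$host" '
   echo RUNNING PRTCONF
   echo
   /usr/sbin/prtconf
   echo
   echo RUNNING errpt
   echo
   /usr/bin/errpt
   echo
   echo RUNNING df -g
   echo
   /usr/bin/df -g
   echo
   echo RUNNING diag -cs
   echo
   /usr/sbin/diag -cs
   echo
   echo RUNNING lppchk -v
   echo
   /usr/bin/lppchk -v ' > [/path/to/]$host.$today.custom.check 2>&1
  done < $serverlist
exit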
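To avoid reading every report by hand, the `*.custom.check` files written by the script above can be filtered so that only likely problems are printed. This is a hypothetical post-processing sketch, not a standard AIX tool: `summarize_check` and its 90% default are illustrative, and it assumes the `RUNNING ...` section markers from the script and the usual AIX `df -g` layout with %Used in the fourth column. It ignores the prtconf and diag sections.

```shell
#!/bin/sh
# Hypothetical filter for one *.custom.check file: prints only lines that
# look like problems (full filesystems, errpt entries, lppchk complaints).
# Assumes the "RUNNING <command>" markers written by the check script.
summarize_check() {
  file=$1
  limit=${2:-90}                      # flag filesystems at or above this %Used
  awk -v limit="$limit" -v f="$file" '
    /^RUNNING / { section = $0; next }           # remember the current section
    NF == 0     { next }                         # skip blank separator lines
    # every errpt line except the column header is one logged error
    section == "RUNNING errpt" && $1 != "IDENTIFIER" { errs++ }
    # AIX df -g: %Used is column 4, mount point is the last column
    section == "RUNNING df -g" && $4 ~ /^[0-9]+%$/ {
      used = $4; sub(/%/, "", used)
      if ((used + 0) >= (limit + 0)) print f ": filesystem " $NF " at " $4
    }
    # lppchk -v is silent when everything is consistent
    section == "RUNNING lppchk -v" { print f ": lppchk: " $0 }
    END { if (errs > 0) print f ": errpt entries: " errs }
  ' "$file"
}
```

For example, running `for f in *.custom.check; do summarize_check "$f" 85; done` in the output directory lists the suspicious findings across all hosts in one short report.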

gheist commented:
400000 AIX servers? Are you IBM?
diag has an automated diagnostics facility whose configuration is stored in the ODM.
Kerem ERSOY (President) commented:
It depends on what you mean by a health check. If you're after monitoring CPU load, disk capacity, etc., you need a systematic approach: central, periodic monitoring and alerting. This can be done with monitoring tools such as IBM's Tivoli or Nagios.
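As an illustration of the alerting side, Nagios checks are small programs that print one status line and report the result through their exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). A minimal sketch of such a check for filesystem usage, assuming it reads AIX `df -g` output on stdin; the function name and thresholds are hypothetical, and how it is run on each server (for example through NRPE or check_by_ssh) is left open:

```shell
#!/bin/sh
# Hypothetical Nagios-style check: reads AIX "df -g" output on stdin and
# reports the fullest filesystem against warning/critical thresholds (%).
# Exit codes follow the Nagios plugin convention: 0 OK, 1 WARNING,
# 2 CRITICAL, 3 UNKNOWN.
check_fs_usage() {
  warn=$1
  crit=$2
  # "<pct> <mount point>" for each filesystem, fullest first
  worst=$(awk '$4 ~ /^[0-9]+%$/ { u = $4; sub(/%/, "", u); print u, $NF }' |
          sort -n -r | head -n 1)
  if [ -z "$worst" ]; then
    echo "UNKNOWN - no filesystem usage found"
    return 3
  fi
  pct=${worst%% *}                    # leading number
  mnt=${worst#* }                     # everything after the first space
  if [ "$pct" -ge "$crit" ]; then
    echo "CRITICAL - $mnt at ${pct}%"
    return 2
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING - $mnt at ${pct}%"
    return 1
  fi
  echo "OK - fullest filesystem $mnt at ${pct}%"
  return 0
}
```

Used as a standalone plugin script on a host, the last line would be something like `df -g | check_fs_usage 80 90; exit $?`, so the monitoring server sees both the status text and the exit code.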
gheist commented:
diag includes scheduled RAM/CPU/disk/RAID/sysplanar0 diagnostics.
It serves both practical and formal policy purposes quite well.