way_ching

Auto-detection and removal of dead process on AIX

Hi all ^_^
Okay, here's the problem:

There's an IBM RS6000 server running AIX 4.3 with an Informix database at my workplace.  During peak hours, about 350 users are on it, mostly doing database operations.  It is set up so that each user can log in to the system from only one terminal at a time.  For various reasons (improper disconnection from the system, dropped connections, etc.), some users' processes are left hanging in the system, eating up system resources, even though the users have been disconnected.

Thus, our system operators receive a lot of user requests to kill their processes.  A shell script is used to kill all processes without a tty (terminal name) belonging to a specified user.  Apparently,
our operators are sick of getting phone calls to kill those dead processes.

So, I want to know if there's a way to have the system automatically kill those garbage
processes.  And hopefully it can do it before our users notice it so that we won't have
endless phone calls to answer during peak hours.

Thanks..........    i wonder how many ppl would finish reading this LONG question..  :P

yuzh

Hi way_ching,

    Put your script into root's crontab on your server and have it run every 10, 15 or 20 minutes, so that cron does the job for you. If you need more details, please let me know.
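
    As a rough illustration, a root crontab entry that runs a cleanup script every 10 minutes might look like the line below. The script path is hypothetical, and the minutes are listed explicitly because not every cron implementation accepts step syntax such as */10.

```
# min               hour day month weekday  command (hypothetical path)
0,10,20,30,40,50    *    *   *     *        /usr/local/bin/kill_detached.sh >/dev/null 2>&1
```

    The entry would be added with crontab -e while logged in as root.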

    Cheers!

yuzh.
Hello, way ching!
This problem sounds familiar.  I have made something like this myself.  My script is very long, but I can give you a brief overview of how it works.

1) First, decide which user groups should be subject to this control.
2) If necessary, make a list of users who should NOT be controlled, even if they belong to the groups in 1.
3) This information may be stored in a configuration file, and the script may be designed to read from this file.
4) Read the group numbers from /etc/group (or with ypcat group if you use NIS) and make a list of group numbers.
5) Read the usernames that belong to these group numbers from /etc/passwd (or ypcat passwd). I use awk to compare and extract the right fields.
6) Now you have a list of usernames. Exclude the protected usernames from 2 (or do it at the same time as you make the list).
7) Loop over every username and list the user's processes with ps -fu <username>.  Based on the printout from ps, use some method (I use awk) to choose the processes that should be killed, and print their process numbers to a list.
8) Now you have a list that contains all the PIDs of the processes you want to kill. If you store this list in a variable, you can kill them all with one command: kill $pidlist.
9) Some processes may be hard to kill, so I wait for 30 seconds and run the kill again with the -9 option: kill -9 $pidlist.
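
Steps 4 to 6 could be sketched roughly as follows. This is not the script described above, only a minimal illustration: the function names and the one-name-per-line exclude file are assumptions, and the group and passwd files are passed in as arguments so the functions can be tried on copies first. With NIS, the output of ypcat group and ypcat passwd could be saved to temporary files and passed in the same way.

```sh
#!/bin/sh
# users_in_group GROUPNAME GROUPFILE PASSWDFILE   (hypothetical helper)
# Prints the users whose primary group in PASSWDFILE is GROUPNAME.
users_in_group() {
    # Step 4: look up the numeric GID (field 3) for the group name (field 1).
    gid=$(awk -F: -v g="$1" '$1 == g { print $3 }' "$2")
    [ -z "$gid" ] && return 0
    # Step 5: print every username (field 1) whose primary GID (field 4) matches.
    awk -F: -v gid="$gid" '$4 == gid { print $1 }' "$3"
}

# exclude_users EXCLUDEFILE   (hypothetical helper)
# Step 6: reads usernames on stdin and drops those listed, one per line,
# in EXCLUDEFILE. A missing or empty file excludes nobody.
exclude_users() {
    awk -v f="$1" '
        BEGIN { while ((getline name < f) > 0) skip[name] = 1 }
        !($1 in skip)
    '
}
```

A call such as users_in_group dbusers /etc/group /etc/passwd | exclude_users /etc/protected_users (both the group name and the exclude file are made up here) would then produce the list of usernames to loop over in step 7.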

The script should, as yuzh says, be run by cron at a frequency suitable for your needs.  IMPORTANT: such a script must be run by user 'root', since it kills other users' processes.
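
For steps 7 to 9, a minimal sketch could look like the one below. It is an illustration under assumptions, not the script described above: the function names are hypothetical, it assumes the local ps accepts the -o pid= -o tty= format options, and it assumes that a process without a controlling terminal shows "-" in the TTY column (some other systems print "?"). Check what ps prints on your own machine before relying on it.

```sh
#!/bin/sh
# select_detached_pids   (hypothetical helper)
# Reads "PID TTY" pairs on stdin and prints the PIDs whose TTY column
# shows no controlling terminal ("-" or "?").
select_detached_pids() {
    awk '$2 == "-" || $2 == "?" { print $1 }'
}

# kill_detached USERNAME   (hypothetical helper, must be run as root)
kill_detached() {
    user=$1
    # Step 7: list the user's processes and keep only the detached ones.
    pidlist=$(ps -u "$user" -o pid= -o tty= | select_detached_pids)
    [ -z "$pidlist" ] && return 0
    # Step 8: polite kill first. $pidlist is left unquoted on purpose so
    # that each PID becomes a separate argument.
    kill $pidlist 2>/dev/null
    # Step 9: give the processes 30 seconds, then force the stragglers.
    sleep 30
    kill -9 $pidlist 2>/dev/null
    return 0
}
```

Note that this selects every process of the user that has no terminal, including legitimate background jobs, so it only makes sense for the user groups chosen in step 1. A PID could also in principle be reused by another process during the 30-second wait; a more careful version would re-check the list before the kill -9.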
I have been using a script like this for many months now, and it works just fine!

Good Luck!
Sounds like your application needs to be able to handle HUP signals that tell it the user terminal session has disconnected.
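
If the application itself cannot be changed, one possible workaround is to start it from a small wrapper that reacts to the hangup. The sketch below is only an illustration of trapping HUP in a shell; the function name is hypothetical, and it only helps if the wrapper shell actually receives the HUP signal when the terminal session goes away.

```sh
#!/bin/sh
# run_until_hangup COMMAND [ARGS...]   (hypothetical wrapper)
# Runs COMMAND in the background and waits for it. If this shell receives
# SIGHUP in the meantime, the command is terminated and the shell exits.
run_until_hangup() {
    "$@" &
    child=$!
    # On hangup: terminate the child, reap it, and exit with 128 + 1 (SIGHUP).
    trap 'kill "$child" 2>/dev/null; wait "$child" 2>/dev/null; exit 129' HUP
    wait "$child"
}
```

The exit inside the trap ends the whole wrapper shell, which is the intent here: a user's login script would call run_until_hangup with the real application command as its last step.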