Solved

How can I tell which process was using system resources when a system was found hung?

Posted on 2014-01-23
Last Modified: 2014-02-04
Hi Experts,

I often run into an issue where I find many systems in a hung state, and at that point I cannot execute any command to see what exactly is keeping the resources busy.

Can someone please help me understand how we can find out which processes were responsible for making the system busy?

Any help will be highly appreciated.

Thanks,
SA
Question by:Sandy
 
LVL 8

Expert Comment

by:Surrano
If this is a recurring problem, the best approach would be to save some diagnostic information every N minutes. I'd include:
- top
- iotop
- iftop
- lsof
So write a cron job that saves this output, e.g. every 10 minutes, and keeps the results of the last 30 runs. After a hang, you'll have 5 hours to log in and find out who and why.
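A minimal sketch of such a snapshot job, assuming a Linux box with procps `top` and, optionally, `iotop`, `iftop` and `lsof` installed. The function names, the snapshot directory and the file names are made up for this example; tools that are missing are simply skipped.

```shell
#!/bin/sh
# Sketch only: function names, directory layout and file names are
# assumptions for illustration, not an established convention.

# capture OUTFILE CMD [ARGS...]
# Run CMD if it is installed and save its output; a failing tool never
# aborts the snapshot.
capture() {
    out="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$out" 2>&1 || true
    fi
}

# take_snapshot DIR [KEEP]
# Write one timestamped snapshot under DIR, then keep only the newest KEEP.
take_snapshot() {
    dir="$1"; keep="${2:-30}"
    snap="$dir/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$snap" || return 1

    capture "$snap/top.txt"   top -b -n 1      # batch mode, one iteration
    capture "$snap/iotop.txt" iotop -b -n 1    # usually needs root
    capture "$snap/iftop.txt" iftop -t -s 5    # text mode, quit after 5 s
    capture "$snap/lsof.txt"  lsof -n          # -n: no hostname lookups

    # Timestamped names sort chronologically, so the oldest come first.
    total=$(ls -1 "$dir" | wc -l | tr -d ' ')
    if [ "$total" -gt "$keep" ]; then
        ls -1 "$dir" | sort | head -n $((total - keep)) |
        while read -r old; do
            rm -rf "${dir:?}/$old"
        done
    fi
}
```

If the file were saved as, say, /usr/local/lib/hang-snapshot.sh (a hypothetical path), a crontab entry along these lines would run it every 10 minutes and keep 30 snapshots: `*/10 * * * * . /usr/local/lib/hang-snapshot.sh && take_snapshot /var/tmp/hang-snapshots 30`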

Another possibility would be quotas on CPU usage and the like, but that would need at least some hints about what was going on, and even then my expertise doesn't reach that far...
 
LVL 13

Author Comment

by:Sandy
Thanks Surrano. This can be done if the issue recurs, but I am more concerned about what happened to the system in the past :(

TY/SA
 
LVL 8

Expert Comment

by:Surrano
Well then, search your system for files that were modified during the hang... or any time later.
- Log files, e.g. syslog, wtmpx, etc.
- Core files or other crash-dump-like stuff
The mere identity of these files, or their contents, may give you some hints.
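One way to run that search, as a sketch: GNU `find` can select files by modification time with `-newermt`. The function name and the example dates below are made up for illustration.

```shell
# files_changed_between DIR START END
# List regular files under DIR modified after START and not after END.
# Assumes a find that supports -newermt (GNU findutils does).
# -xdev keeps the search on one filesystem; permission errors are hidden.
files_changed_between() {
    find "$1" -xdev -type f -newermt "$2" ! -newermt "$3" -print 2>/dev/null
}

# Example call (hypothetical hang window):
#   files_changed_between /var "2014-01-22 02:00" "2014-01-22 04:00"
```

Dropping the `! -newermt "$3"` part would cover the "any time later" case as well.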

If collection of system activity records (sar) was turned on, then you can check things like the amount of device I/O, semaphores, etc. E.g. if you find that there was a peak on one of the logical volumes that contains a database index file for table X, you should check the use cases that access that particular table / index. If that device contains only Apache logs, then you should check your webserver. Etc.
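If sar data was being collected, the daily data file can be replayed for just the time window around the hang. A rough sketch, assuming `sar` from sysstat (or the Solaris equivalent) and a daily file such as /var/adm/sa/sa22; the function name, the day and the times are hypothetical.

```shell
# sar_window SA_FILE START END
# Print CPU (-u), run queue (-q) and block device (-d) activity recorded
# in one daily sar data file between START and END (HH:MM:SS).
sar_window() {
    file="$1"; start="$2"; end="$3"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    command -v sar >/dev/null 2>&1 || { echo "sar not found" >&2; return 1; }
    for flag in -u -q -d; do
        sar "$flag" -f "$file" -s "$start" -e "$end"
    done
}

# Example call (hypothetical file and window):
#   sar_window /var/adm/sa/sa22 02:00:00 04:00:00
```

The location of the data files varies by system, so the path has to be adjusted to wherever sar writes on the machine in question.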
 
LVL 13

Author Comment

by:Sandy
Definitely, I can search the application-specific logs for that, but I want to be sure exactly which process caused this, because many times we as sysadmins are not allowed to look into app logs, and sometimes they are written in a language that a sysadmin cannot understand.

TY/SA
 
LVL 8

Expert Comment

by:Surrano
I wasn't thinking about apps, more about the OS / OEM level. If you find something there, it would give you the leverage to explain why you need to look into app logs, or to ask the app support people to have a look, based on a strong suspicion that the app caused the hang, and let them prove that it did not.
 
LVL 13

Author Comment

by:Sandy
Is there any other possible way?
 
LVL 8

Expert Comment

by:Surrano
Can you inspect the quality of water in a stream retrospectively? Currently it has no cyanide, but did it have cyanide one hour / day / year ago? Not unless the cyanide left some telltale sign. It's up to you to find those signs and, especially, up to you to develop proactive measures, like generating those signs automatically if the problem occurs.
 
LVL 13

Author Comment

by:Sandy
Instead of "you", as you put it, I believe in "us"...

I am trying to work out a preventive approach from those signs, but I am seeking your expert advice on whether we can make it better. I hope I'm not bothering you :)

TY/SA
 
LVL 8

Accepted Solution

by:
Surrano earned 500 total points
I didn't mean you personally; I meant the generic "you" in English ("man" in German)... So *we* are looking for a needle in a haystack. This is similar to what I do for a living as an employee, but there I usually have more information than simply "something crashed". Even then, I'm so familiar with *our* systems that I know all the OS/OEM/APP layers inside out.

As a next step I'd ask for a broad set of symptoms, such as:
- /var/log/messages and syslog
- /var/adm/sa/sa* files
- Webserver logs
- Database logs
- Application logs
- lsof output
- iostat output (with various flags)
And if that still doesn't help, then login data and other descriptive information about the time and duration of the hang. If you can provide these within the limitations of EE (which I doubt), then I could help with the analysis. Otherwise, all you can do is capture all these symptoms, e.g. every hour, until the next crash. Always keep at least the last 24 hours, and then you'll be able to compare.

As a rule of thumb, performance is relative. A hang can be assessed only if you can compare the performance data collected during/immediately before the hang to a baseline collected during normal operation.
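To make the baseline idea concrete, here is a small sketch that flags samples well above the average of a baseline. It assumes two plain-text files whose first column is a numeric metric (for example the 1-minute load average captured periodically); the function name, the file format and the default factor of 2 are all assumptions for this example.

```shell
# above_baseline BASELINE_FILE SAMPLE_FILE [FACTOR]
# Print every line of SAMPLE_FILE whose first column exceeds
# FACTOR times the mean of the first column of BASELINE_FILE.
# BASELINE_FILE must not be empty, or the two-file logic breaks down.
above_baseline() {
    awk -v factor="${3:-2}" '
        NR == FNR { sum += $1; n++; next }           # first file: baseline
        FNR == 1  { mean = (n > 0) ? sum / n : 0 }   # second file begins
        $1 > factor * mean {
            printf "%s (baseline mean %.2f)\n", $0, mean
        }
    ' "$1" "$2"
}
```

A plain mean is the crudest possible baseline; it only serves to show the comparison step, and a real one would probably distinguish time of day and day of week.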
 
LVL 13

Author Closing Comment

by:Sandy
Thanks for your help Surrano...  Vielen Dank