Solved

How can I find out which process was using system resources when a system was found hung?

Posted on 2014-01-23
Last Modified: 2014-02-04
Hi Experts,

I often find systems in a hung state, and at that point I cannot run any command to see what exactly is keeping the resources busy.

Can someone please help me understand how to find out which processes were responsible for making the system busy?

Any help will be highly appreciated.

Thanks,
SA
Question by:Sandy
10 Comments
 
LVL 8

Expert Comment

by:Surrano
ID: 39802467
If this is a recurring problem, the best approach would be to save some diagnostic information every N minutes. I'd include:
- top
- iotop
- iftop
- lsof
So write a cron job that saves this output, e.g. every 10 minutes, and keeps the results of the last 30 runs. After a hang, you'll have 5 hours to log in and find out who and why.
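
A minimal sketch of such a snapshot job, written in Python only for illustration. The directory `/var/tmp/hang-snapshots`, the file naming, and the function names are assumptions, and the flags assume Linux versions of `top` and `iotop` (which usually needs root). `iftop` is left out because its non-interactive text mode depends on the installed version.

```python
#!/usr/bin/env python3
"""Save one snapshot of system activity and keep only the newest runs."""
import shutil
import subprocess
import time
from pathlib import Path

SNAPSHOT_DIR = Path("/var/tmp/hang-snapshots")  # hypothetical location
KEEP = 30  # 30 runs at 10-minute intervals = 5 hours of history

# Batch (non-interactive) invocations; adjust to the tools on your system.
COMMANDS = {
    "top": ["top", "-b", "-n", "1"],
    "iotop": ["iotop", "-b", "-n", "1"],
    "lsof": ["lsof", "-n"],
}


def take_snapshot(directory, commands, now=None):
    """Run each command and write all output to one timestamped file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    target = directory / f"snapshot-{stamp}.txt"
    with target.open("w") as out:
        for name, argv in commands.items():
            out.write(f"===== {name} =====\n")
            if shutil.which(argv[0]) is None:
                out.write("(not installed)\n")
                continue
            try:
                result = subprocess.run(
                    argv, capture_output=True, text=True,
                    errors="replace", timeout=30,
                )
                out.write(result.stdout)
                out.write(result.stderr)
            except subprocess.TimeoutExpired:
                out.write("(timed out)\n")
    return target


def prune(directory, keep):
    """Delete all but the newest `keep` snapshots; return how many were removed."""
    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(Path(directory).glob("snapshot-*.txt"))
    old = snapshots[:-keep] if keep > 0 else snapshots
    for path in old:
        path.unlink()
    return len(old)


if __name__ == "__main__":
    take_snapshot(SNAPSHOT_DIR, COMMANDS)
    prune(SNAPSHOT_DIR, KEEP)
```

Saved under a path of your choice (say `/usr/local/bin/hang_snapshot.py`, also hypothetical), it could be scheduled with a crontab entry such as `*/10 * * * * /usr/local/bin/hang_snapshot.py`.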

Another possibility would be quotas on CPU usage and similar resource limits, but that would need at least some hints about what was going on, and even then my expertise doesn't reach that far...
 
LVL 13

Author Comment

by:Sandy
ID: 39802479
Thanks Surrano. This can be done if the issue recurs, but I am more concerned about what happened on the system in the past :(

TY/SA
 
LVL 8

Expert Comment

by:Surrano
ID: 39802631
Well then, search your system for files that were modified during the hang... or anytime later.
- Log files e.g. syslog, wtmpx, etc.
- Core files or other crash dump-like stuff
The mere identity of these files, or their contents, may give you some hints.
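
A rough sketch of that search, written in Python only for illustration. The function name `modified_since`, the search root, and the example hang time are all made up; substitute the real start of the hang. The same search can also be done with the `find` command.

```python
import os
import stat
from datetime import datetime


def modified_since(root, start, end=None):
    """Return (mtime, path) pairs for regular files under `root` whose
    modification time is at or after `start` (and at or before `end`,
    if one is given), oldest first."""
    lo = start.timestamp()
    hi = end.timestamp() if end is not None else float("inf")
    hits = []
    # os.walk silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(st.st_mode) and lo <= st.st_mtime <= hi:
                hits.append((st.st_mtime, path))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical start of the hang; replace with the real time.
    hang_start = datetime(2014, 1, 23, 2, 0)
    for mtime, path in modified_since("/var/log", hang_start):
        print(datetime.fromtimestamp(mtime).isoformat(sep=" "), path)
```

Sorting by modification time makes it easier to see which files were written last before the system stopped responding.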

If collection of system activity records (sar) was turned on then you can check things like amount of device I/O, semaphores, etc. E.g. if you find that there was a peak on one of the logical volumes that contains a database index file for table X you should check the use cases that access that particular table / index. If that device contains only Apache logs then you should check your webserver. Etc.
 
LVL 13

Author Comment

by:Sandy
ID: 39802641
I can definitely search application-specific logs for that, but I want to be sure exactly which process caused this, because we SysAdmins are often not allowed to look into app logs, and sometimes they are written in a way that a SysAdmin cannot understand.

TY/SA
 
LVL 8

Expert Comment

by:Surrano
ID: 39802785
I wasn't thinking of apps, more of the OS / OEM level. If you find something there, it would give you the leverage to explain why you need to look into the app logs, or to ask the app support people to have a look based on a strong suspicion that the app caused the hang, and let them prove that it did not.
 
LVL 13

Author Comment

by:Sandy
ID: 39831632
Is there any other possible way?
 
LVL 8

Expert Comment

by:Surrano
ID: 39831770
Can you inspect the quality of water in a stream retrospectively? It currently has no cyanide, but did it have cyanide one hour / day / year ago? Not unless the cyanide left some telltale sign. It's up to you to find those signs and, especially, to develop proactive measures such as generating those signs automatically if the problem occurs.
 
LVL 13

Author Comment

by:Sandy
ID: 39831800
Instead of "you", as you put it, I believe in "us"...

I am trying to work out a preventive approach from those signs, but I'm seeking your expert advice on whether we can make it better. I hope I'm not bothering you :)

TY/SA
 
LVL 8

Accepted Solution

by:
Surrano earned 500 total points
ID: 39831856
I didn't mean it personally; I meant the generic "you" in English ("man" in German). So *we* are looking for a needle in a haystack. This is similar to what I do for a living, but there I usually have more information than simply "something crashed". On top of that, I'm so familiar with *our* systems that I know all the OS/OEM/APP layers inside out.

As a next step I'd ask for a broad set of symptoms, such as:
- /var/log/messages and syslog
- /var/adm/sa/sa* files
- Webserver logs
- Database logs
- Application logs
- lsof output
- iostat output (with various flags)
And if that still doesn't help, then login data and other descriptive information about the time and duration of the hang. If you can provide these within the limitations of EE (which I doubt), then I could help with the analysis. Otherwise, all you can do is capture all these symptoms, e.g. every hour, until the next crash. Always keep at least the last 24 hours, and then you'll have something to compare.

As a rule of thumb, performance is relative. A hang can be assessed only if you can compare the performance data collected during/immediately before the hang to a baseline collected during normal operation.
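
To illustrate the baseline idea, here is a small sketch in Python. The metric names, the numbers, and the threshold of three standard deviations are all invented for the example; it only shows one simple way to compare a sample taken around the hang against values collected during normal operation, and is not a substitute for proper analysis of the sar data.

```python
from statistics import mean, stdev


def deviations(baseline, sample, threshold=3.0):
    """Return {metric: z-score} for every metric in `sample` that lies more
    than `threshold` standard deviations from its baseline mean."""
    flagged = {}
    for metric, value in sample.items():
        history = baseline.get(metric, [])
        if len(history) < 2:
            continue  # not enough baseline data to judge this metric
        mu, sigma = mean(history), stdev(history)
        if sigma:
            z = (value - mu) / sigma
        else:
            # Baseline never varied: any change at all counts as a deviation.
            z = 0.0 if value == mu else float("inf")
        if abs(z) > threshold:
            flagged[metric] = z
    return flagged


if __name__ == "__main__":
    # Made-up numbers: values from normal operation vs. just before a hang.
    baseline = {
        "iowait_pct": [2, 3, 2, 3, 2, 3],
        "load_avg": [1.0, 1.2, 0.9, 1.1],
    }
    before_hang = {"iowait_pct": 45, "load_avg": 1.05}
    for metric, z in deviations(baseline, before_hang).items():
        print(f"{metric}: {z:.1f} standard deviations from baseline")
```

With these invented numbers only the I/O wait figure stands out, which would point the investigation at disk activity rather than CPU load.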
 
LVL 13

Author Closing Comment

by:Sandy
ID: 39831897
Thanks for your help Surrano...  Vielen Dank