Solved

How can I find out which process was using system resources when a system was found hung?

Posted on 2014-01-23
Last Modified: 2014-02-04
Hi Experts,

Many times I have found systems in a hung state, and at that point I was unable to run any command to see what exactly was keeping the resources busy.

Can someone please help me understand how to find out which processes were responsible for making the system busy?

Any help will be highly appreciated.

Thanks,
SA
Question by:Sandy
10 Comments
 
LVL 8

Expert Comment

by:Surrano
ID: 39802467
If this is a recurring problem, the best approach is to save some diagnostic information every N minutes. I'd include the output of:
- top
- iotop
- iftop
- lsof
So write a cron job that saves this, e.g. every 10 minutes, and keeps the results of the last 30 runs. After a hang, you'll have 5 hours to log in and find out who and why.
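
A minimal sketch of such a snapshot job, assuming Python 3 is available on the box. The function name `save_snapshot`, the file naming scheme, and the `top`/`lsof` flags are illustrative choices rather than anything prescribed in this thread; `iotop` and `iftop` can be added to `COMMANDS` with whatever non-interactive flags your versions support.

```python
import subprocess
import time
from pathlib import Path

# Assumed command lines; adjust to the tools and flags available on your system.
COMMANDS = {
    "top": ["top", "-b", "-n", "1"],
    "lsof": ["lsof", "-n"],
}


def save_snapshot(commands, out_dir, keep=30, timeout=60):
    """Run each command, store the output in one timestamped file, prune old files.

    `keep` must be at least 1. Returns the path of the file just written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / time.strftime("snapshot-%Y%m%d-%H%M%S.txt")
    with target.open("w") as fh:
        for name, argv in commands.items():
            fh.write(f"===== {name} =====\n")
            try:
                result = subprocess.run(
                    argv, capture_output=True, text=True, timeout=timeout
                )
                fh.write(result.stdout)
                fh.write(result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing or stuck tool should not abort the whole snapshot.
                fh.write(f"[{name} failed: {exc}]\n")
    # Timestamped names sort chronologically, so the oldest files come first.
    snapshots = sorted(out_dir.glob("snapshot-*.txt"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

A small script that calls `save_snapshot(COMMANDS, "/var/tmp/hang-snapshots")` (a hypothetical directory) can then be scheduled from cron every 10 minutes. With `keep=30` that gives the 5-hour window mentioned above.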

Another possibility would be quotas on CPU usage and similar resources, but that would need at least some hints about what was going on, and even then my expertise falls short there...
 
LVL 13

Author Comment

by:Sandy
ID: 39802479
Thanks Surrano. This can be done if the issue recurs, but I am more concerned about what happened on the system in the past :(

TY/SA
 
LVL 8

Expert Comment

by:Surrano
ID: 39802631
Well then, dig through your system for files that were modified during the hang... or at any time later.
- Log files e.g. syslog, wtmpx, etc.
- Core files or other crash dump-like stuff
The mere identity of these files, or their contents, may give you some hints.
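
One way to do that search, sketched in Python under the assumption that you know roughly when the hang happened. The function name `files_modified_between` is made up for illustration, and it only looks at modification times, so files that were merely read will not show up.

```python
import os
from datetime import datetime


def files_modified_between(root, start, end):
    """Yield (path, mtime) for files under `root` modified within [start, end]."""
    lo, hi = start.timestamp(), end.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or cannot be inspected
            if lo <= mtime <= hi:
                yield path, datetime.fromtimestamp(mtime)
```

For example, `files_modified_between("/var/log", hang_start, datetime.now())` would list log files touched since the hang began, where `hang_start` is your own estimate of that time.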

If collection of system activity records (sar) was turned on, then you can check things like the amount of device I/O, semaphores, etc. For example, if you find that there was a peak on one of the logical volumes that contains a database index file for table X, you should check the use cases that access that particular table / index. If that device contains only Apache logs, then you should check your webserver. And so on.
 
LVL 13

Author Comment

by:Sandy
ID: 39802641
I can definitely search the application-specific logs for that, but I want to be sure exactly which process caused this, because many times we as sysadmins are not allowed to look into app logs, and sometimes they are written in a way that a sysadmin cannot understand.

TY/SA
 
LVL 8

Expert Comment

by:Surrano
ID: 39802785
I wasn't thinking about apps, more the OS / OEM level. If you find something there, it would give you the leverage to explain why you need to look into app logs, or to ask the app support people to have a look based on a strong suspicion that the app caused the hang, and let them prove that it did not.
 
LVL 13

Author Comment

by:Sandy
ID: 39831632
Is there any other possible way?
 
LVL 8

Expert Comment

by:Surrano
ID: 39831770
Can you inspect the quality of water in a stream retrospectively? Currently it has no cyanide, but did it have cyanide one hour / day / year ago? Not unless the cyanide left some telltale sign. It's up to you to find those signs and, especially, up to you to develop proactive measures, like generating those signs automatically if the problem occurs.
 
LVL 13

Author Comment

by:Sandy
ID: 39831800
Instead of "you", as you put it, I believe in "us"...

I am trying to work out preventive measures from those signs, but I am seeking your expert advice on whether we can do better. I hope I am not bothering you :)

TY/SA
 
LVL 8

Accepted Solution

by:
Surrano earned 2000 total points
ID: 39831856
I didn't mean you personally; I meant the generic subject, "you" in English, "man" in German... So *we* are looking for a needle in a haystack. This is similar to what I do for a living, but there I usually have more information than simply "something crashed". Even then, I'm so familiar with *our* systems that I know all the OS/OEM/APP layers inside out.

As a next step I'd ask for a broad set of symptoms, like:
- /var/log/messages and syslog
- /var/adm/sa/sa* files
- Webserver logs
- Database logs
- Application logs
- lsof output
- iostat output (with various flags)
And if that still doesn't help, then login data and other descriptive information about the time and duration of the hang. If you can provide these within the limitations of EE (which I doubt), then I could help with the analysis. Otherwise, all you can do is capture all these symptoms, e.g. every hour, until the next crash. Always keep at least the last 24 hours, and then you'll be able to compare.

As a rule of thumb, performance is relative. A hang can be assessed only if you can compare the performance data collected during/immediately before the hang to a baseline collected during normal operation.
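
To make the baseline idea concrete, here is a rough sketch in Python. It assumes you have already extracted a few numeric samples per metric from your normal-operation data and one reading from around the hang. The function name, the metric names, and the 3-sigma threshold are all illustrative assumptions, not an established method from this thread.

```python
from statistics import mean, stdev


def deviations_from_baseline(baseline, observed, threshold=3.0):
    """Return {metric: z-score} for readings far from their baseline.

    `baseline` maps a metric name to a list of samples from normal operation;
    `observed` maps a metric name to one reading taken around the hang.
    Metrics with fewer than two baseline samples, or with a perfectly
    constant baseline, are skipped because no z-score can be computed.
    """
    flagged = {}
    for metric, value in observed.items():
        samples = baseline.get(metric, [])
        if len(samples) < 2:
            continue
        mu, sigma = mean(samples), stdev(samples)
        if sigma == 0:
            continue
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            flagged[metric] = round(z, 1)
    return flagged


# Hypothetical numbers: I/O wait is far outside its normal range, load is not.
baseline = {"iowait_pct": [2.0, 3.0, 2.5, 3.5, 2.0], "load1": [1.0, 1.2, 0.9, 1.1, 1.0]}
observed = {"iowait_pct": 60.0, "load1": 1.1}
suspects = deviations_from_baseline(baseline, observed)
```

Here `suspects` would contain only `iowait_pct`, which points towards storage rather than CPU. Without the baseline samples, the reading of 60 on its own would tell you much less.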
 
LVL 13

Author Closing Comment

by:Sandy
ID: 39831897
Thanks for your help Surrano...  Vielen Dank