• Status: Solved
  • Priority: Low
  • Security: Public

I have atop running and enabled as a service. Is it possible to establish which process/daemon/service caused the server to misbehave, viz. become unresponsive and need to be rebooted?

In one shot, is it possible to establish or conclude from the atop logs what is causing the server to misbehave?

Relatedly, can the setup above answer the following, more specific question?

We have a few servers in the cloud (AWS), and I find that one server or another has rebooted. Many times I am not able to find out from the logs what caused the reboot; all they show is that the server rebooted at a particular time.

I would like to configure my servers so that the next time one reboots, the cause of the reboot is logged.

I would appreciate pointers, hints, or a detailed procedure for achieving this, i.e. the configuration to apply on the servers so that a root cause analysis (RCA) can be done reliably from the logs without missing any information. In other words, all the steps needed so that vital information is captured rather than lost.

Also, to clarify: I don't see any core files captured on the systems.

As of today, all the systems are production systems, and I believe they are not configured to capture crash dumps (core dumps) or debugging info. Please let me know whether it is mandatory to enable crash/core dump capture and, if so, the steps to do it.
Asked by: Venkat Ravi Shanker K
 
dfke commented:
You can do some core dump analysis with the crash utility to see whether the server was simply rebooted or actually crashed.

See the Red Hat Enterprise Linux 6 Deployment Guide:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s1-kdump-crash

Cheers
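
A quick way to tell whether the last reboot left anything for crash to open is to look for a vmcore file. The Python sketch below assumes kdump's usual RHEL default of /var/crash (the `path` directive in /etc/kdump.conf can change it); `find_vmcores` is a hypothetical helper, not part of kdump or crash.

```python
import sys
from pathlib import Path


def find_vmcores(crash_dir="/var/crash"):
    """List kdump dump files (named 'vmcore') anywhere under crash_dir.

    Assumption: /var/crash is the kdump target; pass another directory if
    /etc/kdump.conf sets a different 'path'.
    """
    root = Path(crash_dir)
    if not root.is_dir():
        # kdump not configured, or nothing has been captured yet.
        return []
    # Each captured crash normally gets its own subdirectory holding a 'vmcore'.
    return sorted(str(p) for p in root.rglob("vmcore") if p.is_file())


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/crash"
    dumps = find_vmcores(target)
    if dumps:
        for d in dumps:
            print("found dump:", d, "(open it with the crash utility)")
    else:
        print("no vmcore found: kdump may be disabled, or the reboot was not a kernel crash")
```

An empty result is itself informative: without kdump enabled beforehand, a kernel crash leaves nothing here to analyze.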
 
arnold commented:
You need to configure the destination where the core dump will be written, as pointed out above, and then check ulimit to make sure it does not disable core dumps.
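
To check the ulimit side for user-space core files, the Python sketch below reads the limit behind `ulimit -c` through the standard `resource` module (Unix only), shows the kernel's core file destination from /proc/sys/kernel/core_pattern, and raises the soft limit to the hard limit for the current process. The helper names are hypothetical. A persistent change would normally go in /etc/security/limits.conf or the service's systemd unit (`LimitCORE=`) rather than in a script.

```python
import resource


def core_limit():
    """Return (soft, hard) RLIMIT_CORE in bytes: the limit behind 'ulimit -c'."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_dumps_enabled():
    """False when the soft limit is 0, which silently suppresses core files."""
    soft, _hard = core_limit()
    return soft != 0


def raise_core_limit_to_hard():
    """Lift the soft limit up to the hard limit (allowed without root).

    Affects only this process and children it starts afterwards.
    """
    soft, hard = core_limit()
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return core_limit()


def core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Where the kernel writes user-space core files (None if unreadable)."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("core limit (soft, hard):", core_limit())
    print("core dumps enabled:", core_dumps_enabled())
    print("core_pattern:", core_pattern())
```

Note this covers crashing processes only; a kernel panic is captured by kdump, not by ulimit.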

Commonly, /var/log/messages will include the reason for the reboot. Common causes:
• a memory-related event, e.g. bad memory;
• a processor-related issue leading to a kernel panic.
The memory event could be a consequence of power supply issues, where the memory voltage dips below threshold (more than a 15% decline from the nominal voltage).
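
To look through /var/log/messages for the kinds of events listed above, a keyword scan such as the Python sketch below can help. The patterns are illustrative assumptions based on typical kernel wording ("Kernel panic", "Out of memory", "Machine check", "EDAC"), not an exhaustive list. A panic can halt the machine before syslog writes to disk, so an empty result does not rule one out.

```python
import re
import sys

# Illustrative patterns only; tune them for your kernel and distribution.
PATTERNS = {
    "kernel panic": re.compile(r"Kernel panic", re.I),
    "out of memory": re.compile(r"Out of memory|oom-killer", re.I),
    "hardware/memory error": re.compile(r"Machine check|Hardware Error|EDAC", re.I),
    "reboot/shutdown": re.compile(r"\b(reboot|shutdown|shutting down)\b", re.I),
}


def scan(lines):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, rx in PATTERNS.items():
            if rx.search(line):
                hits.append((lineno, label, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as f:
            for lineno, label, text in scan(f):
                print(f"{lineno}: [{label}] {text}")
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```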
 
dfke commented:
Hi,

This information should help you find the root cause of the issue.

Cheers.