I have atop running and enabled as a service. Is it possible to establish which process/daemon/service caused the server to misbehave, i.e. the server becomes unresponsive and needs to be restarted/rebooted?
In one shot, is it possible to establish from the atop logs what is causing the server to misbehave?
This is relevant because of the specific situation below: can the setup above answer it?
We have a few servers in the cloud (AWS), and I find that one server or another has rebooted. Many times I am not able to find out from the logs what caused the reboot; all I can tell is that the server rebooted at a particular time.
I would like to configure my servers so that they log the cause of the reboot the next time it happens.
I would appreciate pointers, hints, or a detailed procedure on how to achieve this. That is, the configuration on the server should make it possible to determine the root cause (RCA, root cause analysis) reliably from the logs without missing any information: all the steps needed so that vital information is captured and not lost, and the facts can be established.
Also, to clarify: I don't see any core files captured on the systems.
As of now, all the systems are production systems, and I believe they are not configured to capture crash dumps (core dumps) or debugging information. Kindly let me know whether it is mandatory to enable crash/core capture, and if so, the steps to do it as well.
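One way to confirm the belief above before changing anything is a small read-only check. The sketch below is illustrative and not part of the original setup; it assumes a Linux host with Python 3, reads `/proc/sys/kernel/core_pattern`, and reports the soft/hard `RLIMIT_CORE` of the process that runs it. The helper names `core_dump_settings` and `core_dumps_possible` are hypothetical.

```python
import resource
from pathlib import Path


def core_dump_settings(pattern_path="/proc/sys/kernel/core_pattern"):
    """Return the kernel core_pattern and this process's RLIMIT_CORE limits.

    Assumes a Linux host; core_pattern is None if the file cannot be read.
    """
    try:
        pattern = Path(pattern_path).read_text().strip()
    except OSError:
        pattern = None
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return {"core_pattern": pattern, "rlimit_soft": soft, "rlimit_hard": hard}


def core_dumps_possible(settings):
    """Rough heuristic only.

    A soft limit of 0 means no core file is written for this process;
    an empty or unreadable pattern is treated as 'not configured'.
    RLIM_INFINITY (unlimited) is non-zero, so it counts as possible.
    """
    return settings["rlimit_soft"] != 0 and bool(settings["core_pattern"])


if __name__ == "__main__":
    s = core_dump_settings()
    print(s)
    print("core dumps possible with these limits:", core_dumps_possible(s))
```

Note that a service started by systemd may run with a different core limit from the interactive shell, and this check only covers user-space process core files; it says nothing about kernel crash dumps (e.g. kdump), which are a separate mechanism.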