I have atop running and enabled as a service. Is it possible to establish which process/daemon/service caused the server to misbehave, i.e. become unresponsive and need a reboot?

I have atop running and enabled as a service. Is it possible to establish which process, daemon or service caused the server to misbehave, i.e. become unresponsive and need to be restarted or rebooted?

Can this be concluded in one pass from the atop logs, i.e. which process or what event is causing the server to misbehave?

This is relevant to the specific situation below. Can that also be answered with the setup above?

We have a few servers in the cloud (AWS). I find that one server or another has rebooted, and many times I am not able to find from the logs what caused the reboot, except for the fact that the server rebooted at a particular time.

I would like to configure my servers so that the cause of a reboot is logged the next time one reboots.

I would appreciate pointers, hints or a detailed procedure on how to achieve this, i.e. the configuration to apply on the server so that the root cause analysis (RCA) can be determined reliably from the logs: all the steps needed so that vital information is captured and not lost.

Also, to clarify, I don't notice any core files captured on the systems.

As of today all the systems are production systems and, I believe, they are not configured to capture a crash (core dump) or debugging information. Kindly let me know whether it is mandatory to enable crash/core capture and, if so, the steps to do it.
Venkat Ravi Shanker K (consultant) asked:

dfke commented:
You can do some core dump analysis with the crash utility to see whether the server simply rebooted or really crashed.

See RHEL website:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s1-kdump-crash
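Before there is anything for the crash utility to analyse, kdump has to be active on the server. As a rough sketch (not taken from the RHEL guide), the following Python snippet checks two things on a running Linux system: whether the kernel command line reserves memory with a `crashkernel=` argument, and whether a crash kernel is currently loaded according to `/sys/kernel/kexec_crash_loaded`. The function names are made up for illustration, and the script only reports status; it does not configure anything.

```python
from pathlib import Path
from typing import Optional


def has_crashkernel(cmdline: str) -> bool:
    """Return True if the kernel command line contains a crashkernel= argument."""
    return any(arg.startswith("crashkernel=") for arg in cmdline.split())


def read_text(path: str) -> Optional[str]:
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def kdump_status() -> dict:
    """Collect basic kdump indicators; a value of None means 'could not be read'."""
    cmdline = read_text("/proc/cmdline")
    loaded = read_text("/sys/kernel/kexec_crash_loaded")
    return {
        "crashkernel_reserved": None if cmdline is None else has_crashkernel(cmdline),
        "crash_kernel_loaded": None if loaded is None else loaded.strip() == "1",
    }


if __name__ == "__main__":
    for name, value in kdump_status().items():
        print(f"{name}: {value}")
```

If either value comes back False, a kernel crash would most likely not leave a vmcore behind, which would be consistent with finding no dump after an unexplained reboot.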

Cheers
arnold commented:
You need to configure the destination where the core dump will be written, as pointed out above, and then check ulimit (the core file size limit, ulimit -c) to make sure it does not disable the core dump.
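A minimal sketch of that check in Python, assuming a Linux host: it reads the core file size limit of the current process through the standard `resource` module and prints the kernel's `core_pattern`, which decides where a core file goes. Note that this limit applies to core dumps of user-space processes; kernel crash dumps captured by kdump are configured separately. The helper names are only illustrative.

```python
import resource
from pathlib import Path
from typing import Optional


def describe_core_limit(soft: int) -> str:
    """Translate a soft RLIMIT_CORE value into a human-readable description."""
    if soft == 0:
        return "disabled"
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"limited to {soft} bytes"


def core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> Optional[str]:
    """Return the kernel core_pattern setting, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # This is the limit inherited by this process; a daemon started by the
    # init system may run with a different limit.
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core dumps for this process:", describe_core_limit(soft))
    print("core_pattern:", core_pattern())
```

A result of "disabled" for the shell or service in question would explain why no core files show up on the system.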

Commonly, /var/log/messages will include the reason for the reboot.
Common causes are a memory-related event (for example bad memory) or a processor-related issue leading to a kernel panic.
A memory event can itself be a consequence of power supply issues, where the memory voltage dips below threshold (more than 15% below nominal voltage).
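To look for such entries without reading the whole file by hand, something along these lines could be used. It is only a sketch: the keyword patterns below are an assumed, non-exhaustive starting set based on commonly seen kernel messages (panic, out-of-memory, machine check), and the log path defaults to /var/log/messages as mentioned above but may differ on your distribution.

```python
import re
import sys

# Illustrative patterns only; extend them to match what your kernel actually logs.
PATTERNS = {
    "kernel panic": re.compile(r"kernel panic", re.IGNORECASE),
    "out of memory": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "hardware error": re.compile(r"machine check|mce:|hardware error", re.IGNORECASE),
}


def classify(line: str) -> list:
    """Return the labels of all patterns that match the given log line."""
    return [label for label, rx in PATTERNS.items() if rx.search(line)]


def scan(lines):
    """Yield (line_number, label, line) for every suspicious log line."""
    for number, line in enumerate(lines, start=1):
        for label in classify(line):
            yield number, label, line.rstrip("\n")


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as handle:
        for number, label, line in scan(handle):
            print(f"{path}:{number}: [{label}] {line}")
```

Run it as root (the log is usually not world-readable) and compare the timestamps of any hits with the reboot time and with the atop samples recorded just before it.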
dfke commented:
Hi,

This info will help you find the root cause of the issue.

Cheers.