Several critical services suddenly stop with no warning or apparent cause
Posted on 2009-02-24
Several critical services, including but not limited to xinetd, ssh, cron, syslog, cups, iscsi and smb, suddenly stop without warning. Users are kicked off the system and are unable to log back in. Console login still works, and the services can be restarted. This has happened several times so far, on our RHEL 5.3 and 5.2 Dell blade servers and on our FC5 Dell Optiplex systems.
It has always happened during the day when users are logged in and working. None of my staff with root access report being active on the system immediately before this happens.
Because syslogd also stops, there are no entries in the logs to point to a cause.
Because it is happening on both RHEL 5 and FC5, and on completely different hardware platforms, I suspect that it is not a distribution-specific, kernel-specific or hardware issue, but that some of our custom scripting is causing it.
I would like to know what could cause the symptoms we are experiencing. Knowing what could cause them may help me find the code that is responsible.
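One class of cause that would be consistent with these symptoms, offered as an assumption rather than a diagnosis, is a root-owned script that calls kill with a PID of 0, -1 or another negative value, for example when a variable that should hold a PID is empty or wrong. Under POSIX kill(2) semantics those values address whole groups of processes rather than a single one, and on Linux a PID of -1 skips init (PID 1), which might explain why console login survives. The sketch below only classifies what a given PID argument would target; the function names are hypothetical and nothing in it sends a signal.

```python
def describe_kill_target(pid):
    """Describe which processes kill(2) would signal for this pid argument.

    Follows the POSIX kill(2) rules; this helper only describes the
    target and never sends a signal itself.
    """
    if pid > 0:
        return "single process %d" % pid
    if pid == 0:
        return "every process in the caller's process group"
    if pid == -1:
        return "every process the caller is permitted to signal"
    # Any other negative value addresses the process group with ID abs(pid).
    return "every process in process group %d" % -pid


def targets_many(pid):
    """Return True if the pid argument can address more than one process."""
    return pid <= 0
```

A check like targets_many could be applied to the PID values our scripts compute before they are passed to kill, to flag any call that would reach beyond one process.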
I am also wondering if there is some way to keep logging running while this happens. Perhaps a different log daemon?
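As a stopgap that does not depend on syslogd, a small independent watcher could record the moment the daemons disappear, so the time can be matched against cron jobs and user activity. The following is a minimal sketch, assuming Python 3 is available (the stock Python on RHEL 5 is older, so it would need adapting). The function names, the log path and the daemon names passed in are all illustrative assumptions, and the watcher itself would not survive a signal sent to every process unless something restarts it.

```python
import os
import time


def process_names(proc_root="/proc"):
    """Return the set of command names of all processes found under proc_root."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        # Format is "pid (comm) state ..."; comm may itself contain
        # spaces or parentheses, so use the first "(" and the last ")".
        start = stat.find("(")
        end = stat.rfind(")")
        if start != -1 and end > start:
            names.add(stat[start + 1:end])
    return names


def missing_services(expected, running):
    """Return the expected names that are absent from running, sorted."""
    return sorted(set(expected) - set(running))


def watch(expected, log_path, interval=5):
    """Append a timestamped line to log_path whenever a service is missing."""
    while True:
        missing = missing_services(expected, process_names())
        if missing:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as log:
                log.write("%s missing: %s\n" % (stamp, ", ".join(missing)))
        time.sleep(interval)
```

Calling watch(["sshd", "crond", "syslogd", "xinetd"], "/var/tmp/service-watch.log") would append one line per check for as long as any of those names is absent. Starting it from /etc/inittab with the respawn action is one way it might be brought back if it is killed along with everything else.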