Hello,
I have been trying to troubleshoot a problem for a few weeks now. I have 2 HP DL360 G6 servers running CentOS 5.3 64-bit, with Heartbeat and DRBD set up for failover. DRBD replicates a 115GB partition where the MySQL database files are stored, and Heartbeat handles failover between the two machines.
Lately I have found that the machines have been rebooting by themselves. I never added a reboot cron job, but I figured one might have been installed by default. I never found one.
When I run the "last" command, I can see where the machine rebooted:
reboot system boot 2.6.18-128.el5xe Sat Aug 8 21:49 (2+17:48)
Then when I look at /var/log/messages for that day, there are no entries for several hours before that time. It's as though the machine crashed and rebooted automatically.
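One way to check this across many reboots, rather than one at a time, is to look at "last -x" output, which also lists "shutdown" records. A clean halt or reboot normally writes a shutdown record, so a reboot with no shutdown since the previous boot hints at a crash, power loss, or hardware reset. The sketch below assumes that heuristic and the usual "last -x" layout (newest first, record type in the first column); the script name and the input file are hypothetical.

```python
import sys


def unclean_reboots(lines):
    """Flag reboot records not preceded by a shutdown record.

    Assumes `lines` is the text output of `last -x`, which lists the
    newest entries first. Lines whose first field is neither "reboot"
    nor "shutdown" (user logins, runlevel changes, the trailing
    "wtmp begins" line) are ignored.
    """
    flagged = []
    seen_reboot = False   # have we passed an older reboot record yet?
    clean_since = False   # was a shutdown recorded since that reboot?
    for line in reversed(lines):  # walk oldest -> newest
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        if kind == "shutdown":
            clean_since = True
        elif kind == "reboot":
            # No shutdown since the previous boot: possibly a crash,
            # power loss, or watchdog/hardware reset.
            if seen_reboot and not clean_since:
                flagged.append(line.rstrip())
            seen_reboot = True
            clean_since = False
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: last -x > last.txt; python unclean_reboots.py last.txt
    with open(sys.argv[1]) as f:
        for record in unclean_reboots(f.read().splitlines()):
            print(record)
```

The oldest reboot in the file is never flagged, because there is no earlier boot to compare it against (for example, after wtmp has been rotated).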
For both machines (which are doing the same thing, just at different times), I did the following:
- Ran memory tests on all 3x4GB sticks of RAM (12GB total). The tests did not report any errors.
- Ran a CPU stress test to see if it would hiccup. It didn't.
- Ran a hard drive diagnostic on the 2 drives in each machine. No errors or bad sectors were reported.
- Looked for a crash dump, but couldn't find one.
Honestly, I am still pretty new to Linux, so I may have looked in the wrong place for a crash dump, or crash dump capabilities may not even be installed on the machines.
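On CentOS 5, crash dumps are normally captured by the kdump service, which only works if memory has been reserved for it with a "crashkernel=" boot parameter; dumps then land under /var/crash by default. The sketch below checks both, assuming those default locations. One caveat: if the machines boot a Xen kernel (the "el5xe" in the last output may be a truncated "el5xen"), the reservation may be set on the hypervisor line instead, so a False result here would not be conclusive.

```python
import os


def crashkernel_reserved(cmdline):
    """Return True if the kernel command line reserves memory for kdump."""
    return any(arg.startswith("crashkernel=") for arg in cmdline.split())


def find_vmcores(crash_dir="/var/crash"):
    """List vmcore files under crash_dir (assumed default kdump path)."""
    found = []
    if not os.path.isdir(crash_dir):
        return found
    for root, _dirs, files in os.walk(crash_dir):
        for name in files:
            if name.startswith("vmcore"):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    cmdline = ""
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    print("crashkernel reserved:", crashkernel_reserved(cmdline))
    print("vmcore files:", find_vmcores() or "none")
```

If no memory is reserved, no dump can be written when the kernel panics, which would be consistent with finding nothing.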
If any of you have any ideas, please let me know. I would like to get this problem resolved.
Thanks in advance.