I have a development server that has been exhibiting odd behavior for a number of months. At varying intervals, the server unexpectedly powers off and requires a manual restart.
I believe I have ruled out hardware, because I have replaced almost everything, including the motherboard, CPU, RAM, power supply, and case. The only components that have been present during every one of the power-downs are the hard drives (4 SATA drives of varying sizes and manufacturers).
The server receives power from an APC UPS. One other server is also attached to this UPS, and does not exhibit the same behavior. Just to be sure, I replaced the UPS with a new one, but the problem persisted.
I have the "Reboot on power loss" option enabled in the BIOS, and have verified it works when I manually remove power from the server. For example, if I pull the power cord from the back of the server and reattach it, the box powers up immediately.
All of this leads me to think that this must be a software issue rather than a hardware one. Here are the things I have checked:
1. Logs. Nothing in any of the logs indicates an intentional shutdown. Around the time of each power-down, the logs show normal entries right up to the event, immediately followed by the boot entries.
2. Users. Only one other person besides myself has logon access to this box, and the server resides in a locked room to which only he and I possess keys. Neither of us is the direct cause of the power-offs.
3. Remote access. The box sits behind a hardware firewall that allows traffic only on the HTTP/HTTPS (80 and 443), FTP (21), and SSH (22) ports. Traffic to all other ports is denied at the firewall.
4. Software. The server runs a standard installation of FreeBSD 6.2-RELEASE, along with Apache 1.3.37 and PHP 5.2.3. It is a development server known only to myself, my associate mentioned above, and my client.
As mentioned above, the power-downs occur at random intervals. The most recent occurred on a Tuesday at 10:33pm. Others have occurred on a Saturday morning or during weekday business hours. Sometimes the power-off occurs when no one is actively using the box, other times when I am logged in and performing normal operations. At times the server powers off three times in a week; at other times it stays up for as long as eleven days.
I am well-versed in Unix server administration, so I believe I am not overlooking anything obvious in the logs. I also have a background in hardware troubleshooting and repair, and have thoroughly examined that aspect of the box. I am at my wits' end trying to discover the cause of this issue, and would greatly appreciate any assistance or suggestions that the EE community can provide.