I deployed a brand new HP ProLiant ML350 G5 in March of this year. It's running SBS 2003 Standard and uses Symantec Backup Exec 12 for backups (external hard drives as backup media). The server ran like a champ for a couple of months, and then on the night of May 5th it locked up, and my customer took it down with the power button. It ran fine for a few days after that, and then it happened again. This time I went onsite to see exactly what was happening. The server appeared to be in a hard lock (the num lock light would not toggle, and the monitor was just black). I have four 1 TB drives in RAID 5, and the activity LEDs were going crazy. I went to a workstation and could ping the server successfully. I could also telnet to port 21, and it asked me for FTP credentials but wouldn't authenticate me when I gave them. So I had to power it down with the power button again. Since then it has locked up several more times.
Here are the dates and times at which it appears to have locked up, as indicated by the event log. When I restart the server it logs a 6008 (previous system shutdown at blah blah was unexpected...). The event's own timestamp is the time the server came back up after I shut it down, but the time inside the event text is the time at night when I presume it locked up:
5-24 - 2:33 am
5-19 - 2:02 am
5-14 - 11:44 pm
5-12 - 11:30 pm
5-5 - 2:19 am
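For anyone who wants to check those timestamps for a pattern, here is a rough Python sketch that works out the time between consecutive lockups. The function names are made up for illustration, and since I didn't give a year, `ASSUMED_YEAR` is only a placeholder (all the dates are in May, so the gaps come out the same whatever year is used).

```python
from datetime import datetime, timedelta

# The post gives no year, so this is a placeholder (assumption).
# All dates fall in May, so the gaps are the same for any year.
ASSUMED_YEAR = 2000

# Lockup times copied from the 6008 events listed above.
LOCKUPS = [
    "5-24 - 2:33 am",
    "5-19 - 2:02 am",
    "5-14 - 11:44 pm",
    "5-12 - 11:30 pm",
    "5-5 - 2:19 am",
]


def parse_stamp(stamp, year=ASSUMED_YEAR):
    """Turn a string like '5-24 - 2:33 am' into a datetime."""
    date_part, time_part = stamp.split(" - ")
    return datetime.strptime(f"{year}-{date_part} {time_part}", "%Y-%m-%d %I:%M %p")


def gaps_between(stamps):
    """Return the time elapsed between consecutive lockups, oldest first."""
    times = sorted(parse_stamp(s) for s in stamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_between(LOCKUPS):
        hours = gap / timedelta(hours=1)
        print(f"{hours:.1f} hours since the previous lockup")
```

The gaps are irregular, from about two days to nearly eight, so the lockups don't look like they follow a fixed interval.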
Other than the shutdown event, there is nothing in any of the event logs that is consistent -- nothing that would indicate a particular application or service or scheduled task causing the issue.
The only thing that runs at night is the backup. I run two backup jobs each night, splitting the vast amount of data between two external 1 TB hard drives. One job backs up a bunch of data plus system, system state, and Exchange. The other just backs up about 700 GB of data (mostly Photoshop and Illustrator drawings). I run a full backup on Fridays and differential backups Mon-Thurs. As you can see from the dates, I'm getting the lockup on both differential nights and full nights. On the first night, both backups (differential) completed successfully before the server locked up. Other times one completes and the other might or might not complete. The lockup times do not line up with the start or finish of any backup job.
No updates were installed, and no applications were installed or changed, at any point before the lockups started.
I have Microsoft involved on this, and they installed perfwiz and poolmon to collect data at the time of lockup, but upon analysis of the logs, they're not coming up with much right now.
Any ideas? This one is driving me crazy.