I have two identical boxes that we originally built as sandboxes. About two months ago we moved one into production, and users have been complaining that the server is constantly offline. Because the boxes are identical, I am fortunate to have one to test fixes on before applying them to the production machine.
The system hangs at random. It does not crash or BSOD; it just locks up completely.
This is the build:
MB - ASUS Z9PE-D8 WS
BIOS - Updated on one box (the test machine) from 3206 to 5109
Configured with 2 separate RAID groups:
Group0 - RAID 1 - OS installed here
Group1 - RAID 5 - VMs stored here
(RAID 5 shows as "Intel Raid 5 Volume SCSI Disk Device")
CPU: 2x Intel Xeon E5-2609 Sandy Bridge-EP, 2.4GHz 80w Quad-Core, BX80621E52609
RAM: 8x 4GB - Kingston 240-pin DDR3 SDRAM ECC Registered DDR 1333 1.35V VLP KVR13LR9D8L/4HC
NIC: 2x onboard, 1x Intel Gigabit CT Desktop Adapter
HDD: 5x HGST Travelstar H2IK5001672SP (0S02858) 500GB 7200 RPM 32MB Cache SATA 6.0Gb/s 2.5" Internal
PSU: CORSAIR Professional Series Gold AX1200 (CMPSU-1200AX) 1200W ATX12V v2.31 / EPS12V v2.92
Graphics: EVGA e-GeForce 8400 GS (Nvidia 8400) PCI-E 2.0 x16
Drive Caddy: Thermaltake RC1600101A MAX-1562 5.25" (x1) Bay to 2.5" (x6) Bay Mobile Rack HDD Canister
O/S: Windows Server 2008 R2, SP1 (Build 7601)
The BIOS version initially on the test machine was 3206, but I updated it to 5109. The system stopped hanging, but now it keeps crashing instead. I also enabled WHEA in the BIOS, so instead of just being told by "WhoCrashed" that the failing module was "hal.dll", I can now see that there is a fatal memory issue.
I ran MemTest86 and it kept locking up at around 15%. Some research suggested that changing the memory voltage from automatic to the RAM manufacturer's recommended setting could resolve this, and it did. MemTest86 passed with flying colors... but the system still BSODs with the same error.
I contacted ASUS, and the support tech said to flash the BIOS to version 3302 (I am currently running 5109), as that is the highest version I need for my processor.
The problem, which they seem unwilling to help with, is that none of the tools provided will flash that version. The EZ Flash utility refuses to use the file because it is older than the currently installed version, the Windows utility does the same, and BUPDATER.exe won't work because the file is a CAP file and not a ROM file.
In the meantime I still have to bounce the production server at least once a day when it hangs, with no resolution in sight. Again, when the server hangs it gives no error message at all. I'm at a loss as to what else could possibly be wrong, as the odds that I have two sets of bad hardware are very low, though not impossible.