We have an out-of-warranty Dell PowerEdge 2800 server running Windows Server 2003 with the following embedded RAID controller: PERC 4e/Di. Here are a few of its specs:
RAID BIOS H435 Build April 23, 2008
PERC/CERC BIOS Configuration utility U827
RAID 5
At boot we get the following RAID POST error message:
"Memory/battery problems were detected. The adapter has recovered, but cached data was lost.
Press any key to continue."
While the server is up and running, we get various blue screens and random OS freezes and crashes.
Blue screen:
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Stop: 0x000000F4 (0x00000003, 0x899A9D08, 0x899A9E6C, 0x8094C6E6)
After multiple phone calls with Dell technicians, we arranged for them to send us a replacement battery for the RAID controller. We installed it and let it charge for 24 hours, but we still get the POST error and the random crashing. I have also reseated the RAM DIMM, but the problems persist. We updated the MLB BIOS as well as the PERC firmware BIOS, and the problems still persist.
We now believe that the problem is a bad RAM DIMM on the controller. The documentation for the PERC 4e/Di is hard to find on Dell's website and not very helpful. I found LSI's site more helpful because they provide detailed manuals for their controllers, which are the platform on which Dell's PERC controllers are based. All the troubleshooting advice there and elsewhere on the web points to either the battery or the memory DIMM as the problem. Since we have a brand new battery, we think it is the DIMM.
Here are the specs for the existing memory DIMM:
Samsung Part Number: M393T3253FZ0-CC
256 MB DDR2 PC2-3200R ECC Registered SDRAM MODULE 240pin Registered Module based on 256Mb F-die 72-bit
400MHz; 1R x 8
We would like to simply buy a new one, but apparently this is a very hard part to come by. Dell agreed with our diagnosis (replace the DIMM), but they no longer carry it. I called around to our local computer shops (Altex and Fry's) and the local shops that sell white-label and custom computers: nothing. I also called 4allmemory.com, but they don't carry it either. Apparently, the fact that it is "Registered" and "DDR2" makes it hard to find.
I came across the following thread, which indicates that I can turn off the cache feature on the RAID controller, since that is the only function that uses the DIMM. With the cache disabled, the cached data can no longer become corrupted, because the cache is not being used.
http://en.community.dell.com/forums/t/17436588.aspx?c=us&l=en&cs=&s=gen
I mentioned this option to Dell technical support, and the technician wasn't sure it was safe. He thought it might wipe out my entire RAID configuration and force me to restore from backup tapes onto a newly initialized and configured RAID volume.
However, in my research on various forums, I've seen people mention that they have turned their cache off while troubleshooting other issues (I/O speed, for example), which leads me to believe that changing the setting would not destroy my configuration.
Does anyone have any experience with this? Specifically, can I change my cache settings within the PERC manager without losing all of my RAID configuration (and being forced to format and reinstall)?
Thanks in advance!