Problem with Hardware or Memory leak? (SATA RAID1 "dirty" after RAM swap)

Hello Experts.

I have a server with two RAID1 arrays. In other words, 4 SATA drives make up 2 Windows volumes (C: and D:). D: is the data volume and is 70 GB; the other volume is 30 GB (or so).

This server randomly freezes, and I have been trying to understand why. I went to replace the RAM. After shutting down, unplugging the server, swapping the RAM, and rebooting, the RAID status screen displayed a critical message: the 70 GB data RAID array was operating on only one drive! The client had said that this happened before when the server froze, but not EVERY time it froze. After the OS booted (Windows Server 2003), everything looked good. I checked the RAID status utility, and the data (70 GB) RAID1 array was rebuilding and was at 20%. I have never seen behavior like this before.

As you can probably guess by now, the new RAM did not fix the freezing, and so now I am left wondering if it’s hardware or software related. When the server was being built, it ran fine for several days before being put into a production environment, so the possibility of a memory leak from an app is there, but I wanted some insight regarding the RAID1 being “dirty” only after a RAM swap out.
Could this mean that there is a problem with a drive, or the RAID controller? The SATA drives are in a hot-swap “bay” with a hot-swap backplane (Intel components).
Has anyone run into this?

Thanks in advance for any help in this matter.

gjohnson99

A lockup like this will most likely cause the RAID failure 90% of the time.

Check the logs for errors. It could be a driver or software.

rindi

Run memtest86 (http://memtest.org). If you don't get an error on the first pass, run at least 5 passes.

If the RAM is OK, try updating the firmware of your RAID controllers.

nobus

If you have a spare drive, you can swap out one drive at a time and test them all that way.

reedsr

What RAID controllers are you using?

talkingbob

ASKER

Promise* PDC-20319 Serial ATA RAID is the controller. It's integrated on an Intel S875WP1-E server board.

This is the most recent event log error:
ID: 119
The driver for device \\device\harddisk1\dr1 delayed non-paging Io requests for 0 ms to recover from a low memory condition.

ID: 2019
The server was unable to allocate from the system nonpaged pool because the pool was empty.

ID: 1001
The computer has rebooted from a bugcheck....
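For anyone triaging a similar set of log entries, here is a minimal sketch of how the three messages above could be grouped by what they hint at. The keywords come from the quoted messages; the labels and the `classify` helper are my own shorthand for illustration, not official event categories or any Windows API.

```python
from collections import Counter

# Keyword -> label. Keywords are taken from the messages quoted in this post;
# the labels are hypothetical shorthand, not official Microsoft categories.
HINTS = {
    "low memory condition": "memory pressure",
    "nonpaged pool": "nonpaged pool exhaustion",
    "bugcheck": "crash/reboot",
}

def classify(events):
    """Count hint labels across (event_id, message) pairs."""
    counts = Counter()
    for _event_id, message in events:
        text = message.lower()
        for keyword, label in HINTS.items():
            if keyword in text:
                counts[label] += 1
    return counts

# Messages as posted above (the first one is abridged).
events = [
    (119, "... to recover from a low memory condition."),
    (2019, "The server was unable to allocate from the system nonpaged pool "
           "because the pool was empty."),
    (1001, "The computer has rebooted from a bugcheck...."),
]

for label, n in classify(events).most_common():
    print(f"{label}: {n}")
```

Two of the three entries point at kernel memory running out rather than at a failing disk, which is consistent with a leak in a driver or service.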

Hope this helps.

rindi

Check your memory.

talkingbob

ASKER

I DID replace all the RAM that was in the server with new sticks, and the same problem happened. I thought this ruled out the chance that the memory was bad.

rindi

No, not necessarily. RAM is quite often bad. The RAM sockets are quite often bad too, so it often helps to try another slot or to reseat the RAM.

talkingbob

ASKER

UPDATE:

I went over to the server again on the 13th and changed the RAM to a different slot. This time, even after a warm reboot, the system came up saying that a drive in the data RAID 1 array was critical. I watched this drive rebuild itself, then the server froze for a moment, and then the drive was gone and the array went from rebuilding to critical. I rebooted again, and it started rebuilding from 0%.

I hope this is the root of all other problems, and yes another drive is on its way.

We'll see...

talkingbob

ASKER

Ok,

The drive was replaced, and the errors have gone away (in the RAID event log). It is still locking up, though, and I may have found the reason why.

The Promise RAID Management utility (PAM) had a known memory leak issue in the version I was using. I upgraded to version 4.0 and then disabled the PAM service. It has not locked up in 60+ hours, so I'm hoping that did it.
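For anyone wanting to confirm a leak like this rather than wait for the next freeze, here is a rough sketch of the arithmetic, assuming you log nonpaged pool usage over time (for example from the Performance Monitor counter Memory\Pool Nonpaged Bytes). The sample values and the 256 MB limit below are made-up numbers for illustration, not measurements from this server.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours, megabytes) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_mb = sum(mb for _, mb in samples) / n
    num = sum((t - mean_t) * (mb - mean_mb) for t, mb in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den

def hours_until_exhausted(samples, pool_limit_mb):
    """Rough hours until usage reaches the limit; None if usage is not growing."""
    rate = slope_per_hour(samples)
    if rate <= 0:
        return None
    latest_mb = samples[-1][1]
    return max(0.0, (pool_limit_mb - latest_mb) / rate)

# Hypothetical readings: (hours since boot, nonpaged pool in MB).
samples = [(0, 40), (12, 64), (24, 88), (36, 112)]
print(slope_per_hour(samples))              # growth rate in MB per hour
print(hours_until_exhausted(samples, 256))  # 256 MB is an assumed limit
```

A steady positive slope that continues while the server is idle points at a leak. If the slope drops to roughly zero after stopping a service, that service is the likely culprit.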

We will see...

talkingbob

ASKER

The problem was solved, but I arrived at a solution through my own research and no expert comment helped.

I guess points need to be refunded.

rindi

I suggest a PAQ/Refund, not a Delete/Refund, as the user provided the answer and this could be useful in the future.

