Solved

Problem with Hardware or Memory leak? (SATA RAID1 "dirty" after RAM swap)

Posted on 2005-04-10
15
Medium Priority
295 Views
Last Modified: 2010-04-03
Hello Experts.

I have a server with two RAID1 arrays. In other words, 4 SATA drives make up 2 Windows volumes (C: and D:). D: is the data volume and is 70 GB; the other volume is about 30 GB.

This server randomly freezes, and I have been trying to understand why. I went to replace the RAM; after shutting the server down, unplugging it, swapping the RAM, and rebooting, the RAID status screen displayed a critical message that the 70 GB data RAID array was operating on only one drive! The client had said that this had happened before when the server froze, but not EVERY time it froze. After the OS booted (Windows Server 2003), everything looked good; I checked the RAID status utility, and the data (70 GB) RAID1 array was rebuilding and was at 20%. I have never seen behavior like this before.

As you can probably guess by now, the new RAM did not fix the freezing, and so now I am left wondering if it’s hardware or software related. When the server was being built, it ran fine for several days before being put into a production environment, so the possibility of a memory leak from an app is there, but I wanted some insight regarding the RAID1 being “dirty” only after a RAM swap out.
Could this mean that there is a problem with a drive, or the RAID controller? The SATA drives are in a hot-swap “bay” with a hot-swap backplane (Intel components).
Has anyone run into this?

Thanks in advance for any help in this matter.
Question by:talkingbob
13 Comments
 
LVL 6

Expert Comment

by:gjohnson99
ID: 13750527
A lock-up like this is what causes the RAID failure about 90% of the time.

Check the logs for errors. It could be a driver or software.

 
LVL 88

Expert Comment

by:rindi
ID: 13750819
Run memtest86 (http://memtest.org). If you don't get an error on the first pass, run at least 5 passes.

If the RAM is OK, try updating the firmware of your RAID controllers.
 
LVL 93

Expert Comment

by:nobus
ID: 13750900
If you have a spare drive, you can swap the drives out one at a time and test each of them that way.

 
LVL 4

Expert Comment

by:reedsr
ID: 13751809
What RAID controllers are you using?
 

Author Comment

by:talkingbob
ID: 13752081
Promise* PDC-20319 Serial ATA RAID is the controller. It's integrated on an Intel S875WP1-E server board.

This is the most recent event log error:
ID: 119
The driver for device \\device\harddisk1\dr1 delayed non-paging Io requests for 0 ms to recover from a low memory condition.


ID: 2019
The server was unable to allocate from the system nonpaged pool because the pool was empty.

ID: 1001
The computer has rebooted from a bugcheck....


Hope this helps.
 
LVL 88

Expert Comment

by:rindi
ID: 13752139
Check your memory.
 

Author Comment

by:talkingbob
ID: 13761303
I DID replace all the RAM in the server with new sticks, and the same problem happened. I thought this ruled out the chance that the memory was bad.
 
LVL 88

Expert Comment

by:rindi
ID: 13761540
No, not necessarily. RAM is quite often bad. The RAM sockets are quite often bad too, so it often helps to try another slot or to reseat the RAM.
 

Author Comment

by:talkingbob
ID: 13780976
UPDATE:

I went over to the server again on the 13th and changed the RAM to a different slot. This time, even after a warm reboot, the system came up saying that a drive in the data RAID 1 array was critical. I watched this drive rebuild, then the server froze for a moment, the drive disappeared, and the array status changed from rebuilding to critical. I rebooted again, and it started rebuilding from 0%.

I hope this is the root of all the other problems, and yes, another drive is on its way.

We'll see...
 

Author Comment

by:talkingbob
ID: 13828311
Ok,

The drive was replaced, and the errors have gone away (in the RAID event log). The server is still locking up, though, and I may have found the reason why.

The Promise RAID management utility (PAM) had a known memory leak in the version I was using. I upgraded to version 4.0 and then disabled the PAM service. The server has not locked up in 60+ hours, so I'm hoping that did it.

We will see...
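
For anyone who wants to confirm a leak like this before disabling a service, one approach is to log a process's memory use at regular intervals and check whether it climbs steadily. Below is a minimal Python sketch of that idea, not a Promise or Microsoft tool: the `growth_per_hour` and `looks_like_leak` helpers, the sample values, and the threshold are all hypothetical, and the samples are assumed to have been collected separately (for example from Performance Monitor's per-process Pool Nonpaged Bytes counter).

```python
def growth_per_hour(samples):
    """Return the least-squares slope of (hours, bytes) samples, in bytes/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    # Covariance of time and bytes, divided by variance of time.
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def looks_like_leak(samples, threshold_bytes_per_hour):
    """Flag a process whose memory use grows faster than the chosen threshold."""
    return growth_per_hour(samples) > threshold_bytes_per_hour


# Hypothetical samples: (hours since start, bytes in use by the process).
leaky = [(0, 10_000_000), (1, 12_000_000), (2, 14_000_000), (3, 16_000_000)]
steady = [(0, 10_000_000), (1, 10_100_000), (2, 9_900_000), (3, 10_000_000)]

print(looks_like_leak(leaky, 1_000_000))
print(looks_like_leak(steady, 1_000_000))
```

A steady upward slope over many hours is only a hint, not proof; a process that is still warming up its caches can look the same for a while, so the threshold and the sampling window are judgment calls.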
 

Author Comment

by:talkingbob
ID: 13987835
The problem was solved, but I arrived at a solution through my own research and no expert comment helped.

I guess points need to be refunded.
 
LVL 88

Expert Comment

by:rindi
ID: 14238713
I suggest a PAQ/Refund, not a Delete/Refund, as the user provided the answer and this could be useful in the future.
 

Accepted Solution

by:
CetusMOD earned 0 total points
ID: 14264991
PAQ'd, 266 points refunded.
CetusMOD
Community Support Moderator