Mark Galvin

asked on

Dell PowerEdge 2950 crashing on a weekly basis

Client has a Dell PowerEdge 2950 running Windows Server 2008 R2 and acting as a SQL Server 2012 host.

A month ago users reported that apps (which use that server as their DB back end) were running slowly. When we checked the console, it was totally unresponsive. We could ping it by name and IP and got a response, but we were not able to do anything else. We power cycled the server and it came back up.

7 days later same thing. I was not onsite at the time so a junior member of IT power cycled and it came back up.

7 days later again. I was on site this time. It seems that the server's performance slowly gets worse until the server eventually becomes unresponsive. After a power cycle, I checked Dell System Admin and found that the PERC 5/i battery had failed. I know that can cause performance degradation in the RAID array. We replaced it, and after 48 the battery reported as being active.

I had hoped that would stop the symptoms, but it happened again yesterday. I was not on site, so the server was power cycled.

Tonight I have run Windows Update. There were about 17 updates. All installed and server rebooted.

I have checked the server for tasks that run on a regular basis (such as AV etc.) and nothing corresponds with the performance degradation and subsequent unresponsiveness, which has happened on either a Tuesday or a Wednesday in each of the last 4 weeks.

Firmware on PERC is up to date. BIOS is up to date.

First, does anyone have any thoughts on preemptively troubleshooting the issue?

Second, does anyone have any suggestions for some basic monitoring (Windows out-of-the-box stuff) to keep an eye on what is happening, so that I can check through it when the next crash takes place? There are lots of other servers in the domain that can be used to store logs etc.

Thanks
Mark
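
For the out-of-the-box monitoring asked about above, one option is the built-in `typeperf` tool, which samples performance counters at an interval and writes them to a CSV file. The Python sketch below only assembles such a command line so it can be run from another server in the domain. The host name `SQL01`, the output path, the 60-second interval and the choice of counters are all assumptions for illustration, not details from this thread.

```python
import subprocess

# Counters chosen for illustration only; adjust to what you want to watch.
DEFAULT_COUNTERS = [
    r"\Memory\Available MBytes",
    r"\Memory\Pages/sec",
    r"\Processor(_Total)\% Processor Time",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
]


def build_typeperf_command(server, output_csv, interval_seconds=60, counters=None):
    """Return the argv list for a typeperf run that logs counters to CSV.

    server      -- name of the machine to monitor (hypothetical, e.g. "SQL01")
    output_csv  -- where the monitoring host should write the log
    """
    chosen = list(counters) if counters is not None else list(DEFAULT_COUNTERS)
    return [
        "typeperf", *chosen,
        "-s", server,                  # remote computer to sample
        "-si", str(interval_seconds),  # seconds between samples
        "-f", "CSV",                   # output format
        "-o", output_csv,              # output file
        "-y",                          # answer yes to prompts, e.g. overwrite
    ]


if __name__ == "__main__":
    cmd = build_typeperf_command("SQL01", r"C:\PerfLogs\sql01.csv")
    print(subprocess.list2cmdline(cmd))
    # On a Windows monitoring host this could then be started with:
    # subprocess.run(cmd, check=True)
```

Logging from a second machine means the samples taken before the server stops responding survive the power cycle.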
Scott Silva

If this is an older server (a few years) I would open it up on your next maintenance window and do some house cleaning.... Remove any dust... Pull and reseat any cards, memory and connectors a few times to clear any oxidation...

Also make sure all memory is in recommended slots and is the right type and speed, and each slot pair has matched memory.
The slot diagram should be on the inside of the cover.

If you still have problems I would check processor heat sink compound.

It could just be too little memory for the task, and the server slows as the cache ages and fills...
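
One way to test the "too little memory" theory is to log available memory over a few days and see whether it trends steadily downward. The sketch below assumes a CSV log with a header row, a timestamp in the first column and a counter such as `\Memory\Available MBytes` in the second (the layout `typeperf` is expected to produce). The function names are hypothetical, and the slope is a plain least-squares fit, not anything built into Windows.

```python
import csv
import io


def read_counter_column(csv_text, column_index=1):
    """Return the numeric samples from one column of a counter CSV log.

    Assumes the first row is a header and the first column is a timestamp.
    Blank or non-numeric cells (missed samples) are skipped.
    """
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    samples = []
    for row in rows:
        if len(row) <= column_index:
            continue
        try:
            samples.append(float(row[column_index]))
        except ValueError:
            continue
    return samples


def slope_per_sample(samples):
    """Least-squares slope of the samples against their index (0, 1, 2, ...)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

A clearly negative slope on available memory that resets after each reboot would point toward a leak or an undersized box rather than a hardware fault.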
Mark Galvin

ASKER

It started happening again today. I was able to get onto the server remotely and found that the W3WP.exe process was not releasing memory after a user had run a Reporting Services report. I changed the recycling settings in IIS so that the application pool recycles after two requests rather than at the standard interval of 1740 minutes (29 hours).

I suspect that on the days the server's performance degraded, a number of users were running reports. I noticed today that after one report the process hung onto between 1 and 2 GB of memory. There's only 32 GB in there, so I'm not surprised that the server was crashing.
Thanks for the feedback.
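
As a rough back-of-the-envelope check on the figures above, the sketch below estimates how many report runs it would take to use up the memory if each run leaves 1 to 2 GB held until the process recycles. The function name and the `reserved_gb` parameter (memory already taken by SQL Server and the OS) are assumptions added for illustration. The thread gives no figure for that, so the 8 GB in the usage example is a placeholder.

```python
import math


def reports_until_exhausted(total_gb, retained_per_report_gb, reserved_gb=0.0):
    """Number of report runs after which retained memory uses up what is free.

    Assumes nothing is released between runs, which matches the behaviour
    described above only until the worker process is recycled.
    """
    if retained_per_report_gb <= 0:
        raise ValueError("retained_per_report_gb must be positive")
    available = max(total_gb - reserved_gb, 0.0)
    return math.floor(available / retained_per_report_gb)


if __name__ == "__main__":
    # 32 GB total, as stated above; 1 to 2 GB retained per report.
    print(reports_until_exhausted(32, 2))      # worst case per report
    print(reports_until_exhausted(32, 1))      # best case per report
    # Hypothetical: 8 GB already in use by SQL Server and the OS.
    print(reports_until_exhausted(32, 2, reserved_gb=8))
```

Even with nothing else using memory, a few dozen report runs inside the default 29-hour recycle window would be enough to exhaust 32 GB, which fits a busy reporting day.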