Problems with Windows Server 2003 R2 after RAID5 trouble

In my HP DL380 server I have a RAID5 array with a hot spare (3+1). Recently one of the working disks crashed; I rebooted the server and activated the hot spare. The RAID synchronized after a few hours and everything seemed to be working, but there are some performance problems.
Perfmon shows graphs like .|.|.|.|.| (0 to 100%) on the disk queue when MSSQL is "waking up", for example when someone launches an app for the first time or when using SharePoint. After 5-15 minutes it stops peaking and stays at a normal low level, but those peaks keep appearing several times a day.
The second problem is that since this crash the HP backup software (Backup Express) is showing errors and cannot back up files on drive C:.
Today even the DHCP service had some problems.

Is it safe to run chkdsk on a running server and then maybe sfc /scannow?
Or maybe someone knows what happened?

Logs show nothing.
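
The disk queue peaks described above can be put into numbers once the Perfmon counter log is exported to CSV. A minimal Python sketch follows, assuming the samples have already been parsed into (timestamp, value) pairs. The function name, the threshold, and the one-minute merge gap are illustrative choices, not anything defined by Perfmon.

```python
from datetime import timedelta


def find_spike_windows(samples, threshold, max_gap=timedelta(minutes=1)):
    """Group samples at or above `threshold` into (start, end, peak) windows.

    `samples` is an iterable of (timestamp, value) pairs sorted by time.
    High samples separated by no more than `max_gap` are merged into
    one window, so a .|.|.| pattern counts as a single burst.
    """
    windows = []
    current = None  # [start, end, peak] of the window being built
    for ts, value in samples:
        if value < threshold:
            continue
        if current is not None and ts - current[1] <= max_gap:
            # Still inside the same burst: extend it and track the peak.
            current[1] = ts
            current[2] = max(current[2], value)
        else:
            if current is not None:
                windows.append(tuple(current))
            current = [ts, ts, value]
    if current is not None:
        windows.append(tuple(current))
    return windows
```

Comparing the start times of the resulting windows with application launches, backup jobs, or controller rebuild activity may show whether the bursts line up with anything specific.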

shlafrock Asked:
 
mlongoh Commented:
You get an error message every time you reboot the server (presumably when the RAID controller is initializing) and there's a fault light? There are a number of possibilities; the top three are some bad sectors, a bad drive, or a bad controller. I'm hoping it's just some bad sectors and that once those sectors have been overwritten (I'm presuming they've been remapped already) the light will go out and the message will stop appearing.

Have you tried the management interface that's available at boot to see if you can clear the error or investigate further? Usually (I can't state this for certain) there's a function key or hot key while the controller is initializing that lets you do some basic RAID management.

Is the server still under warranty?
 
mlongoh Commented:
Why did you have to reboot the server? If you had a RAID5 array with a hot spare, the spare should have kicked in when the drive failed, without much impact beyond reduced performance while the spare was syncing.
 
rindi Commented:
chkdsk should be run while the partition you are checking is offline, which means it is done during boot. The server won't be usable during that time, and it can take a long time depending on the partition size and the number of errors it finds. Even so, chkdsk should be run regularly. SFC /Scannow should be no problem.
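
As a lower-impact first look before the scheduled offline run: chkdsk without /f runs in read-only mode and only reports, and fsutil dirty query shows whether the volume's dirty bit is set. Below is a hedged Python sketch that just assembles and runs those two commands. The helper names are made up for illustration, and read-only results on a live volume can include false positives, so treat them as a hint only.

```python
import subprocess


def read_only_check_commands(drive="C:"):
    """Return commands that inspect a volume without trying to repair it."""
    return [
        # Reports whether the volume is flagged dirty.
        ["fsutil", "dirty", "query", drive],
        # No /f or /r switch, so chkdsk only reports and changes nothing.
        ["chkdsk", drive],
    ]


def run_checks(drive="C:"):
    """Run the read-only checks and print their output (Windows only)."""
    for cmd in read_only_check_commands(drive):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "-> exit code", result.returncode)
        print(result.stdout)
```

Calling run_checks("C:") from an administrator prompt on the server prints both reports. The read-only chkdsk pass still adds disk load, so a quiet period is the better time to try it.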
 
mastoo Commented:
Yes, why the reboot? Maybe the drive failure was a controller error? I'd check your SQL and Windows event logs to see if they are still complaining about drive problems.
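
One way to go through an exported event log for storage complaints is sketched below. It assumes each event has already been loaded into a dict with the hypothetical keys "source" and "event_id" (for example from a CSV export), and the watch list holds a few commonly cited disk and NTFS event IDs. Both the keys and the list are assumptions to adjust for the system at hand.

```python
# (source, event ID) pairs often associated with storage trouble.
# This list is an assumption for illustration, not a complete reference.
WATCHED = {
    ("disk", 7),   # bad block reported on the device
    ("disk", 11),  # controller error detected by the driver
    ("disk", 51),  # error during a paging operation
    ("ntfs", 55),  # file system structure reported as corrupt
}


def storage_events(events, watched=WATCHED):
    """Return the events whose (source, event_id) is on the watch list."""
    hits = []
    for event in events:
        key = (event["source"].strip().lower(), int(event["event_id"]))
        if key in watched:
            hits.append(event)
    return hits
```

An empty result only means none of the watched IDs were logged; controller-level problems may show up solely in the vendor's own management log.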
 
shlafrock Author Commented:
I don't know why the hot spare did not activate itself when the drive failed. The HP Management logs said "Drive needs to be replaced" and the LED was orange, so I rebooted the server. The BIOS said the drive was not operational, or something like that, but the array still worked without the hot spare. So I turned off the server, pulled out the HDD and turned the server back on. The BIOS asked "Do you want to activate hot spare" and I pressed Y.
I had never dealt with hot spare activation before. Of course I thought it would activate automagically and everything would work fine, but it didn't.

Logs show nothing.
I can run chkdsk only on Sunday.
 
mlongoh Commented:
You say Backup Express and DHCP are generating errors; what are the errors? Are they showing in the event log?
 
shlafrock Author Commented:
The Data Protector Express error is, I think: Invalid object ID
There's nothing in the event log.
I'll post tomorrow after I run chkdsk.
 
shlafrock Author Commented:
OK, so chkdsk showed nothing.
I replaced the drive, this time while the server was on. The sync started and finished after an hour or more, and the logs show that everything is OK, so I think everything is fine now, except that when I reboot the server I see:

Slot 1 HP Smart Array P400 Controller
1716-Slot 1 Drive Array - Unregenerable Media Errors Detected on Drives
during previous Rebuild or Auto-Reliability Monitoring ( ARM) scan.
Problem will be fixed automatically when the sector(s) are overwritten.
Backup and Restore recommended.

but in the HP Management app I see that everything is fine: no problems, everything works.
The HP DL380 UID LED is constantly blinking, which is annoying, and I think I cannot disable the blinking because of the error above.

 
mlongoh Commented:
Just a guess, but it sounds like the drive that was most recently replaced has one or more bad sectors that were marked bad, or you may have a faulty controller (given that you've had problems with two different drives).
 
shlafrock Author Commented:
But the logs show that everything is all right.
 
shlafrock Author Commented:
I think I get it every time I reboot. I'll check this next Sunday.
 
rindi Commented:
Usually, if SATA-II isn't specifically mentioned anywhere, it's likely that only SATA-I is supported (at that time there was only one SATA standard out, so there was no point in specifying the version). Also, having looked at the OS support for this PC (when downloading drivers you can select the OS), it looks like this isn't one of the newer systems, so it is quite possible that it only uses SATA-I.
 
shlafrock Author Commented:
.
 
mlongoh Commented:
So what happened?
Question has a verified solution.