DELL NX3000 Disk Light

I have a DELL NX3000 storage server running Windows Storage Server 2008.
It has four (4) 750 GB drives in a RAID array.
Three of the drives have a green light that stays on almost constantly.
The fourth drive's light cycles through off, green, then orange.
No error is reported, and the blue (all is good) light stays on.

Should I be worried that this drive is failing?
hgj1357Asked:
hgj1357Author Commented:
The RAID controller is a DELL PERC 6/i Integrated RAID Controller.
hgj1357Author Commented:
What is the max number of drives this controller can handle?  Can I add two additional and increase my storage, or would that necessitate a new controller?
DavidPresidentCommented:
The drive has sent out a S.M.A.R.T. predictive-failure alert, meaning it expects to fail soon.
Order a replacement now. Back up now, and replace the drive. SMART is not infallible, but it is designed to give you 24-48 hours' notice of impending doom.
hgj1357Author Commented:
I vaguely remember the array kinda rebuilds itself.  Do I need to do much to help it along?
DavidPresidentCommented:
Note: best practice is to do a full backup, then bring the system down to the BIOS, replace the HDD with the new one, and let it rebuild over the weekend or during scheduled downtime.

If you can't do that, then just put the replacement in a free slot, configure it as a hot spare, and take daily backups. Either the disk itself reported that failure is imminent, or the disk timed out or had some other problem that made the controller flag it as untrustworthy (or both).

Better to schedule a controlled replacement during a downtime window than to have it crash or, worse, lose data. If all your disks are 3-5 years old and were all bought at the same time, you might consider getting more spares or just replacing them all.
DavidPresidentCommented:
If the system has rebuilt the array before and you never replaced any disks, then go shopping for replacement disk(s). Your data is at risk. For all you know, several disks are under stress, and a rebuild could very well make a second HDD fail...

As you probably know, if you lose 2 disk drives in a RAID5, then you have 100% data loss.
hgj1357Author Commented:
So the controller can handle more than four drives then?

I just looked at OpenManage Server Administrator on my other server, and it reports that three physical disks are "Online" and the fourth is "Foreign". What does that mean?
DavidPresidentCommented:
The controller can handle 32 disks.   Of course you also have to have that many HDD bays and sufficient power, but the controller will let you do it.

The 4th disk is foreign???  Then it is NOT part of the array.  You have NO redundancy. Next error means data loss.  The drive failed, and the controller bounced it out of the array.  The HDD is spun up now, so it is showing OK.


Back up now.
(I am assuming that you have a 4-disk RAID array and that this disk isn't a hot spare.) You'll have to go to the configurator's LOGICAL array config and check the health and configuration of the arrays to verify.
hgj1357Author Commented:
configurator LOGICAL array config?   How do I access this?

I think all drives are in the array.

On this server each drive is 465 GB and there are four.
465 x 4 x (1 - 1/4) = 1395 GB, and that is the space Windows reports as available.

This is odd.
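
The arithmetic above is the usual RAID 5 capacity rule: one disk's worth of space is given over to parity. A minimal sketch that reproduces it (the function name is hypothetical, chosen only for illustration). Note that the reported capacity is a property of the array layout, so it stays the same when the array is degraded and cannot confirm that all four disks are participating.

```python
def raid5_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 5 set: (n - 1) disks hold data, one disk's worth holds parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return disk_size_gb * (disk_count - 1)


# Four 465 GB disks, as on the server described above.
print(raid5_usable_gb(4, 465))  # 1395
```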
DavidPresidentCommented:
Then you have a 4-disk RAID5 where one disk is no longer part of the array. If another HDD dies, that is 100% data loss. If you have just one unreadable block on the surviving disks, you have partial, unrecoverable data loss.

A foreign disk is one that is not part of the array. It may be online now, but only in the sense that it is drawing power. It isn't participating in the I/O.
hgj1357Author Commented:
There is an option to clear the foreign state. From my old friend google:
-------------------------------------------------
To clear the Foreign flag from the drive, you will go into OMSA ... Storage, PERC 5, Information/Configuration link at the top of the page, Clear Foreign Configuration from the drop-down menu of available tasks.  Do this ONLY when the system is running fine and there is only one drive showing Foreign.  If you have more than one disk showing Foreign, then you have to Import the Foreign Config.  Anytime there is a config/timestamp different from the controller's config, it will show Foreign, so if it is only on a single disk, clear it and move on.

I would test your original drive, as it may not be bad.  Make sure your system firmware (BIOS, ESM, RAID) and RAID driver are up to date ... this can help prevent Foreign configs.
-------------------------------------------------
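
Purely as an illustration, the rule of thumb in the quoted text can be written as a small decision function. The function name and return strings are hypothetical, nothing here talks to OMSA or the controller, and it says nothing about whether the disk itself is healthy.

```python
def foreign_config_action(foreign_disk_count, system_running_fine):
    """Encode the quoted rule of thumb for disks that show as Foreign."""
    if foreign_disk_count == 0:
        return "nothing to do"
    if foreign_disk_count == 1 and system_running_fine:
        # Quoted advice: a single stale config on a healthy system can be cleared.
        return "clear foreign configuration"
    if foreign_disk_count > 1:
        # Quoted advice: more than one Foreign disk means importing the config.
        return "import foreign configuration"
    # One Foreign disk on a system that is not running fine: the quote gives no rule.
    return "stop and investigate"


print(foreign_config_action(1, True))
```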
DavidPresidentCommented:
The drive failed. It needs to be replaced. If it hadn't failed, there would be no foreign state.
Testing an HDD for suitability with a RAID controller is something you can't do without the right software. Such software isn't available for free, and it is quite expensive.

Now you can test the HDD and if the test fails, then the HDD is no good.  But if the test passes, that doesn't mean the HDD is good & has appropriate operational characteristics for that controller.  It just means it passed the test you ran.  

It's your data, I'm telling you to back up and replace the disk.   Good luck to you.  Nothing more I can add to this thread.
hgj1357Author Commented:
How do I replace the drive? Do I do it hot while the server is running, using the OpenManage Server Administrator app, or do I need to reboot and use the RAID controller config mode that looks a bit like the BIOS setup?
DavidPresidentCommented:
Do it hot. It is designed to be done that way.  Rebuilding will be automatic.
hgj1357Author Commented:
So I use OpenManage to make the dodgy drive blink.
Identify it.
Pull it out.
Pop the new one in.
And that's it?

I'm dubious!
DavidPresidentCommented:
That is it but FIRST do a full backup.  Rebuilds are stressful. If you lose a drive during the rebuild, all of your data is lost forever.
hgj1357Author Commented:
I have replaced a disk on the NX3000. It is the same model number as the DELL drive but is not from DELL.  State reads Ready, but it is not rebuilding.  Can I force it to rebuild?
hgj1357Author Commented:
OK. I didn't wait long enough; it is rebuilding now.
hgj1357Author Commented:
The NX3000 is rebuilt. One comment I will add: always run a "consistency check" before pulling out a drive for replacement. I almost had a disaster. There was an unreported failed drive as well as the predicted-failure drive.
DavidPresidentCommented:
More important ... run a consistency check weekly as part of the care and feeding of any RAID array.
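
The precautions raised in this thread (full backup, clean consistency check, no other disk already out of the array) can be summarized as a pre-replacement checklist. This is a minimal sketch with hypothetical names; the three inputs are answers you gather by hand from your backup job and the RAID management tool, not values read from the controller.

```python
def blockers_before_pulling_drive(backup_done, consistency_check_clean, other_disks_online):
    """Return the reasons not to pull the drive yet; an empty list means go ahead."""
    blockers = []
    if not backup_done:
        blockers.append("take a full backup first")
    if not consistency_check_clean:
        blockers.append("run a consistency check and resolve what it finds")
    if not other_disks_online:
        blockers.append("another disk is not online; pulling a drive could lose the array")
    return blockers


# Example: backup taken, but no consistency check has been run yet.
for reason in blockers_before_pulling_drive(True, False, True):
    print(reason)
```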