Solved

Dell PowerEdge 2650 RAID 1 degraded and won't come back online

Posted on 2011-09-08
Last Modified: 2012-05-12
We have a Dell PowerEdge 2650 server with a failed RAID 1 drive. The drive has been removed and a new, initialized drive has been put in its place. When I open Server Administrator, the virtual disk is shown as 'Degraded' and it is not rebuilding. There is no disk activity at all. I have already tried to make the new disk a hot spare, but it still says Ready and won't rejoin the virtual disk. I don't see a way to force the rebuild; the option I expected to find isn't there. Does anyone have any ideas?
Question by:glennx
55 Comments
 
LVL 47

Expert Comment

by:dlethe
ID: 36504580
Is the capacity of the replacement disk >= the capacity of the one it replaced? I.e., same make/model/firmware?
 

Author Comment

by:glennx
ID: 36504612
yes it is
 

Author Comment

by:glennx
ID: 36504634
The only difference was that I took it out of an identical server, wiped the disc, and then used it to replace the bad disc in my degraded server. So the drive wasn't brand new, but it definitely worked.
 
LVL 47

Expert Comment

by:dlethe
ID: 36504745
That is the problem. It has metadata on it from the previous controller. You're lucky you didn't cause it to get confused.

Yank it out, put the disk on a NON-RAID controller, and blow away both the first 128 KB at block zero and the last 128 KB at the end of the disk, or just wipe it completely.
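
For anyone following along, here is a minimal sketch of that head-and-tail wipe in Python. It assumes the disk is attached to a plain (non-RAID) controller on a Linux machine and that the script runs as root. The 128 KB figure comes from the comment above; the function name and the `/dev/sdX` path in the note below are placeholders, not anything from this thread. It destroys data, so triple-check the target before running it.

```python
import os

WIPE_BYTES = 128 * 1024  # 128 KB, the size quoted in the comment above


def wipe_head_and_tail(path, wipe_bytes=WIPE_BYTES):
    """Overwrite the first and last `wipe_bytes` of `path` with zeros.

    `path` may be a block device or an ordinary file. Returns the total
    size in bytes so the caller can sanity-check which disk was touched.
    """
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)   # total size of the device/file
        n = min(wipe_bytes, size)         # never write past a tiny target
        zeros = b"\x00" * n

        dev.seek(0)
        dev.write(zeros)                  # area at block zero

        dev.seek(size - n)
        dev.write(zeros)                  # area at the end of the disk

        dev.flush()
        os.fsync(dev.fileno())            # push the writes out to the disk
    return size
```

A call would look like `wipe_head_and_tail("/dev/sdX")`, with `sdX` replaced by the real device name. Wiping the whole disk is the simpler and safer choice if you are not sure where a given controller keeps its metadata.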
 

Author Comment

by:glennx
ID: 36504782
I did that first. I put the replacement disk in a different server and reformatted it, then popped it in where the failed drive was in my primary server. Shouldn't that have done the trick?
 
LVL 13

Expert Comment

by:khairil
ID: 36504823
Hi,

I think what dlethe wants to say is: put the disk back in its original server. Start the server and, in the RAID controller setup, remove the disk from its logical RAID so that it becomes a standalone disk that is not a member of any other logical RAID. You may wish to format the disk after that.

Just make sure the disk is not a member of any RAID.
 

Author Comment

by:glennx
ID: 36504861
I will try to take the wiped drive out and place it back in the original server, then try to take it out of its RAID. Then I'll see if it will rebuild itself on the primary server.

I will let you know how that goes. It's been formatted already, so I shouldn't have to format it again.

Does metadata get removed when you boot the server to the PERC 3/Di controller and use the disc utilities to format the disc, or is there another way? I did a low-level format.
 

Author Comment

by:glennx
ID: 36504988
No go, guys. That didn't work either.
 
LVL 63

Expert Comment

by:SysExpert
ID: 36505155
You need to use a program that does a full zeroing out of the first cylinder so that no meta info is left

Download the free UBCD as it has disk wipe utilities to do this


 UBCD Free with Disk tests and Vendor utilities
www.ultimatebootcd.com

I hope this helps !

 
LVL 63

Expert Comment

by:SysExpert
ID: 36505173
Darik's Boot and Nuke 2.2.6 beta (rebuilt with isolinux V3.86 to improve compatibility) is on the UBCD, and that is what should be used in most cases.
 

Author Comment

by:glennx
ID: 36505184
So I was able to get Disk Management to see it. It's showing up as drive F now. In the physical disc section of Server Administrator, both discs are showing up as online. This looks better; however, the virtual disc still shows as degraded. My fear is that the new drive is still not part of the mirror.

Any ideas would be appreciated.
 

Author Comment

by:glennx
ID: 36505192
sorry, needed to refresh my machine.  I will try nuking the disc and see what happens.

Thanks...
 

Author Comment

by:glennx
ID: 36506137
I'm still getting the same message, guys. The physical disc says online, but the virtual disc says degraded.
degraded-virtual-mirror.docx
 

Author Comment

by:glennx
ID: 36511083
If anyone can help, I would greatly appreciate it.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36512712
glennx ... on a PERC 3/Di, it should be sufficient to Initialize the new disk, then assign it as a Hot Spare.  To clarify another question ... there is no Rebuild option for a READY disk - you can only Rebuild a FAILED disk - assigning a drive as a Hot Spare will cause a READY disk to begin a Rebuild into the array.

You say that both drives are ONLINE now, but the VD still shows Degraded?  In OMSA, go to Storage, PERC, Information/Configuration (link at top of page), and select Rescan from the dropdown menu of Available Tasks for the controller.
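
The same steps can also be driven from the OMSA command line instead of the web GUI. The sketch below only builds the command lines and prints them; it does not run anything. The `omconfig`/`omreport` syntax is as I understand it from Dell's OMSA CLI guide and should be checked against your installed OMSA version. The controller ID `0` and the physical disk ID `0:1` are made-up examples; read the real ones from `omreport storage pdisk controller=<id>` first.

```python
def omsa_commands(controller="0", pdisk="0:1"):
    """Return OMSA CLI command lines as argument lists (nothing is executed).

    `controller` and `pdisk` are hypothetical IDs used for illustration;
    take the real values from the server's own omreport output.
    """
    ctrl = f"controller={controller}"
    return {
        # Assign the READY disk as a global hot spare so a rebuild can start.
        "hot_spare": ["omconfig", "storage", "pdisk",
                      "action=assignglobalhotspare", ctrl,
                      f"pdisk={pdisk}", "assign=yes"],
        # Ask the controller to rescan its attached disks.
        "rescan": ["omconfig", "storage", "controller",
                   "action=rescan", ctrl],
        # Report the virtual disks on this controller and their state.
        "vdisk_status": ["omreport", "storage", "vdisk", ctrl],
    }


for name, argv in omsa_commands().items():
    print(f"{name}: {' '.join(argv)}")
```

To actually run one of them on the server where OMSA is installed, pass the argument list to `subprocess.run(argv, check=True)`.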
 

Author Comment

by:glennx
ID: 36512713
Thank you very much.  I am almost a newbie here, so that was very helpful.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36512720
How about a screenshot of your Virtual Disks screen too?
 

Author Comment

by:glennx
ID: 36512726
I will try your suggestion and let you know in a few mins.

Thank you.
 

Author Comment

by:glennx
ID: 36512733
please see attached...
degraded-virtual-mirror.docx
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36512746
Not your Physical Disks screen, but your Virtual Disks screen.
 

Author Comment

by:glennx
ID: 36512763
scroll down to the second page in the doc
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36512778
Ah ... didn't think to look for a second page.  Let us know what happens with a Rescan.
 

Author Comment

by:glennx
ID: 36512789
Will do. Also, I did assign it as a hot spare originally, but it never rebuilt and I never saw the option to rebuild. Strange.

I'll let you know after a rescan.
 
LVL 63

Expert Comment

by:SysExpert
ID: 36512913
Not familiar with the PERC 3, but is it possible that while it says degraded, it is actually rebuilding?

 

Author Comment

by:glennx
ID: 36512923
It's only a 33 GB drive and it's been like that for over two days, so I don't think it is rebuilding, but I'm not sure how to tell.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36512955
It will say something like reconstructing or rebuilding with a percent-complete.  You could post a controller log ... OMSA, Storage, PERC, Information/Configuration, Export Log, that should shed some light on when it actually completed ... and possibly even what else is wrong.

 

Author Comment

by:glennx
ID: 36513024
It doesn't say that, but I am reinstalling OpenManage now because I tried updating it to see if that would help.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36513057
Well, if it is already rebuilt, it won't say that now.  It would have said that for the hour or two it was actually rebuilding.

Before trying to reinstall OpenManage, did you run the Rescan?
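
To put rough numbers on "the hour or two": a mirror rebuild is essentially a full copy of the disk, so the time is roughly capacity divided by sustained copy rate. The rates below are guesses for an older SCSI controller rebuilding in the background, not measured PERC 3/Di figures.

```python
def rebuild_hours(capacity_gb, rate_mb_per_s):
    """Hours needed to copy `capacity_gb` at a sustained `rate_mb_per_s`.

    Uses 1 GB = 1000 MB; the rate is an assumed average, not a measurement.
    """
    return capacity_gb * 1000 / rate_mb_per_s / 3600


# Assumed rebuild rates in MB/s for the 33 GB drive mentioned in the thread.
for rate in (5, 10, 20):
    print(f"{rate:>2} MB/s -> {rebuild_hours(33, rate):.1f} h")
```

Even at the slowest assumed rate, a 33 GB mirror finishes in under two hours, so a disk that has sat unchanged for two days is not in the middle of a rebuild.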
 

Author Comment

by:glennx
ID: 36513091
Yes, but the drive still said ready under physical and degraded under virtual. What do you think I could try next?
 

Author Comment

by:glennx
ID: 36513101
Sorry, not ready; I mean online under physical.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36513130
Sorry, just want to be clear here ... does the drive say READY or ONLINE (you and the screenshot said ONLINE - there is a big difference)?  

After selecting Rescan, it will ask you twice (I think - at least once) to confirm you want to do a Rescan.  The Rescan should take at least 10 seconds (up to 60).  If you don't think it ran like that, run it again.

Export a controller log and post it here.
 

Author Comment

by:glennx
ID: 36513148
let me try again, thx!
 

Author Comment

by:glennx
ID: 36513276
So now another problem has come up. My server has now created another degraded RAID with the second drive.

Please help. See screenshots!
screenshots.docx
 

Author Comment

by:glennx
ID: 36513293
Also, here is the log file, if that helps.
afa-0909.log
 

Author Comment

by:glennx
ID: 36513328
Under physical disks, they both say "ONLINE".

Under Virtual, there are now 2 "Degraded RAID 1" entries.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36513366
I don't know when the first screenshot was created, but the second screenshot shows you did not rebuild the replacement drive into your RAID 1, but created a new/separate RAID 1 with that drive (or if it was part of a RAID 1 in its previous machine, its configuration was imported, showing that RAID 1 on this machine).

Delete the last RAID 1 Virtual Disk, Initialize it, then assign it as a Hot Spare/Failover.
 
LVL 47

Expert Comment

by:dlethe
ID: 36513413
Sorry I didn't get back earlier; I was out.

You can NOT wipe metadata out from behind a PERC controller. As I wrote, "it MUST be a NON-RAID controller." I emphasized that for a reason. The PERC controllers will not let you low-level initialize the hidden area. If you put that disk back in the original, or even another PERC controller, you make things worse and put all your data at risk. Had you wiped it on a non-RAID controller, you would have had no problems getting it to work.

Frankly, at this point, with all the things that have happened, I would strongly consider taking a binary image before doing anything else. You must NOT do it with the PERC; you need a NON-RAID controller.

You don't have a quorum anymore, so you could very well be writing info from the replacement disk onto the new one, and your existing metadata is screwed up. The safe thing to do is use another machine with a scratch disk, buy a copy of runtime.org's RAID Reconstructor (this is easy, but not necessarily cheap), take a binary image of the data, and save it to a local drive. Then reformat the 2 data disks, stick them in the PERC, and build a fresh new RAID. Then install the RAID controller + 2 drives into the machine you licensed RAID Reconstructor to, and image the backup of your data onto the logical RAID 1.

The above is a sure thing (provided you didn't muck up your data already). Other techniques may work, but this will work.

 
 

Author Comment

by:glennx
ID: 36513423
OK, I did this and we are back to square one again. See screenshots.

Please let me know the next steps.
degraded-virtual-mirror2.docx
 

Author Comment

by:glennx
ID: 36513441
I ran DBAN on the disc before I installed it the second time yesterday. Would that have stripped the hidden area on the disc, or would it still matter?
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36513454
You can follow dlethe's plan - which may be your only option later - but what would be the normal practice, if you were just coming into this scenario, would be to insert the drive, Initialize it (from Physical Drives dropdown), then Assign it as a Hot Spare.  If that doesn't work, you have something else going on preventing the rebuild and restoring from backup on a fresh array may be your next step.
 

Author Comment

by:glennx
ID: 36513544
OK guys, thanks for all your help. We ordered a brand new drive as a replacement instead of using an old drive.

I am going to install it Monday morning, initialize it, assign it as a hot spare, and see if it rebuilds.

Thanks for all the help, I will keep you all posted!
 
LVL 47

Expert Comment

by:dlethe
ID: 36514016
Look, you have NVRAM in the controllers and metadata in a reserved area on your disk drives.   The metadata contains things from known bad blocks, to even start/end of logical data, as well as RAID parameters and device names. Plus you could be dealing with different record layouts because the metadata could have different revisions.  

So, what can I say, I'm being conservative. The only absolute is that it is messed up. I just spent last weekend with a client who had an LSI-based controller (different model) and fibre channel drives that basically went through this swap-and-cross-your-fingers process due to busted metadata. They lost one volume because the metadata from the other drives 'conspired' to override a good config with a stale one, and it overwrote a disk holding live data with mirrored data from another controller.

Make a proper backup with a NON-RAID controller just in case.  I say 4:1 against having any problems, but if this was my data, I would invest the time to make an image copy.  You could have a list of blocks that are queued up for writing that must not be written, or it will damage live data, or damaged metadata, or a bunch of other things I don't need to get into.

Bottom line: at this point, take an image clone as a parachute, then slap in the virgin replacement & cross your fingers. But you can still get some data corruption that you have no way of anticipating w/o access to some of the things I have. Worse, the next time you have a drive failure, the controller could make an incorrect decision based on wrong internal drive mapping, and then it comes back to haunt you.

Best practice, if you value your data, is to blow away the LUN, build & initialize a new RAID 1, then restore.

 

Author Comment

by:glennx
ID: 36523892
Guys,

The new drive came in today. I inserted it into the server, initialized the disc, and then assigned it as a global hot spare. Same result: the drive just reads Ready and did not rebuild and assign itself to the degraded virtual array.

At this point, do you all recommend that I just take an image of the OS, blow away and reconfigure a new array, and then drop the image back down?

Also, we don't have an imaging program such as Ghost, etc. Do you know of any freeware products I can download?

Thanks again...
 
LVL 32

Accepted Solution

by:
PowerEdgeTech earned 125 total points
ID: 36524009
At this point, things are not working as designed/intended ... so yes, I would recommend restoring from a backup on a freshly reconfigured array.
 
LVL 13

Expert Comment

by:khairil
ID: 36524108
You can try Clonezilla: http://clonezilla.org/
 

Author Comment

by:glennx
ID: 36524182
Ok I will give it a shot and let you know.

Thanks again...
 
LVL 63

Assisted Solution

by:SysExpert
SysExpert earned 125 total points
ID: 36524757
this may be a little easier

 http://redobackup.org/
 

Author Comment

by:glennx
ID: 36524812
Great, thanks a lot!
 
LVL 47

Assisted Solution

by:dlethe
dlethe earned 250 total points
ID: 36524917
You should really consider calling in professional help at this point. Clonezilla will image the disk, but it won't take you to the next step, like being able to use the data. (Plus, it might stop on bad blocks, and you need to keep track of them to minimize data loss.)
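
To illustrate the bad-block point, here is a minimal Python sketch of an imaging loop that keeps going past unreadable areas and records where they were. It is not what Clonezilla or any other product does internally; the function names, the `read_at` callback, and the 64 KB chunk size are all assumptions made for the example. Unreadable chunks are zero-filled so every offset in the image still lines up with the source disk.

```python
def image_with_bad_block_log(read_at, total_size, out, chunk_size=64 * 1024):
    """Copy `total_size` bytes into `out`, zero-filling unreadable chunks.

    read_at(offset, length) must return exactly `length` bytes or raise
    OSError. Returns a list of (offset, length) ranges that could not be read.
    """
    bad_ranges = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        try:
            data = read_at(offset, length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            data = b"\x00" * length           # placeholder keeps offsets aligned
            bad_ranges.append((offset, length))
        out.write(data)
        offset += length
    return bad_ranges


def device_reader(dev):
    """Adapt an open binary file or device object to the read_at interface."""
    def read_at(offset, length):
        dev.seek(offset)
        return dev.read(length)
    return read_at
```

With a source opened as `src = open(path, "rb")` and an output file opened for writing, the call is `image_with_bad_block_log(device_reader(src), size, out)`. Save the returned list next to the image so you know which ranges hold zeros instead of real data.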
 

Author Comment

by:glennx
ID: 36525032
I have a second raid 5 array that is storing all data. This server is just our primary file server, so I will have no problems rebuilding that if need be.
 
LVL 47

Assisted Solution

by:dlethe
dlethe earned 250 total points
ID: 36526575
Just rebuild. Better known-good stale data than unknown, partially corrupted data.
 

Author Closing Comment

by:glennx
ID: 36560937
Thanks for all your help guys, deleting the array and restoring from REDO backup was the key.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 36560980
Glad you're back up and running :)