
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1911

Raid set failure "Dell 2850"


So we have a Dell 2850 with 4 Ultra320 SCSI disks.

2 x 36 GB in RAID 0 with Windows 2008 (don't ask me why RAID 0 was chosen in the first place).

2 x 300 GB in RAID 1 for data.
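
To make the difference between the two arrays concrete, here is a minimal sketch of usable capacity and tolerated disk failures for this layout. The function name `raid_summary` is made up for illustration, and it only covers RAID 0 and RAID 1.

```python
def raid_summary(level: int, disk_gb: int, disks: int) -> dict:
    """Usable capacity and tolerated disk failures for RAID 0 or RAID 1."""
    if level == 0:
        # Striping: all capacity is usable, but any single failure loses the array.
        return {"usable_gb": disk_gb * disks, "failures_tolerated": 0}
    if level == 1:
        # Mirroring: one disk's worth of capacity, survives losing all but one member.
        return {"usable_gb": disk_gb, "failures_tolerated": disks - 1}
    raise ValueError("only RAID 0 and RAID 1 are covered in this sketch")


os_array = raid_summary(0, 36, 2)     # the 2 x 36 GB OS array
data_array = raid_summary(1, 300, 2)  # the 2 x 300 GB data array
print(os_array)
print(data_array)
```

This is why one failed 36 GB disk took the whole OS array down, while the data mirror should in principle survive on a single good disk.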

So one of the 36 GB disks crashed and the server went down. We then tried different things to bring it back up again. This involved taking one of the data disks and putting it into another server (Dell 1750) to copy the data. That did not work.

So we got 2 other drives (36 GB), removed the 2 x 300 GB disks and installed Windows 2003 on the Dell 2850.

When finished, we put the 2 x 300 GB disks back into the server and thought they should now work fine. We haven't touched anything in the RAID controller setup.
But for some strange reason they fail.

So the question now is what to do with those 2 x 300 GB disks. They hold a lot of important data.
They are not broken or anything like that.
It's very strange, since it's the same server they were in before, and no config was changed.

Could I follow this guide taken from another issue and try it without any risk?

You can do the following with NO harm to the data - If we get a config, we have a much higher chance of getting this to work:

Power off the server - unseat all drives a few inches.

Boot the server - hit Ctrl+M when prompted
Go to Configure > Clear Configuration (with no drives in - *stressed*)
Accept any changes
Then go to Objects > Adapter > (select the adapter if a selection screen comes up) > set Auto Rebuild to DISABLE (after this entire process is done we need to make sure to set this back to ENABLE)
Esc out of the controller BIOS.
Power off the server.

Put the drives in the same slots they were in before with the server OFF.
Only put in the original drives - do not put in "new" drives

Boot up the server - goto Ctrl+M

Go to Configure > View/Add Configuration

Hopefully you will get an NVRAM mismatch error
If you do, choose to take the configuration from the DRIVES

Let me know what you see.

quote end.
2 Solutions
This is the correct process, however this should not be done without Dell on the phone!
The above process is correct.
But before that, try this: did you use one of these 300 GB data disks in place of the OS disks? If so, avoid that used disk and try with the single unused 300 GB disk. (RAID 1 is a mirror, so each disk holds the same data as the other; one disk should be enough.) If this comes up, then try the other one.
Nordicit (Author) commented:
Okay, so I talked to Dell and followed the guide, but with no luck: the 2 x 300 GB disks are still listed as "Fail".

When I took the config from the drives, they were already listed there as "failed", so that might be the reason. Dell then suggested the Re-Tag option, but I am a bit scared of that one, hehe.

It seems that the controller is lacking an option called rescan or reset. I know those drives are all right; now I just need the controller to see the light as well.

Any other ideas to fix this?

Nordicit (Author) commented:

No, I used 2 x 36 GB for the OS, and also for the new install.

The 2 x 300 GB are untouched.
I need some clarification on this: the 2 x 300 GB drives are set up as a RAID 1 mirror, correct?
If they are (or were), then you need to re-tag this array to redefine the RAID configuration on the controller but NOT THE DRIVES.
The Re-Tag process re-creates the RAID configuration - the key is that YOU DO NOT INITIALIZE the array.
If this is a RAID 1, then you should boot this array with only 1 of the 2 disks online.
Nordicit (Author) commented:
Yes, they are in RAID 1 (they were when the server went down).

Okay, I will try it tomorrow morning, with just 1 disk in the server.

Should the Re-Tag process still work with only 1 disk (300 GB) in the server?

You need to have both disks in the server to Re-Tag, and you should have Dell on the phone when doing this. After the re-tag you set one of the disks to offline.
WOOOOO the memories. I worked for Dell doing exactly this thing, at least once if not several times a day. There is plenty of room for losing data here. Luckily it's a RAID 1, so you get two chances. A few things I would say:

1) Technically, the data could be considered lost at this point. If you start with that expectation, there are many chances for a pleasant surprise when the recovery works.

2) The process is called re-tag. That is, rewriting the metadata about the array to ensure it's correct. If when you put the disks in, the PERC BIOS shows them as A00-00 OFFLINE and A00-01 OFFLINE, you do NOT need to retag. Simply go into physical disk management and bring the first disk online (ONLY the first disk). Escape out of that screen and go into virtual disk management, and you should see that the RAID 1 is in a degraded state. That's what you want. Bring it online and see if things work. If you get a chkdsk, run through it. Make sure the data is what you're looking for. If things are corrupt, go back into the PERC BIOS and offline the first disk. Set the second disk to online, and try again. If both are corrupt, you're up a creek, and your data really is lost.

WHY we only have one disk online at a time: Both disks failed, and maybe "at the same time," but one failed first, even if only a microsecond beforehand. That means that parts of one disk may be inconsistent with the other disk. Having one disk online at a time will ensure that you get consistent data. If one disk has too many bad sectors, the other disk may have recoverable data in those spots.
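
To illustrate why the two mirror members can disagree, here is a small sketch that compares two simulated disk images block by block. The byte strings stand in for real disks, and `differing_blocks` is a made-up helper name, not anything the PERC provides.

```python
from typing import List


def differing_blocks(image_a: bytes, image_b: bytes, block_size: int = 512) -> List[int]:
    """Return the indexes of blocks whose contents differ between two mirror members."""
    length = max(len(image_a), len(image_b))
    diffs = []
    for offset in range(0, length, block_size):
        if image_a[offset:offset + block_size] != image_b[offset:offset + block_size]:
            diffs.append(offset // block_size)
    return diffs


# Simulated 4-block mirror: the second member dropped out before the
# last write to block 2 reached it, so it still holds the old contents.
disk0 = b"A" * 512 + b"B" * 512 + b"NEW!" * 128 + b"D" * 512
disk1 = b"A" * 512 + b"B" * 512 + b"old." * 128 + b"D" * 512

print(differing_blocks(disk0, disk1))
```

If both disks were brought online together, the controller would have no way to know which copy of the mismatched block is the right one. Booting from one member at a time avoids mixing the two.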

3) If in the PERC BIOS, the disks don't show up as configured at all, you'll need to re-tag. It's easy - create a new virtual disk like you're configuring it from scratch. DO NOT INITIALIZE, or you will lose data. Once the virtual disk is created, go into physical disk management and offline the second disk. Boot up, see if things look good. If not, switch which disks you have online, making the first one offline and the second online. Try again. If both are corrupt, you're up a creek again, and your data is lost.
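
The two cases in steps 2 and 3 boil down to a small decision procedure. The sketch below just encodes the steps as written; the function name `plan` and the state strings are assumptions for illustration, and any state these steps don't cover falls through to "call Dell".

```python
from typing import List


def plan(state: str) -> List[str]:
    """Map the state shown in the PERC BIOS to the steps described above."""
    state = state.strip().upper()
    if state == "OFFLINE":
        first = "No re-tag needed: bring ONLY the first disk online"
    elif state == "NOT CONFIGURED":
        first = ("Re-tag: create the virtual disk again, DO NOT INITIALIZE, "
                 "then offline the second disk")
    else:
        # Anything else is outside steps 2 and 3.
        return ["State not covered by these steps: stop and call Dell support"]
    return [
        first,
        "Boot on the degraded RAID 1 and check the data",
        "If corrupt: offline the first disk, online the second, and check again",
        "If both are corrupt: the data is lost",
    ]


for step in plan("offline"):
    print("-", step)
```

In both covered cases the order is the same: test one member at a time, and only give up once both have been checked.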

When you reach your data - using OpenManage Server Administrator, export your controller log. Export it from Storage > PERC4 > Information/Configuration > Export Log. It will probably save in %WINDIR% and be called LSI_something.log
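
If you're not sure where the export ended up, a short script can look for it. This is only a sketch: the `LSI_*.log` pattern is a guess based on the "LSI_something.log" name above, the fallback path is an assumption, and `find_controller_logs` is a made-up helper.

```python
import os
from pathlib import Path


def find_controller_logs(directory=None, pattern="LSI_*.log"):
    """List files matching the pattern in the given directory, newest first.

    Defaults to %WINDIR%; the pattern is an assumption, adjust it as needed.
    """
    base = Path(directory or os.environ.get("WINDIR", r"C:\Windows"))
    if not base.is_dir():
        return []
    return sorted(base.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)


for log in find_controller_logs():
    print(log)
```

The newest match is most likely the export you just made.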

Put that on here, and I'll look to see what could have caused the problem. No guarantees. The things I'd be on the lookout for are firmware levels - there were some that are just HORRIBLE. Afterwards, you can rebuild that second disk, or replace it if need be. Don't put the server in production until you know whether you're on the healthy disk or not - you stand to lose data.

All of this said - if you have a contract for Dell support, make extensive use of it. If you're out of contract, hopefully this can help.
Nordicit (Author) commented:
Interworks: Thanks.

But in the PERC BIOS, the disks are listed as "FAIL", not as offline or missing.

So Re-Tag must be the only option?

I will give Dell a call again.
Nordicit (Author) commented:
After a phone call to Dell, the solution was a FORCE ONLINE on one of the disks.

Thanks very much for the help.
