Can't delete a SAN array created by a corrupt drive

We received a replacement drive last week for a bad drive in our SAN. The slot the drive was in wasn't configured yet. Turns out the drive (refurbished, obviously) still had data on it. No big deal, right?

This logical drive that existed on the drive also happened to use the same LUN as one of our existing drives. I kept getting errors when I tried to delete the bad drive. When I tried to remove the mapping it took with it the mapping for our existing drive. Fortunately that server wasn't online. To meet a deadline, I recreated the previous array ...with a different LUN this time. The drive is no longer there, but...

Now (then too) I have no option to delete this array. The option is grayed out. I'm not sure what to do at this point. A third party support representative said we would have to rebuild our entire SAN. (Oh joy!!) It contains several servers so this would be quite a "recovery" job for a single faulty drive from a supplier.

I know a replacement drive should be tested when possible, but I never imagined one would corrupt a SAN configuration. Has anyone else been through this? Our support company has offered to compensate us. It created quite a problem obviously, or else I wouldn't be here. Not to mention the time it took to recreate a new array.

Oh, we're using IBM Storage Manager.

Thanks

ASKER CERTIFIED SOLUTION

David

David

P.S. there are several controller options for the DS4000. If you are using the Mylex-based engine, then you can configure the RAID to tie a specific LUN# to any specific WWN. If the option to delete it is grayed out, then you most likely have corrupted metadata. This *CAN* be fixed, but you need a person with skills beyond those of your vendor. You do NOT have to rebuild the SAN.

If you do have the Mylex variation, also post your firmware revision.

dolomiti

hi,
I am not sure everything is clear, so just some questions:

1) What is the model of the storage? The 4000 is a series:
4300, 4500, 4700?

2) Are you using Storage Manager,
to view/manage the DS4x000 ?

3) You had a failed disk:
3.1- was it marked failed (RED), or something else?
3.2- have you swapped the failed disk with the spare?
3.3- in the same location?

4) Why do you need to delete an array: is it the first, or the second?
To remove an array (grayed out), you probably have to:
A) remove the mappings (LUNs) from each logical volume,
B) remove the logical volumes,
C) then the array is free of resources and you can remove the array,
freeing the disks.
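
The order of steps A to C can be sketched in a few lines of Python. This is only an illustration of the dependency order: the classes, names, and the `teardown_plan` helper are hypothetical, and nothing here talks to Storage Manager or a real controller.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalDrive:
    name: str
    # Hypothetical (host_or_host_group, lun_number) pairs.
    lun_mappings: list = field(default_factory=list)

@dataclass
class Array:
    name: str
    logical_drives: list = field(default_factory=list)

def teardown_plan(array):
    """Return the steps A, B, C in the order they have to happen."""
    steps = []
    # A) remove every LUN mapping first
    for ld in array.logical_drives:
        for host, lun in ld.lun_mappings:
            steps.append(f"remove mapping: {ld.name} (LUN {lun}) from {host}")
    # B) then remove the logical drives themselves
    for ld in array.logical_drives:
        steps.append(f"remove logical drive: {ld.name}")
    # C) only now is the array free of resources
    steps.append(f"remove array: {array.name}")
    return steps

# Made-up example: one array with one mapped logical drive.
example = Array("array_2", [LogicalDrive("ld_old", [("host_group_a", 3)])])
plan = teardown_plan(example)
for step in plan:
    print(step)
```

If the delete option stays gray even when no mappings or logical volumes are left, the cause is probably something other than leftover resources.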

bye
vic

vonda

ASKER

"If the option to delete it is grayed out, then you most likely have corrupted metadata." I think this is what the person I spoke with was referring to. It's good to hear it doesn't have to be rebuilt.

I'm not a SAN wizard by any means, but it seems if there was a LUN conflict, an action to enable the device would be required. Apparently not. My error was removing the mapping of the "corrupt" drive. It removed the mapping for our existing drive! Both were mapped, but the "corrupt" drive took the place of the already mapped drive in the host group before I did anything. After that, the only thing I could do was rebuild a new array with a different LUN. I tried to move the drive to another host group before removing the mapping, but it kept reconnecting to the one where our other LUN was before. What could have prevented this?
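
As an illustration of the kind of check meant here, the following minimal Python sketch flags any LUN number claimed by more than one logical drive within the same host group. The tuple layout, the names, and the `find_lun_conflicts` helper are all hypothetical; the sketch works on a hand-written list and does not read anything from Storage Manager.

```python
from collections import defaultdict

def find_lun_conflicts(mappings):
    """mappings: iterable of (host_group, lun, logical_drive_name) tuples.

    Returns {(host_group, lun): [drive names]} for every LUN that is
    claimed by more than one logical drive in the same host group.
    """
    seen = defaultdict(list)
    for host_group, lun, drive in mappings:
        seen[(host_group, lun)].append(drive)
    return {key: drives for key, drives in seen.items() if len(drives) > 1}

# Made-up data: the mappings already in place, plus the mapping that
# arrived with the replacement drive.
current = [("group_a", 3, "existing_drive"), ("group_a", 4, "data2")]
foreign = [("group_a", 3, "foreign_drive")]

conflicts = find_lun_conflicts(current + foreign)
print(conflicts)
```

A non-empty result would be the signal to sort out the duplicate LUN before removing any mapping.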

I don't know how to tell if we're Mylex based. The help/info in Storage Manager is very limited.

David

Nothing can prevent metadata corruption, as it is rare (but not so rare it doesn't happen). Biggest thing you can do is make sure you have the latest firmware and make sure that the config is rarely, if ever, degraded or in stress.

You are going to have to figure out what is inside of it. I can't do that for you. What does device manager report for the make/model of the LUNs? What are the part number and model number? Post a screenshot of Storage Manager when you query the controller information.

vonda

ASKER

dolomiti:

1. It's a 4800
2. Yes, Storage Manager
3. It was marked - Impending Failure and Failed at one point. There were no options for me to manage it though at any point. Normally you can fail a drive.
4. I want to delete the array that our SAN now thinks exists. We never created it. The logical drive now appears in Undefined Mappings. The physical disk drive is no longer in the system. No logical volume exists that I'm aware of. The array is a lighter green with a red slash through it.

vonda

ASKER

The drive was removed and the logical drive deleted. There isn't any information for it now except for the offline array. Here is a screenshot of what I see: (left - mapping, right - array)
untitled.JPG

vonda

ASKER

LOL... woops, good for a laugh though!! Try this one...
untitled2.JPG

vonda

ASKER

FYI, that was a picture a friend sent to me. She does NOT work on SAN's, wear many clothes, or date outside of her trailer park often.

SOLUTION

David

stol12

I had the exact same problem with an array after a power outage a couple of days ago. IBM told me to fail all drives of the array and then revive them. But that did not work, because one drive seemed to be missing from the array.
I got a support technician to log in via WebEx and run a bunch of commands on the command-line interface to rebuild the array, and I got all data back.
So contact IBM and they will help you with a command line to delete the array.
I think that is the only way to do it. The GUI lacks a lot of commands.