• Status: Solved

Can't delete a SAN array created by a corrupt drive

We received a replacement drive last week for a bad drive in our SAN.  The slot the drive was in wasn't configured yet.  Turns out the drive (refurbished obviously) had data still on it.  No big deal right?  

This logical drive that existed on the drive also happened to use the same LUN as one of our existing drives.  I kept getting errors when I tried to delete the bad drive.  When I tried to remove the mapping it took with it the mapping for our existing drive.  Fortunately that server wasn't online.  To meet a deadline, I recreated the previous array ...with a different LUN this time.  The drive is no longer there, but...  

Now (then too) I have no option to delete this array.  The option is grayed out.  I'm not sure what to do at this point.  A third party support representative said we would have to rebuild our entire SAN.  (Oh joy!!)  It contains several servers so this would be quite a "recovery" job for a single faulty drive from a supplier.  

I know a drive should be tested when possible, but I never imagined it would corrupt a SAN configuration.  Has anyone else been through this?  Our support company has offered to compensate us.  It created quite a problem obviously, or else I wouldn't be here.  Not to mention the time it took to recreate a new array.

Oh, we're using IBM Storage Manager.

2 Solutions
Your HBA most likely also has a SAN mapping feature  (it certainly does if it is Qlogic or Emulex, or JNI).  Download the addon-java program from the HBA vendor's web site, and you can just remap the WWN that way.  You will probably also want to enable persistent mapping so the WWN will always map to the selected LUN even if you move it on the SAN.

As for remapping the SAN within the RAID controller, most have that feature as well.  So do switches, just use zoning.

SAN mapping, masking, and allowing more than one host to have read/write access to a device is a good thing, if used properly.  This makes clustering and high availability possible.  So don't blame the vendors for corrupting the device.  Blame the person who made the decision not to properly train the person who did the configuration in such things as SAN mapping, LUN masking, and persistence.
P.S. there are several controller options for the DS4000.  If you are using the Mylex-based engine, then you can configure the RAID to tie a specific LUN# to any specific WWN.  If the option to delete it is grayed out, then you most likely have corrupted metadata.  This *CAN* be fixed, but you need a person with skills beyond those of your vendor.   You do NOT have to rebuild the SAN.

If you do have the Mylex variation, also post your firmware revision.
I am not sure everything is clear, so just some questions:

1) What is the model of the storage? The DS4000 is a series:
4300, 4500, 4700?

2) Are you using Storage Manager
to view/manage the DS4x000?

3) You had a failed disk:
3.1- Was it marked failed (red), or something else?
3.2- Have you swapped the failed disk with the spare?
3.3- In the same location?

4) Why do you need to delete an array, and is it the first or the second one?
To remove an array whose delete option is grayed out, you probably have to:
A) remove the mappings (LUNs) from each logical volume,
B) remove the logical volumes,
C) then the array has no resources left and you can remove the array,
freeing the disks.
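
The A/B/C order above can be sketched as a small dependency model. This is only an illustration in Python, not the Storage Manager API: the `Array` and `LogicalDrive` classes and the `deletion_plan` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalDrive:
    """Hypothetical stand-in for a logical volume carved out of an array."""
    name: str
    mapped_luns: set = field(default_factory=set)  # LUN numbers presented to hosts


@dataclass
class Array:
    """Hypothetical stand-in for an array; maps drive name -> LogicalDrive."""
    name: str
    logical_drives: dict = field(default_factory=dict)


def deletion_plan(array):
    """Return the ordered steps to perform before and including array removal."""
    steps = []
    # A) remove every LUN mapping from every logical drive
    for drive in array.logical_drives.values():
        for lun in sorted(drive.mapped_luns):
            steps.append(f"remove mapping LUN {lun} from {drive.name}")
    # B) remove the logical drives themselves
    for name in array.logical_drives:
        steps.append(f"remove logical drive {name}")
    # C) only now is the array free of resources
    steps.append(f"remove array {array.name}")
    return steps


if __name__ == "__main__":
    demo = Array("array2", {"ld1": LogicalDrive("ld1", {3})})
    for step in deletion_plan(demo):
        print(step)
```

If the array already has no logical drives and no mappings, the plan is just the final step; if the delete option is still grayed out at that point, the dependency order is not the cause.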


vondaAuthor Commented:
"If the option to delete it is grayed out, then you most likely have corrupted metadata."  I think this is what the person I spoke with was referring to.  It's good to hear it doesn't have to be rebuilt.  

I'm not a SAN wizard by any means, but it seems if there was a LUN conflict, an action to enable the device would be required.  Apparently not.  My error was removing the mapping of the "corrupt" drive.  It removed the mapping for our existing drive!  Both were mapped, but the "corrupt" drive took the place of the already mapped drive in the host group before I did anything.  After that, the only thing I could do was rebuild a new array with a different LUN.  I tried to move the drive to another host group before removing the mapping, but it kept reconnecting to the one where our other LUN was before.  What could have prevented this?

I don't know how to tell if we're Mylex based.  The help/info in Storage Manager is very limited.  
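
The LUN conflict described above can be illustrated with a toy model. This is a sketch under assumptions, not how Storage Manager handles foreign configuration internally: mappings are represented as a plain dictionary keyed by (host group, LUN), and `find_lun_conflicts` and `merge_mappings` are hypothetical helper names.

```python
def find_lun_conflicts(existing, incoming):
    """Return (host_group, lun) keys claimed by both tables for different drives.

    Both arguments map (host_group, lun) -> logical drive name.
    """
    return sorted(
        key
        for key, drive in incoming.items()
        if key in existing and existing[key] != drive
    )


def merge_mappings(existing, incoming):
    """Merge incoming mappings, refusing to overwrite an existing one."""
    conflicts = find_lun_conflicts(existing, incoming)
    if conflicts:
        raise ValueError(f"LUN conflict on {conflicts}; remap before importing")
    merged = dict(existing)
    merged.update(incoming)
    return merged


if __name__ == "__main__":
    existing = {("hostgroup1", 2): "prod_data"}
    foreign = {("hostgroup1", 2): "foreign_ld"}  # same LUN as the existing drive
    print(find_lun_conflicts(existing, foreign))
```

In this model the foreign logical drive is rejected rather than silently taking over the slot, which is the check that was missing in the situation described: two logical drives ended up behind one (host group, LUN) key, so removing one mapping removed both.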
Nothing can prevent metadata corruption, as it is rare (but not so rare it doesn't happen). Biggest thing you can do is make sure you have the latest firmware and make sure that the config is rarely, if ever, degraded or in stress.

You are going to have to figure out what is inside of it. I can't do that for you.  What does device manager report for make/model of the LUNs?  What is part number, model number.  Post screen shot of storage manager when you query controller information.  

vondaAuthor Commented:

1. It's a 4800
2. Yes, Storage Manager
3. It was marked - Impending Failure and Failed at one point.  There were no options for me to manage it though at any point.  Normally you can fail a drive.
4. I want to delete the array that our SAN now thinks exists.  We never created it.  The logical drive now appears in Undefined Mappings.  The physical disk drive is no longer in the system.  No logical volume exists that I'm aware of.  The array is a lighter green with a red slash through it.
vondaAuthor Commented:
The drive was removed and the logical drive deleted.  There isn't any information for it now except for the offline array.  Here is a screenshot of what I see:  (left - mapping, right - array)
vondaAuthor Commented:
LOL... whoops, good for a laugh though!!  Try this one...
vondaAuthor Commented:
FYI, that was a picture a friend sent to me.  She does NOT work on SANs, wear many clothes, or date outside of her trailer park often.  
Sorry, I can't help you manually correct the metadata on the 4800; I don't know how to do it for that particular model.  However, you should be able to get away with:
1. Full backup
2. Document configuration info (drive order, stripe size, disks used, starting/ending blocks, world-wide name, mapping)
3. Clear config
4. Rebuild the config the same way as in #2, less the busted array, but do NOT initialize the arrays.  This rebuilds a clean set of correct metadata, and as long as you don't initialize the arrays, it won't zero the LUNs out.

Note Step #1 is important.  This is a lot easier than rebuilding the SAN and in other controllers takes about 10 mins from beginning to end (except for the backup step).  Beats the heck out of rebuilding the entire SAN.  
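
Step #2 only helps if the documented values are complete enough to check the rebuilt config against. A minimal Python sketch of that bookkeeping follows. The `ArrayRecord` fields mirror the items listed in step #2, but the field names, the JSON layout, and the helper functions are assumptions made for this example; the values still have to be copied by hand from whatever your management tool reports.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ArrayRecord:
    """Hypothetical record of one array's configuration, as written down in step #2."""
    name: str
    drive_order: list      # disks used, in order
    stripe_size_kb: int
    start_block: int
    end_block: int
    wwn: str
    lun_mapping: dict      # host group name -> LUN number


def to_json(records):
    """Serialize the documented records so they can be stored off the SAN."""
    return json.dumps([asdict(r) for r in records], indent=2)


def from_json(text):
    """Load documented records back, keyed by array name."""
    return {r["name"]: ArrayRecord(**r) for r in json.loads(text)}


def diff_records(documented, rebuilt):
    """List every field where the rebuilt config differs from the documented one."""
    problems = []
    for name, doc in documented.items():
        new = rebuilt.get(name)
        if new is None:
            problems.append(f"{name}: missing from rebuilt config")
            continue
        for field_name, expected in asdict(doc).items():
            actual = getattr(new, field_name)
            if actual != expected:
                problems.append(
                    f"{name}.{field_name}: documented {expected!r}, rebuilt {actual!r}"
                )
    return problems
```

An empty list from `diff_records` means the rebuilt definition matches what was written down. Any mismatch (a swapped drive order, for example) should be resolved before the LUNs are used, since the data is only readable if the layout is identical.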
I had the exact same problem with an array after a power outage a couple of days ago. IBM told me to fail all drives of the array and then revive them. But that did not work, because one drive seemed to be missing from the array.
I got a support technician to log in via WebEx and run a bunch of commands on the command-line interface to rebuild the array, and I got all my data back.
So contact IBM and they will help you with a command line to delete the array.
I think that is the only way to do it. The GUI lacks a lot of commands.
Question has a verified solution.
