

Can't delete a SAN array created by a corrupt drive

Posted on 2010-01-10
Medium Priority
Last Modified: 2013-11-14
We received a replacement drive last week for a bad drive in our SAN. The slot the drive went into wasn't configured yet. Turns out the drive (refurbished, obviously) still had data on it. No big deal, right?

A logical drive that existed on the replacement disk also happened to use the same LUN as one of our existing logical drives. I kept getting errors when I tried to delete the bad drive. When I tried to remove its mapping, it took the mapping for our existing drive with it. Fortunately that server wasn't online. To meet a deadline, I recreated the previous array ...with a different LUN this time. The drive is no longer there, but...

Now (and then too) I have no option to delete this array. The option is grayed out. I'm not sure what to do at this point. A third-party support representative said we would have to rebuild our entire SAN. (Oh joy!!) It contains several servers, so this would be quite a "recovery" job for a single faulty drive from a supplier.

I know drives should be tested when possible, but I never imagined one could corrupt a SAN configuration. Has anyone else been through this? Our support company has offered to compensate us. It obviously created quite a problem, or else I wouldn't be here, not to mention the time it took to recreate a new array.

Oh, we're using IBM Storage Manager.

Question by:vonda
LVL 47

Accepted Solution

David earned 2000 total points
ID: 26285098
Your HBA most likely also has a SAN mapping feature (it certainly does if it is QLogic, Emulex, or JNI). Download the add-on Java program from the HBA vendor's web site, and you can just remap the WWN that way. You will probably also want to enable persistent mapping so the WWN will always map to the selected LUN even if you move it on the SAN.
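To show what persistent mapping buys you, here is a minimal sketch that assumes a simplified, hypothetical model. The function and the WWN strings are made up for illustration and are not part of any QLogic, Emulex or JNI tool. Without a binding, the host hands out target IDs in discovery order, so a device's ID can change when something moves. With a binding, a WWN keeps its pinned ID.

```python
from typing import Dict, List


def assign_targets(discovered: List[str], persistent: Dict[str, int]) -> Dict[str, int]:
    """Map each discovered WWN to a host-side target ID (hypothetical model).

    WWNs listed in `persistent` keep their pinned ID. Everything else gets
    the lowest free ID in discovery order, like a scan without binding.
    """
    result: Dict[str, int] = {}
    used = set(persistent.values())  # never hand a pinned ID to another WWN
    next_id = 0
    for wwn in discovered:
        if wwn in persistent:
            result[wwn] = persistent[wwn]
            continue
        while next_id in used:
            next_id += 1
        result[wwn] = next_id
        used.add(next_id)
        next_id += 1
    return result


# Without binding, "wwn-a" moves from ID 0 to ID 1 when discovery order changes.
print(assign_targets(["wwn-a", "wwn-b"], {}))
print(assign_targets(["wwn-b", "wwn-a"], {}))
# With binding, "wwn-a" stays on ID 5 regardless of order.
print(assign_targets(["wwn-b", "wwn-a"], {"wwn-a": 5}))
```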

As for remapping the SAN within the RAID controller, most have that feature as well. So do switches; just use zoning.

SAN mapping, masking, and allowing more than one host to have read/write access to a device are good things, if used properly; they make clustering and high availability possible. So don't blame the vendors for corrupting the device. Blame whoever decided not to train the person doing the configuration in SAN mapping, LUN masking, and persistence.
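LUN masking can be thought of as a table saying which hosts may see which LUN. The following minimal sketch assumes a hypothetical dictionary layout and made-up host names rather than any vendor's real format. A LUN that more than one host can see is the clustering case: every host needs cluster-aware software, or concurrent writes can corrupt it.

```python
from typing import Dict, List, Set

# Hypothetical masking table: LUN number -> WWNs of hosts allowed to see it.
Masks = Dict[int, Set[str]]


def visible_luns(host_wwn: str, masks: Masks) -> List[int]:
    """LUNs the given host is allowed to see; all others are masked off."""
    return sorted(lun for lun, hosts in masks.items() if host_wwn in hosts)


def shared_luns(masks: Masks) -> List[int]:
    """LUNs exposed to more than one host, e.g. cluster shared disks."""
    return sorted(lun for lun, hosts in masks.items() if len(hosts) > 1)


masks: Masks = {
    0: {"host-1"},            # private to host-1
    1: {"host-1", "host-2"},  # cluster disk shared by both nodes
    2: {"host-2"},            # private to host-2
}
print(visible_luns("host-1", masks))
print(shared_luns(masks))
```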
LVL 47

Expert Comment

ID: 26285143
P.S. There are several controller options for the DS4000. If you are using the Mylex-based engine, then you can configure the RAID to tie a specific LUN# to any specific WWN. If the option to delete it is grayed out, then you most likely have corrupted metadata. This *CAN* be fixed, but you need a person with skills beyond those of your vendor. You do NOT have to rebuild the SAN.

If you do have the Mylex variation, also post your firmware revision.

Expert Comment

ID: 26286800
I am not sure everything is clear, so just some questions:

1) What is the storage model? DS4000 is a series:
DS4300, DS4500, DS4700?

2) Are you using Storage Manager
to view/manage the DS4x00?

3) You had a failed disk:
3.1- Was it marked failed (red), or something else?
3.2- Did you swap the failed disk with the spare?
3.3- In the same location?

4) Why do you need to delete an array: is it the first or the second?
To remove an array (gray), you probably have to:
A) remove the mappings (LUNs) from each logical volume,
B) remove the logical volumes,
C) then the array is free of resources and you can remove it,
freeing the disks.
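Steps A to C follow a dependency order: a mapping pins its logical drive, and a logical drive pins its array. Here is a minimal sketch, assuming a simplified, hypothetical in-memory model. The class and method names are made up and are not Storage Manager or SMcli commands. It only models the normal case, not an array whose metadata is corrupt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalDrive:
    name: str
    luns: List[int] = field(default_factory=list)  # LUN numbers this drive is mapped to


@dataclass
class Array:
    name: str
    drives: Dict[str, LogicalDrive] = field(default_factory=dict)


class Subsystem:
    """Hypothetical controller model that enforces the A -> B -> C order."""

    def __init__(self) -> None:
        self.arrays: Dict[str, Array] = {}

    def remove_mapping(self, array: str, drive: str, lun: int) -> None:
        self.arrays[array].drives[drive].luns.remove(lun)

    def delete_logical_drive(self, array: str, drive: str) -> None:
        ld = self.arrays[array].drives[drive]
        if ld.luns:  # a drive that is still mapped cannot be deleted
            raise RuntimeError(f"{drive} is still mapped to LUN(s) {ld.luns}")
        del self.arrays[array].drives[drive]

    def delete_array(self, array: str) -> None:
        if self.arrays[array].drives:  # an array holding drives cannot be deleted
            raise RuntimeError(f"{array} still holds logical drives")
        del self.arrays[array]


def tear_down(sub: Subsystem, array: str) -> None:
    """A) unmap every LUN, B) delete each logical drive, C) delete the array."""
    for drive in list(sub.arrays[array].drives.values()):
        for lun in list(drive.luns):
            sub.remove_mapping(array, drive.name, lun)
        sub.delete_logical_drive(array, drive.name)
    sub.delete_array(array)


sub = Subsystem()
sub.arrays["array2"] = Array("array2", {"ld1": LogicalDrive("ld1", [3])})
tear_down(sub, "array2")
print(sub.arrays)  # empty once every step succeeded
```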

Author Comment

ID: 26287187
"If the option to delete it is grayed out, then you most likely have corrupted metadata." I think this is what the person I spoke with was referring to. It's good to hear it doesn't have to be rebuilt.

I'm not a SAN wizard by any means, but it seems that if there were a LUN conflict, some action would be required to enable the device. Apparently not. My error was removing the mapping of the "corrupt" drive: it removed the mapping for our existing drive! Both were mapped, but the "corrupt" drive had taken the place of the already mapped drive in the host group before I did anything. After that, the only thing I could do was create a new array with a different LUN. I tried to move the drive to another host group before removing the mapping, but it kept reconnecting to the one where our other LUN was before. What could have prevented this?
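One generic safeguard is checking a host group for duplicate LUN numbers before a foreign logical drive is allowed in. The thread does not confirm that Storage Manager offers this. The following minimal sketch assumes a hypothetical dictionary view of a host group, and the drive names are made up. It finds LUNs the incoming drives share with existing ones and proposes free numbers. `max_lun` is an assumed upper bound; check the limit your controller documents.

```python
from typing import Dict, Optional

# Hypothetical view of one host group: LUN number -> logical drive name.
HostGroup = Dict[int, str]


def plan_remap(existing: HostGroup, incoming: HostGroup, max_lun: int = 255) -> Dict[str, int]:
    """Suggest a free LUN for every incoming drive whose LUN is already taken."""
    used = set(existing) | set(incoming)
    plan: Dict[str, int] = {}
    for lun in sorted(set(existing) & set(incoming)):
        free: Optional[int] = next((n for n in range(max_lun + 1) if n not in used), None)
        if free is None:
            raise RuntimeError("no free LUN left in this host group")
        plan[incoming[lun]] = free
        used.add(free)
    return plan


existing = {0: "exchange", 1: "sql"}
incoming = {1: "foreign-ld"}  # arrived on the refurbished disk with LUN 1
print(plan_remap(existing, incoming))
```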

I don't know how to tell if we're Mylex based.  The help/info in Storage Manager is very limited.  
LVL 47

Expert Comment

ID: 26287265
Nothing can prevent metadata corruption entirely; it is rare, but not so rare that it never happens. The biggest thing you can do is make sure you have the latest firmware and that the config is rarely, if ever, degraded or under stress.

You are going to have to figure out what is inside of it; I can't do that for you. What does Device Manager report for the make/model of the LUNs? What are the part number and model number? Post a screenshot of Storage Manager when you query the controller information.


Author Comment

ID: 26287390

1. It's a 4800
2. Yes, Storage Manager
3. It was marked Impending Failure, and Failed at one point. There were never any options for me to manage it, though. Normally you can fail a drive.
4. I want to delete the array that our SAN now thinks exists.  We never created it.  The logical drive now appears in Undefined Mappings.  The physical disk drive is no longer in the system.  No logical volume exists that I'm aware of.  The array is a lighter green with a red slash through it.

Author Comment

ID: 26288267
The drive was removed and the logical drive deleted. There isn't any information for it now except for the offline array. Here is a screenshot of what I see: (left - mapping, right - array)

Author Comment

ID: 26288286
LOL... whoops, good for a laugh though!! Try this one...

Author Comment

ID: 26288338
FYI, that was a picture a friend sent to me. She does NOT work on SANs, wear many clothes, or date outside of her trailer park often.
LVL 47

Assisted Solution

David earned 2000 total points
ID: 26288425
Sorry, I can't help you manually correct the metadata on the 4800; I don't know how to do it for that particular model. However, you should be able to get away with:
1. Full backup
2. document configuration info (drive order, stripe size, disks used, starting/ending blocks, world-wide name, mapping)
3. clear config
4. rebuild config the same way as in #2, less the busted array, but do NOT initialize the arrays. This rebuilds a clean set of correct metadata, and as long as you don't initialize the arrays, it won't zero the LUNs out.

Note: Step #1 is important. This is a lot easier than rebuilding the SAN, and on other controllers it takes about 10 minutes from beginning to end (except for the backup step). Beats the heck out of rebuilding the entire SAN.
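Step #2 is only useful if the rebuilt config is checked against it afterward. The following minimal sketch assumes field names, units (stripe size in KB), and a single WWN/LUN per array that are my own simplifications, not the 4800's real metadata layout. It compares the documented config with the rebuilt one, skipping the busted array. It cannot check the most important part, that the arrays were recreated without initialization; that still has to be watched by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ArrayConfig:
    drive_order: Tuple[str, ...]  # physical disks used, in stripe order
    stripe_kb: int                # stripe size (assumed KB)
    start_block: int
    end_block: int
    wwn: str
    lun: int


def rebuild_mismatches(documented: Dict[str, ArrayConfig],
                       rebuilt: Dict[str, ArrayConfig],
                       skip: Tuple[str, ...] = ()) -> List[str]:
    """List every way the rebuilt config deviates from the documented one."""
    problems: List[str] = []
    for name, want in documented.items():
        if name in skip:
            continue  # the busted array is deliberately not recreated
        got = rebuilt.get(name)
        if got is None:
            problems.append(f"{name}: missing")
        elif got != want:
            problems.append(f"{name}: differs from documented config")
    for name in rebuilt:
        if name in skip:
            problems.append(f"{name}: should have been left out")
        elif name not in documented:
            problems.append(f"{name}: not in documented config")
    return problems


a = ArrayConfig(("d1", "d2", "d3"), 128, 0, 1000, "wwn-a", 0)
bad = ArrayConfig(("d4", "d5"), 64, 0, 500, "wwn-x", 1)
documented = {"A": a, "Bad": bad}
print(rebuild_mismatches(documented, {"A": a}, skip=("Bad",)))
```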

Expert Comment

ID: 26384759
I had the exact same problem with an array after a power outage a couple of days ago. IBM told me to fail all drives of the array and then revive them, but that did not work because one drive seemed to be missing from the array.
I got a support technician to log in via WebEx to run a bunch of commands on the command-line interface to rebuild the array, and I got all the data back.
So contact IBM, and they will help you with a command line to delete the array.
I think that is the only way to do it; the GUI lacks a lot of commands.
