Solved

AX150 storage fault

Posted on 2010-09-21
Last Modified: 2013-11-14
I have an AX150 that is giving me the following error for SP B:
Storage System          100-561-403         Faulted

SP A does not report this problem.  

There is an amber light on the front, but it is the general fault LED rather than the LED on one of the drives.  I would assume that this is a hard drive failure, but the problem is that there is no amber light for either disk.

If it is a drive issue, how do you tell which one if there is no amber light?   If it is not a drive issue, would it be anything other than a system board going bad?

Question by:B1izzard
19 Comments
 
LVL 30

Expert Comment

by:Duncan Meyers
Any of the following items could be faulty, and each should show an amber LED if it is:
Power supplies
Standby Power Supply/mini UPS
Storage Processors


Have a look here: https://powerlink.emc.com/nsepn/webapps/btg548664833igtcuup4826/public/ax150/en_US/pubd-web/FC/hw/ax100_hw.htm
Also here: http://www.emc.com/microsites/clariion-support/ax150/support.esp?redirect=true




Author Comment

by:B1izzard
Thanks for the links.  On a side note, the amber light may have been triggered when I took out the bay #3 drive.  I forgot that the first four drives were OS drives and shouldn't be moved.  I then proceeded to take out the fourth drive and put it in bay #4 to make it a hot spare.  After re-reading the manual, I removed the hot spare from bay #4 and put it back in bay #3.  If that is the cause, shouldn't the amber light clear itself once the array rebuilds itself?

If not, and it damaged the storage software, how do you reinstall the OS?
LVL 30

Expert Comment

by:Duncan Meyers
Oh, nooooooooooooooooooooooooooooooo!

You've double-faulted the Vault area, so you will no longer be able to enable write cache. Keep your fingers crossed and return the drives to their original homes and you might be really, really lucky and it'll come good. Otherwise, you'll need EMC's help.

The Vault is a hidden disk structure (it's a RAID 3 set spread over the first four drives) that's used for dumping write cache. If there's a power failure, the contents of write cache are dumped to the vault - the array doesn't rely on battery backup to protect write cache. When the array is restarted, it checks the vault and, if there is data in it, that data gets written out to the appropriate areas of disk. If the vault is damaged, you'll need to recreate it, and for that you'll need EMC.
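
As a rough illustration of the dump-and-replay behaviour described above, here is a toy Python model. It is only a sketch, not EMC's implementation: the names `ToyArray`, `power_fail`, `restart` and `vault_ok` are invented for the example, and the vault is modelled as a plain dictionary rather than a RAID 3 set.

```python
class ToyArray:
    """Toy model of write cache protected by a vault area (hypothetical names)."""

    def __init__(self):
        self.disk = {}         # committed blocks: address -> data
        self.write_cache = {}  # acknowledged writes not yet on disk
        self.vault = {}        # stands in for the reserved area on the first four drives
        self.vault_ok = True   # False models a damaged (double-faulted) vault

    def write(self, address, data):
        if self.vault_ok:
            # Write cache enabled: hold the write in cache for now.
            self.write_cache[address] = data
        else:
            # No usable vault, so cache stays disabled: write straight to disk.
            self.disk[address] = data

    def power_fail(self):
        # Dump the cache contents to the vault, then lose volatile memory.
        self.vault = dict(self.write_cache)
        self.write_cache = {}

    def restart(self):
        # On restart, replay anything found in the vault out to disk.
        self.disk.update(self.vault)
        self.vault = {}


array = ToyArray()
array.write(7, "payload")
array.power_fail()
array.restart()
print(array.disk)  # {7: 'payload'}
```

With `vault_ok` set to `False` the model never caches a write, which mirrors the point that a damaged vault means write cache cannot be enabled.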
LVL 30

Expert Comment

by:Duncan Meyers
Incidentally, the OS is on all four drives. SPA boots from drives 0 and 2, SPB boots from drives 1 and 3. There are also recovery images hidden on the first four drives. The recovery images for SPA are on drives 1 and 3 and the recovery images for SPB are on drives 0 and 2. It's possible to rebuild the OS drives from a single known good drive. If you need that, we'll run through it, but I doubt it'll be needed.
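
To make that layout easier to reason about, here is a small Python sketch that records which drive slots hold which copies and reports what is left after drives are pulled. The names `BOOT_DRIVES`, `RECOVERY_DRIVES` and `surviving_copies` are made up for the example, and it assumes the second set of recovery images (on drives 0 and 2) belongs to SP B.

```python
# Layout as described in the comment above; drive numbers are slot positions 0-3.
BOOT_DRIVES = {"SPA": {0, 2}, "SPB": {1, 3}}
# Assumption: the second set of recovery images belongs to SP B.
RECOVERY_DRIVES = {"SPA": {1, 3}, "SPB": {0, 2}}


def surviving_copies(removed):
    """Return, per SP, which boot and recovery copies remain after pulling drives."""
    removed = set(removed)
    return {
        sp: {
            "boot": sorted(BOOT_DRIVES[sp] - removed),
            "recovery": sorted(RECOVERY_DRIVES[sp] - removed),
        }
        for sp in BOOT_DRIVES
    }


print(surviving_copies({3}))
```

Under these assumptions, pulling drive 3 leaves SP B with one boot copy (drive 1) and SP A with one recovery copy (drive 1), while every other copy is untouched.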

Author Comment

by:B1izzard
If you do have any information on rebuilding this please let me know.  

Is this typical for SANs to have the OS on some of the drives?  Seems kind of dangerous to me.

Author Comment

by:B1izzard
Besides, what happens if one of the first four drives fails?  Can that cause OS problems?
LVL 30

Expert Comment

by:Duncan Meyers
Nope - that's why there are multiple mirrors of the operating system and multiple copies of the recovery information. There is also a hidden triple mirrored LUN that has all the configuration data on it.

The rebuild process may mean you lose data - if you've just removed drives three and four, then the OS partitions will already have been rebuilt by the array. No need for any further action - except to fix the vault partition and for that you'll need EMC.
LVL 30

Expert Comment

by:Duncan Meyers
Just to underline that - the array will have already rebuilt the hidden OS partitions. You only need the rebuild process if you have an array with no working OS drives - and you aren't in that position (and I hope you never are  :-) ) - this only happens when someone mucks about with the drives in the array and re-orders all the vault drives. It's why EMC place a large yellow sticker across the drives saying "Leave me alone OR ELSE!" (or whatever it actually says).

Author Comment

by:B1izzard
Mine didn't have a sticker unfortunately.  

I'm guessing the fault is why I can't get SP B to work.  SP A has been working perfectly every time, but I cannot get SP B to work properly.  I spent days trying everything to get it to appear, then finally it appeared briefly, then disappeared again and I haven't seen it since.

The LEDs on the QLogic HBAs are solid green, and Navisphere Express shows that SP A and SP B are active and registered, but I can only get SP A to show in PowerPath.   What is your opinion on this?   Could this be related to the amber storage fault light?
LVL 30

Accepted Solution

by:
Duncan Meyers earned 500 total points
You may have a simple connectivity issue - Fibre Channel cables are extremely delicate and cack-handed handling will break them. The fact that SPA and SPB can see each other is encouraging, but the amber LED isn't good.

Just spotted something: you said: I would assume that this is a hard drive failure, but the problem is that there is no amber light for either disk.

What exactly do you mean by 'either disk'? There should be an absolute minimum of four drives in the array, in locations 0 - 3. If there are not, I think we have a reason for the fault LED.

Author Comment

by:B1izzard
Sorry, there is no amber light for any disk.  Just the top left fault light.  I had tried 3 brand new cables and 3 different HBAs for SP B with little success (just the one brief appearance in a week's worth of attempts).  I've followed the manual carefully (minus the drive 4 debacle).  I even tried installing only one HBA, rebooting, connecting the server and verifying it appears in Windows, shutting the server down, connecting SP B, and rebooting, but no SP B.

So the question is: why is there a fault light located top left, and top right?  Does the top left pertain to SP A, and top right pertain to SP B?
LVL 30

Expert Comment

by:Duncan Meyers
The one on the right is the power LED, the one on the left is the fault LED: http://www.emc.com/microsites/clariion-support/ax150/pdf/hardware_overview.pdf

Have a look at the rear of the array. There is a Boot/Fault LED for each storage processor. The LED should be off on both SPs. See https://powerlink.emc.com/nsepn/webapps/btg548664833igtcuup4826/public/ax150/en_US/pubd-web/FC/hw/ax100_hw.htm for help locating the LEDs.  If the LED is on this indicates a problem on the SP.

Have you worked through clicking on Attention Required in Navisphere Express?

Author Comment

by:B1izzard
The attention required just shows 'There are faulted devices in this system.'  There were no fault lights for either SP.

Strange event: I had to move everything from my office to another room, and when I started it back up both SPs were showing again.  Nothing is different in my configuration, but now they show.

So this leads to another question.  If the FC cables are not in 6' loops, will this cause these types of problems?  I just have it laying out behind the server and AX150.  
LVL 30

Expert Comment

by:Duncan Meyers
That's a good thing - but it definitely points to dodgy cables. They really are very fragile, and it does sound as if they're damaged. The minimum bend radius for fibre optic cabling is (IIRC) 6 inches, so any tangles in the cable have probably already fractured the internal core - likewise if anyone's stepped on the cables or been a bit over-enthusiastic with the cable ties.

Author Comment

by:B1izzard
I have handled them very carefully, just not looped them up.  I was very careful however to not step on them or pinch them, but rather let them hang free.

I heard they were fragile, but didn't realize just laying them out carefully could cause this.  I do know that they were getting under the 6 inches, so that is probably it.  

So in this case I will probably do a little test to see how much I can bend a cable before the SP disappears from PowerPath.  Sounds like fun!

I will test things out and let you know how it turns out.  Thanks for the feedback.  
LVL 30

Expert Comment

by:Duncan Meyers
If the cables came with the hardware from your customer, of course you've got no way of knowing how they've been handled in their previous life. I'm always highly suspicious of FC cables that I don't know the provenance of.

Author Comment

by:B1izzard
The cables were all new.  It's my first real SAN (have an RA4100 but it's so old it doesn't count), so this is fairly new to me.  All the experience in setting this up is now permanently engrained in my memory so I won't make these mistakes again.

I did speak directly with the company I bought this from and they said the fault light is more than likely related to the missing UPS.  Someday when I have the cash I will buy one, but for now this is just a test lab.  

I have had it running stable on both SP A and B all day, even after messing with disconnecting cables I couldn't break it.  That is until I selected 'Remove from config' from PowerPath.  I won't bother you with that question, but will post a new question on that.  

Thanks for your help as always!

Author Closing Comment

by:B1izzard
EE, why must I provide a reason for closing this question?  I just want it closed!
LVL 30

Expert Comment

by:Duncan Meyers
Thanks! Glad I could help.