Solved

AX150 storage fault

Posted on 2010-09-21
Last Modified: 2013-11-14
I have an AX150 that is giving me the following error for SP B:
Storage System          100-561-403         Faulted

SP A does not report this problem.  

There is an amber light on the front, but it is the general fault LED rather than the LED on one of the drives.  I would assume that this is a hard drive failure, but the problem is that there is no amber light for either disk.

If it is a drive issue, how do you tell which one if there is no amber light?   If it is not a drive issue, would it be anything other than a system board going bad?

Question by:B1izzard
19 Comments
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33730849
Any of the following items could be bad, and each should have its own amber LED:
Power supplies
Standby Power Supply/mini UPS
Storage Processors


Have a look here: https://powerlink.emc.com/nsepn/webapps/btg548664833igtcuup4826/public/ax150/en_US/pubd-web/FC/hw/ax100_hw.htm
Also here: http://www.emc.com/microsites/clariion-support/ax150/support.esp?redirect=true



 

Author Comment

by:B1izzard
ID: 33731628
Thanks for the links.  On a side note, the amber light may have been triggered when I took out the bay #3 drive.  I forgot that the first four drives were OS drives and shouldn't be moved.  I then proceeded to take out the fourth drive and put it in bay #4 to make it a hot spare.  After re-reading the manual, I removed the hot spare from bay #4 and put it back in bay #3.  If so, shouldn't the amber light clear itself once the array rebuilds itself?

If not, and it damaged the storage software, how do you reinstall the OS?
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33731688
Oh, nooooooooooooooooooooooooooooooo!

You've double-faulted the Vault area, so you will no longer be able to enable write cache. Keep your fingers crossed and return the drives to their original homes and you might be really, really lucky and it'll come good. Otherwise, you'll need EMC's help.

The Vault is a hidden disk structure (it's a RAID 3 set spread over the first four drives) that's used for dumping write cache. If there's a power failure, the contents of write cache are dumped to the vault, so the array doesn't rely on battery backup to protect write cache. When the array is restarted, it checks the vault and, if there is data in it, writes that data out to the appropriate areas of disk. If the vault is damaged, you'll need to recreate it, and for that you'll need EMC.
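
As a rough illustration of the behaviour described above, here is a toy Python model of the cache-to-vault dump and replay. It is only a sketch: the class, field, and method names are invented for this example and are not EMC code, and it assumes that an array without a usable vault simply writes straight to disk instead of caching.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArray:
    """Hypothetical model of the write cache and vault; not real array firmware."""
    write_cache: dict = field(default_factory=dict)  # block -> data not yet on disk
    vault: dict = field(default_factory=dict)        # hidden area on the first four drives
    disk: dict = field(default_factory=dict)         # data committed to disk
    vault_ok: bool = True                            # False once the vault is damaged

    def write(self, block, data):
        # Assumption: write cache is only usable while the vault can catch a dump.
        if self.vault_ok:
            self.write_cache[block] = data
        else:
            self.disk[block] = data  # no write cache: go straight to disk

    def power_fail(self):
        # On power failure the pending writes are dumped to the vault.
        self.vault.update(self.write_cache)
        self.write_cache.clear()

    def restart(self):
        # On restart, anything found in the vault is written out to disk.
        self.disk.update(self.vault)
        self.vault.clear()


array = ToyArray()
array.write(7, "payload")
array.power_fail()
array.restart()
print(array.disk)
```

The point of the model is the ordering: data sits in cache, lands in the vault when power drops, and only reaches its final location on the next start-up, which is why a damaged vault means no write cache.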
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33731694
Incidentally, the OS is on all four drives. SPA boots from drives 0 and 2, SPB boots from drives 1 and 3. There are also recovery images hidden on the first four drives. The recovery images for SPA are on drives 1 and 3 and the recovery images for SPB are on drives 0 and 2. It's possible to rebuild the OS drives from a single known good drive. If you need that, we'll run through it, but I doubt it'll be needed.
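
The layout above can be written down as a small lookup table. This Python sketch is only an illustration: the names are made up, and it assumes the recovery images on drives 0 and 2 belong to SPB (the mirror image of SPA's arrangement).

```python
# Drive slots holding each storage processor's boot image, as described above.
BOOT_DRIVES = {"SPA": {0, 2}, "SPB": {1, 3}}

# Drive slots holding each SP's recovery image.
# Assumption: SPB's recovery images are the ones on drives 0 and 2.
RECOVERY_DRIVES = {"SPA": {1, 3}, "SPB": {0, 2}}


def sp_status(healthy_drives):
    """Report which boot and recovery copies survive for each SP."""
    healthy = set(healthy_drives)
    return {
        sp: {
            "boot_copies": sorted(BOOT_DRIVES[sp] & healthy),
            "recovery_copies": sorted(RECOVERY_DRIVES[sp] & healthy),
        }
        for sp in BOOT_DRIVES
    }


# Example: drive 3 has been pulled, drives 0-2 are still in place.
for sp, copies in sp_status({0, 1, 2}).items():
    print(sp, copies)
```

With one of the first four drives missing, each SP still has at least one boot copy and one recovery copy, which is the redundancy the layout is designed to give.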
 

Author Comment

by:B1izzard
ID: 33731933
If you do have any information on rebuilding this please let me know.  

Is this typical for SANs to have the OS on some of the drives?  Seems kind of dangerous to me.
 

Author Comment

by:B1izzard
ID: 33731942
Besides, what happens if one of the first four drives fails?  Can that cause OS problems?
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33731973
Nope - that's why there are multiple mirrors of the operating system and multiple copies of the recovery information. There is also a hidden triple mirrored LUN that has all the configuration data on it.

The rebuild process may mean you lose data - if you've just removed drives three and four, then the OS partitions will already have been rebuilt by the array. No need for any further action - except to fix the vault partition and for that you'll need EMC.
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33731980
Just to underline that - the array will have already rebuilt the hidden OS partitions. You only need the rebuild process if you have an array with no working OS drives - and you aren't in that position (and I hope you never are  :-) ) - this only happens when someone mucks about with the drives in the array and re-orders all the vault drives. It's why EMC place a large yellow sticker across the drives saying "Leave me alone OR ELSE!" (or whatever it actually says).
 

Author Comment

by:B1izzard
ID: 33732057
Mine didn't have a sticker unfortunately.  

I'm guessing the fault is why I can't get SP B to work.  SP A has been working perfectly every time, but I cannot get SP B to work properly.  I spent days trying everything to get it to appear, then finally it appeared briefly, then disappeared again and I haven't seen it since.  

The LEDs on the QLogics appear solid green, and Navisphere Express shows that SP A and SP B are active and registered, but I can only get SP A to show in PowerPath.   What is your opinion on this?   Could this be related to the amber storage fault light?
 
LVL 30

Accepted Solution

by:
Duncan Meyers earned 500 total points
ID: 33732349
You may have a simple connectivity issue - fibre channel cables are extremely delicate and kack-handed handling will break them. The fact that SPA and SPB can see each other is encouraging, but the amber LED isn't good.

Just spotted something. You said: "I would assume that this is a hard drive failure, but the problem is that there is no amber light for either disk."

What exactly do you mean by 'either disk'? There should be an absolute minimum of four drives in the array in locations 0 - 3. If there are not, I think we have a reason for the fault LED.
 

Author Comment

by:B1izzard
ID: 33734217
Sorry, there is no amber light for any disk.  Just the top left fault light.  I had tried 3 brand new cables and 3 different HBAs for SP B with little success (just the one brief appearance in a week's worth of attempts).  I've followed the manual carefully (minus the drive 4 debacle).  I even tried installing only one HBA, rebooting, connecting the server and verifying it appears in Windows, then shutting the server down, connecting SP B and rebooting, but no SP B.  

So the question is: why is there a fault light located top left, and top right?  Does the top left pertain to SP A, and top right pertain to SP B?
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33734395
The one on the right is the power LED, the one on the left is the fault LED: http://www.emc.com/microsites/clariion-support/ax150/pdf/hardware_overview.pdf

Have a look at the rear of the array. There is a Boot/Fault LED for each storage processor. The LED should be off on both SPs. See https://powerlink.emc.com/nsepn/webapps/btg548664833igtcuup4826/public/ax150/en_US/pubd-web/FC/hw/ax100_hw.htm for help locating the LEDs.  If the LED is on this indicates a problem on the SP.

Have you worked through clicking on Attention Required in Navisphere Express?
 

Author Comment

by:B1izzard
ID: 33736291
The attention required just shows 'There are faulted devices in this system.'  There were no fault lights for either SP.

Strange event: I had to move everything from my office to another room, and when I started it back up both SPs were showing again.  There is nothing different with my configuration, but now they show.

So this leads to another question.  If the FC cables are not in 6' loops, will this cause these types of problems?  I just have it laying out behind the server and AX150.  
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33739714
That's a good thing - but it definitely points to dodgy cables. They really are very fragile, and it does sound as if they're damaged. The minimum bend radius for fibre optic cabling is (IIRC) 6 inches, so any tangles in the cable have probably already fractured the internal core - likewise if anyone's stepped on the cables or been a bit over-enthusiastic with the cable ties.
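
To make the bend-radius point concrete: a cable coiled into a circular loop of diameter D is bent at a radius of D/2. The small Python check below is only a sketch. The function name is made up, and the 6-inch default is the figure recalled above, not a datasheet value, so substitute the minimum bend radius from the cable manufacturer's specification.

```python
def bend_radius_ok(loop_diameter_in: float, min_radius_in: float = 6.0) -> bool:
    """Return True if a circular loop of this diameter respects the minimum bend radius.

    min_radius_in defaults to the 6 inches quoted from memory in this thread;
    that default is an assumption, so check the cable's own specification.
    """
    return loop_diameter_in / 2.0 >= min_radius_in


print(bend_radius_ok(12.0))  # a 12-inch-wide loop bends at a 6-inch radius
print(bend_radius_ok(8.0))   # an 8-inch-wide loop bends at a 4-inch radius
```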
 

Author Comment

by:B1izzard
ID: 33740044
I have handled them very carefully, just not looped them up.  I was very careful however to not step on them or pinch them, but rather let them hang free.

I heard they were fragile, but didn't realize just laying them out carefully could cause this.  I do know that they were getting under the 6 inches, so that is probably it.  

So in this case I will probably do a little test for fun to see how much I can bend it before it disappears from PowerVault.  Sounds like fun!

I will test things out and let you know how it turns out.  Thanks for the feedback.  
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33740165
If the cables came with the hardware from your customer, of course you've got no way of knowing how they've been handled in their previous life. I'm always highly suspicious of FC cables that I don't know the provenance of.
 

Author Comment

by:B1izzard
ID: 33741270
The cables were all new.  It's my first real SAN (have an RA4100 but it's so old it doesn't count), so this is fairly new to me.  All the experience in setting this up is now permanently engrained in my memory so I won't make these mistakes again.

I did speak directly with the company I bought this from and they said the fault light is more than likely related to the missing UPS.  Someday when I have the cash I will buy one, but for now this is just a test lab.  

I have had it running stable on both SP A and B all day, even after messing with disconnecting cables I couldn't break it.  That is until I selected 'Remove from config' from PowerPath.  I won't bother you with that question, but will post a new question on that.  

Thanks for your help as always!
 

Author Closing Comment

by:B1izzard
ID: 33741276
EE, why must I provide a reason for closing this question?  I just want it closed!
 
LVL 30

Expert Comment

by:Duncan Meyers
ID: 33741772
Thanks! Glad I could help.
