
HP MSA1000 - Fibre Channel Controller Replacement - WebTools License Key Missing

Posted on 2010-08-19
Last Modified: 2013-11-14
Our company is using an HP MSA1000 with two Fibre Channel controllers in an Active/Active configuration.

One of the controllers went bad, and since we have a warranty with HP, an engineer came out and replaced the defective controller. The replaced switch shows up in the HP System Management Homepage as okay; however, it does not show up in the Switch Explorer web interface. When I try to access the controller by its IP, I get the following message:

"WebTools License Key Missing on Switch SGM06019L2-switch1"

I asked the HP engineer about this when he was onsite, and he said it would take about a day for the settings from the other controller to "replicate" to the replaced controller. He also mentioned that the existing key is just a default key. This did not sound accurate to me, but since I do not have any experience with Fibre Channel and I have no spares to test with, I took his word for it. Needless to say, the problem still persists, and I am worried that even though the new controller appears to be working, it is not configured correctly.

I have contacted HP for a resolution, but I am posting here for a second opinion. I have already tried using the existing Web license from the working controller, but I received a message saying that the key is not valid. I was also shown an HP site called the "HP Authorization center" (webkey.external.hp.com) where new licenses can be generated, but when I tried entering my serials, I received a message stating that none of the serials were found in the database.

I also noticed that the Fabric OS versions are different - the replaced controller has a newer version.

This is a production SAN attached to ESX Hosts so I do not have the luxury of trial and error. I saw a command in the Fabric OS to reboot the controller but I am hesitant to try it.

How can I fix the licensing issue and ensure that the controller is configured correctly?

Attached screenshots: new controller status in HP Systems Insight Manager; working controller status in HP Systems Insight Manager; licenseShow and switchShow command output on the new controller; licenseShow and switchShow command output on the working controller; error message shown when accessing the web interface.
Question by:SSAKUSEISHA
 
LVL 118
ID: 33481364
The HP engineer has done a poor job here. He should have checked which Fabric OS version was on the switch before and ensured that the replacement had the same version, as well as the same license keys.

The keys are static per switch, and they do not replicate (what a croc of ****).

You really need to get this sorted, because ESX and the HP MSA1000 can be tricky to get working correctly.
 
LVL 118
ID: 33481375
do you have a backup of the configuration of the old controller before it went bad?
 
LVL 118
ID: 33481387
The problem with HP is that they have very few engineers left who work on the MSA1000. Check the VMware ESX HCL for which Fabric OS version you should be on. The Fabric OS may be okay as long as the firmware in the SAN (7/7) is the same, but you really should have the same firmware on both fabrics.
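A minimal Python sketch of the "same firmware on both fabrics" check, assuming you have captured the output of the Fabric OS `version` command from each switch and that it contains a line of the form `Fabric OS:  vX.Y.Z`, optionally followed by a letter. That line format and the sample version strings are assumptions for illustration, not values taken from this thread.

```python
import re

# Assumed format: "Fabric OS:  v<major>.<minor>.<patch>[letter]" as printed by
# the switch's `version` command; adjust the pattern if your output differs.
FOS_LINE = re.compile(r"Fabric OS:\s*v(\d+)\.(\d+)\.(\d+)([a-z]?)")


def parse_fos_version(version_output: str) -> tuple[int, int, int, str]:
    """Return a sortable (major, minor, patch, suffix) tuple."""
    match = FOS_LINE.search(version_output)
    if match is None:
        raise ValueError("no 'Fabric OS: v...' line found")
    major, minor, patch, suffix = match.groups()
    # This sketch sorts an empty suffix before "a", so v4.4.0 < v4.4.0a < v4.4.0b.
    return int(major), int(minor), int(patch), suffix


def compare_switches(output_a: str, output_b: str) -> str:
    """Report whether two switches run the same Fabric OS release."""
    a, b = parse_fos_version(output_a), parse_fos_version(output_b)
    if a == b:
        return "match"
    return "switch A is newer" if a > b else "switch B is newer"


# Hypothetical sample output; replace with what your switches actually report.
print(compare_switches("Fabric OS:  v4.4.0b", "Fabric OS:  v5.0.1"))
```

Anything other than "match" is the mismatch described above and worth raising with HP before relying on the replacement.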
 

Author Comment

by:SSAKUSEISHA
ID: 33481678
hanccocka,

Thank you for your help.

Yeah, that is what I thought too, but they are supposed to be the experts... I have a feeling that I will be the expert when this is sorted out. :)

I am not sure about the backup - I tried to look for it but haven't found it yet. The previous IT team was very, very thorough so hopefully it is somewhere.

I will check with the VMware HCL too.

In the meantime, is there anything that you can think of that definitely should not be done in this situation? (Not obvious things like pulling out the controller that is definitely working! :) )

Thank you again for your time and help.

 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 200 total points
ID: 33483261
This doesn't make much sense. The switch is completely separate from the controller; it's housed at the back of the MSA1000, but only for power and for the single Fibre Channel connection from switch to controller. Functionally it's the same as having a separate external switch. Changing a controller at the front can have no effect on the licenses on the switches. Are you sure it was the controller, rather than the switch, that he changed?

 
LVL 118

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE) earned 300 total points
ID: 33483273
I would talk to your manager, if you have one, about the HP mess-up, just to explain the potential risk of a switch Fabric OS mismatch and that it's not your fault; they should have replaced like for like. Different switch Fabric OS versions may be okay, but FC configurations with ESX can often cause downtime. Get back in touch with HP and tell them to send the engineer back to downgrade/upgrade the firmware and apply the license. You may want to schedule some downtime if possible.
 

Author Comment

by:SSAKUSEISHA
ID: 33490488
Thanks everyone.

It was not the controller, but the switch that was replaced. I watched the engineer remove it from the rear of the MSA 1000.

I have spoken to HP but they are not sure - they are investigating. By the way, I am in Japan so it is HP Japan that I am working with.

I also contacted the engineers that originally designed the infrastructure. They are professional storage engineers so I am sure that they will be able to help me.

I will keep everyone updated!
 
LVL 55

Expert Comment

by:andyalder
ID: 33496123
As an aside, I'm a bit concerned about the "segmented - domain overlap" state of the E-ports. That may be due to not having the full fabric license, which is an additional part number/cost, but it's more likely that the domain ID wasn't changed when the switch was replaced.
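Every switch in a Fibre Channel fabric needs a unique domain ID (1 to 239); when two switches joined by an ISL claim the same one, the E-port can segment instead of merging the fabrics. A minimal Python sketch of that uniqueness check, assuming you have already collected each switch's domain ID (for example from `switchShow` output); the switch names and IDs below are hypothetical.

```python
from collections import defaultdict


def find_domain_overlaps(domain_ids: dict[str, int]) -> dict[int, list[str]]:
    """Map each domain ID claimed by more than one switch to those switches."""
    claimants: dict[int, list[str]] = defaultdict(list)
    for switch, domain_id in domain_ids.items():
        # Valid Fibre Channel domain IDs run from 1 to 239.
        if not 1 <= domain_id <= 239:
            raise ValueError(f"{switch}: domain ID {domain_id} is outside 1-239")
        claimants[domain_id].append(switch)
    return {d: sorted(names) for d, names in claimants.items() if len(names) > 1}


# Hypothetical fabric: the replacement switch reuses domain ID 1,
# which an ISL-connected switch is already using.
fabric = {"replacement-switch": 1, "core-switch": 1, "edge-switch": 2}
print(find_domain_overlaps(fabric))  # {1: ['core-switch', 'replacement-switch']}
```

An empty result means no two switches claim the same domain ID; any entry points at the switch whose domain ID needs changing before the ISL is re-enabled.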


Do you have the serial number of the original switch? That is the S/N HP should look up: when they replace a switch, they have to get new licenses for the replacement (which probably isn't in their database, as they haven't sold it to anyone) by looking up the old S/N in their database.

 
LVL 118
ID: 33496519
In my 30+ years of working with DEC, Compaq and now HP engineers, a good HP engineer will record configurations and versions of flash code, firmware and drivers before attempting to repair or replace anything, and will advise you of the repair plan. What you should have when he leaves is what you had to start with, only fixed. My understanding now is also that, unless you give them specific instructions and agree them, most engineers are not "warrantied" to change parts that could lead to further downtime or data loss unless they agree it with 2nd/3rd-line technical support in HP India (in our case, in the UK). This is specific to SAN-based work on MA, EVA, MSA FC and iSCSI in the UK, having dealt with them a lot on failed EVA and MSA FC/iSCSI SANs (lately!).
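The "what you had to start with" check can be made concrete by saving a text snapshot before the repair (for example the switch's `version`, `licenseShow` and `switchShow` output) and diffing it against the same commands afterwards. A minimal Python sketch using the standard `difflib` module; the snapshot contents are hypothetical placeholders, not real switch output.

```python
import difflib


def config_drift(before: str, after: str) -> list[str]:
    """Return unified-diff lines between pre- and post-repair snapshots."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before_repair",
            tofile="after_repair",
            lineterm="",
        )
    )


# Hypothetical snapshots: the post-repair switch reports a different
# firmware release and is missing one license line.
before = "firmware: release-A\nlicense: Web\nlicense: Zoning"
after = "firmware: release-B\nlicense: Zoning"

drift = config_drift(before, after)
print("\n".join(drift) if drift else "no drift")
```

An empty diff means the replacement came back as it went in; lines starting with `-` are things the engineer should restore before leaving.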
 

Author Comment

by:SSAKUSEISHA
ID: 33498496
Just got off the phone with HP today. They will be coming back to replace the switch again. I am sure that they got a hold of the old switch and pulled the config off of it. Hopefully I will get a proper engineer this time.

I will update everyone with the result.
 

Author Comment

by:SSAKUSEISHA
ID: 33517530
The same HP engineer came and replaced the "replaced" FC switch with another FC switch that had all of the licensing installed.

It seemed to work; HP Systems Insight Manager showed that the switch was up and I could see it in the Switch Explorer web interface. Also, under the LUN properties of the ESX hosts, the "dead" status for the Path Status changed to "On".

However, once the replaced switch came online, a bunch of "lost connection" alerts were generated from HPSIM. VMs on the ESX Host lost connectivity for about 5 seconds and I am assuming this is DRS or VMotion in action. Most of the VMs were fine but three VMs had problems and exhibited different behaviors:

VM1 - Generated these errors but recovered automatically.

Event Type:      Error
Event Source:      Disk
Event Category:      None
Event ID:      11
Date:            2010/08/24
Time:            22:09:24
User:            N/A
Computer:      xxxxx
Description:
The driver detected a controller error on \Device\Harddisk0

Event Type:      Error
Event Source:      vmscsi
Event Category:      None
Event ID:      15
Date:            2010/08/24
Time:            22:09:24
User:            N/A
Computer:      xxxxx
Description:
The device, \Device\Scsi\vmscsi1, is not ready for access yet.

VM2 - Had the same errors as above but did not recover. Had to reboot the VM.

Event Type:      Error
Event Source:      Disk
Event Category:      None
Event ID:      11
Date:            08/24/2010
Time:            10:09:18 PM
User:            N/A
Computer:      ttttt
Description:
The driver detected a controller error on \Device\Harddisk0.

Event Type:      Error
Event Source:      symmpi
Event Category:      None
Event ID:      15
Date:            08/24/2010
Time:            10:09:18 PM
User:            N/A
Computer:      ttttt
Description:
The device, \Device\Scsi\symmpi1, is not ready for access yet.

VM3 - Generated no errors but rebooted all of a sudden and got stuck at the BIOS screen. The VM worked normally after restarting it.
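When several VMs report errors at once, it can help to tally the pasted Event Viewer entries by source and event ID to see which virtual SCSI adapters (here `vmscsi` and `symmpi`) were hit at the same moment. A minimal Python sketch that parses the plain-text event format quoted above; it only reorganises what the guests logged and does not by itself identify the SAN-side cause.

```python
import re
from collections import Counter

# Field labels exactly as they appear in the pasted Event Viewer text.
FIELD = re.compile(
    r"^(Event Type|Event Source|Event Category|Event ID|Date|Time|User|"
    r"Computer|Description):\s*(.*)$"
)


def parse_events(text: str) -> list[dict[str, str]]:
    """Split pasted Event Viewer text into one dict per event."""
    events: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_key = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            key, value = match.groups()
            if key == "Event Type" and current:
                events.append(current)  # each "Event Type" line opens a new event
                current = {}
            current[key] = value
            last_key = key
        elif last_key == "Description":
            # The description text sits on the line(s) after "Description:".
            current["Description"] = f"{current['Description']} {line}".strip()
    if current:
        events.append(current)
    return events


def tally(events: list[dict[str, str]]) -> Counter:
    """Count events per (source, event ID) pair."""
    return Counter((e.get("Event Source", "?"), e.get("Event ID", "?")) for e in events)


# VM1's two events, copied from the thread (raw string keeps the backslashes).
SAMPLE = r"""
Event Type:      Error
Event Source:      Disk
Event Category:      None
Event ID:      11
Date:            2010/08/24
Time:            22:09:24
User:            N/A
Computer:      xxxxx
Description:
The driver detected a controller error on \Device\Harddisk0

Event Type:      Error
Event Source:      vmscsi
Event Category:      None
Event ID:      15
Date:            2010/08/24
Time:            22:09:24
User:            N/A
Computer:      xxxxx
Description:
The device, \Device\Scsi\vmscsi1, is not ready for access yet.
"""

for (source, event_id), count in tally(parse_events(SAMPLE)).items():
    print(f"{source} event {event_id}: {count}")
```

Running the same parser over each VM's export and comparing the `Time` fields shows whether all three VMs were affected in the same few seconds as the switch came online.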

Is this typical? I thought the whole purpose of dual SAN switches was to avoid downtime...

This might be out of scope, but could you show me how to determine the exact cause of the VM errors above? The "errShow" command at the CLI did not give me any useful information.

 

Author Comment

by:SSAKUSEISHA
ID: 33518101
I looked up how to troubleshoot the Event ID 11 and 15 errors and tried the following on one of the ESX hosts:

1. Looked for error messages in the /var/log/vmkwarning log. Found this:

Aug 24 22:14:15 MY_ESX_HOST03 vmkernel: 930:21:32:49.137 cpu3:1722)WARNING: Swap: vm 1721: 7515: Swap sync read failed: status=195887167, retrying...
.
.

Aug 24 22:14:05 MY_ESX_HOST03 vmkernel: 930:21:32:39.410 cpu0:1055)WARNING: SCSI: 5306: vml.0200010000600805f3001a3c70a0e14b09724e00054d534120564f: Too many failed retries 33 (32),  Returning I/O failure. 0x16 1/0x0 0x0 0x0 0x0

2. Ran the command "vm-support -x" and found the vm ID 1721 listed above:
VMware ESX Server Support Script 1.29


Available worlds to debug:

vmid=1425       AAAAA
vmid=1490       BBBBB
vmid=1503       CCCCC
vmid=1673       DDDDD
vmid=1695       EEEEE
vmid=1721       The VM that had problems after the switch replacement

3. Confirmed the error with the command "ls /proc/vmware/vm/1721/disk"

vml.0200010000600805f3001a3c70a0e14b09724e00054d534120564f
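The three manual steps above can be scripted so the same correlation is repeatable on every host: take the world ID and `vml.` device names from the vmkwarning lines, look the world ID up in the `vm-support -x` listing, and check the device against the VM's `/proc/vmware/vm/<vmid>/disk` entries. A minimal Python sketch, assuming those three outputs have been saved as text; the sample strings are copied from this thread, and the regular expressions are assumptions based only on these few lines.

```python
import re

# Excerpts copied from the thread; in practice, read the saved files instead.
VMKWARNING = (
    "Aug 24 22:14:15 MY_ESX_HOST03 vmkernel: 930:21:32:49.137 cpu3:1722)WARNING: "
    "Swap: vm 1721: 7515: Swap sync read failed: status=195887167, retrying...\n"
    "Aug 24 22:14:05 MY_ESX_HOST03 vmkernel: 930:21:32:39.410 cpu0:1055)WARNING: "
    "SCSI: 5306: vml.0200010000600805f3001a3c70a0e14b09724e00054d534120564f: "
    "Too many failed retries 33 (32),  Returning I/O failure. 0x16 1/0x0 0x0 0x0 0x0\n"
)
WORLDS = (
    "vmid=1425       AAAAA\n"
    "vmid=1721       The VM that had problems after the switch replacement\n"
)
VM_1721_DISKS = "vml.0200010000600805f3001a3c70a0e14b09724e00054d534120564f\n"


def warned_vmids(log: str) -> set[str]:
    """World IDs named in 'WARNING: <module>: vm <id>:' lines."""
    return set(re.findall(r"WARNING: \w+: vm (\d+):", log))


def failing_devices(log: str) -> set[str]:
    """vml device names that hit 'Too many failed retries'."""
    return set(re.findall(r"(vml\.[0-9a-f]+): Too many failed retries", log))


def world_names(listing: str) -> dict[str, str]:
    """Map vmid -> display name from the 'vmid=<id>  <name>' listing."""
    names: dict[str, str] = {}
    for line in listing.splitlines():
        match = re.match(r"vmid=(\d+)\s+(.*)", line.strip())
        if match:
            names[match.group(1)] = match.group(2)
    return names


def correlate(log: str, listing: str, vmid: str, disk_listing: str) -> dict[str, object]:
    """Summarise whether one VM was warned about and uses a failing device."""
    vm_devices = {tok for tok in disk_listing.split() if tok.startswith("vml.")}
    return {
        "vmid": vmid,
        "name": world_names(listing).get(vmid, "unknown"),
        "warned": vmid in warned_vmids(log),
        "failing_devices_in_use": sorted(failing_devices(log) & vm_devices),
    }


print(correlate(VMKWARNING, WORLDS, "1721", VM_1721_DISKS))
```

As with the manual steps, a match only confirms that this VM's disk sat on the device that returned I/O failures during the switch swap.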

If this is correct, I have just basically confirmed what I already know - the FC switch replacement caused this VM to have disk errors. Is there anything else I can look at?
 
LVL 118
ID: 33518691
I believe the errors you have encountered are the subject of another question. If the original question has been answered, I would accept the solution and allocate points, then open another question relating to the above SAN issue.
 
LVL 55

Expert Comment

by:andyalder
ID: 33518834
Would need the same switchshow screenshot, hopefully without the WWNs of the E ports crossed out.
 

Author Closing Comment

by:SSAKUSEISHA
ID: 33520712
Thank you for your assistance.
 
LVL 118
ID: 33520734
can you post the URL/Question Number of the New Issue?