vSphere can't connect to NetApp storage?

A remote site had a TCF-wide power outage today that bricked NetApp B. We can still ping NetApp A from the ESX hosts. However, after removing the bad node's IP from the dynamic discovery tab, rescanning the HBA, and even rebooting, we still can't connect to or see any datastores.

Nothing has changed on the NetApp besides the hard power cycle bricking its failover partner. The good node successfully took ownership of the volumes and LUNs. They're online and still mapped to the igroup, yada yada. Additionally, when I put the good node's target IP back into dynamic discovery, it doesn't automatically populate the static fields. No, we're not using CHAP.

Thanks in advance

Paul Solovyovsky

Have you run a vmkping command against the target IP? Are you able to get into the management side of the NetApp?
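
One caveat on the ping test: a successful ping or vmkping only shows ICMP reachability, not that the iSCSI service is accepting TCP connections on the target IP. Below is a minimal Python sketch of a TCP check against the default iSCSI port (3260), assuming it is run from a machine on the same storage network as the hosts. The function name is made up for illustration and the address in the usage comment is a placeholder, not taken from this environment.

```python
import socket

ISCSI_PORT = 3260  # default iSCSI target port


def iscsi_port_open(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, which ICMP ping does not.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example usage (placeholder address; substitute the surviving node's iSCSI target IP):
# print(iscsi_port_open("192.0.2.10"))
```

If the port is closed or filtered while ping still works, that points at the iSCSI service or the interface behind the IP rather than at basic network connectivity.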

snyderkv

ASKER

Yes, connectivity is good; I pinged both ways (regular ping from the ESXi host to the storage targets).

When we run network interface show, all the interfaces are up/up, and we can connect to the OnCommand cluster IP. Oddly, we don't see any interfaces in the Network Interfaces screen, but we do see them when running the command at the command line, and they are pingable.

Paul Solovyovsky

Perhaps the failover didn't fail over the interface properly, or the switch configuration is not the same. I would rescan the adapters and datastores. I imagine you're running iSCSI; if so, check your initiator groups just in case. I haven't seen too many issues with cluster mode, but 7-Mode was wonky every once in a while. If the LUNs are showing online on the NetApp, I'd reboot one of the hosts and try again. This may be a session issue, so check the NetApp for stale sessions.

Display iSCSI sessions
Availability: This command is available to cluster and Vserver administrators at the admin privilege level.

Description

This command displays iSCSI session information. If you do not specify the target session ID (TSIH), the command displays all session information for the specified Vserver. If a Vserver is not specified, the command displays all session information in the cluster. Use the vserver iscsi connection show command to display connection information. Use the vserver iscsi session parameter show command to show the parameters used when creating the session. You can use session information for troubleshooting performance problems.
An iSCSI session can have one or multiple connections. Typically a session has at least one connection.
Most of the parameters are read-only. However, some parameters can be modified with the vserver iscsi modify command.

http://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-cmpr-970%2Fvserver__iscsi__session__show.html

snyderkv

ASKER

Thanks Paul, but the basic steps have been tried. This has been a 30-hour outage so far. Trust me, reboots have taken place. The IQNs haven't changed. The filer is corrupt: cluster show reports healthy, but running any command says it's unhealthy and can't perform any tasks. A PCI card had to be reseated during the power outage, so this may be why our Network Interfaces are not visible in the GUI even though everything is pingable. I created another post on creating a new SVM and migrating data to it, but if you happen to know how to do that, let me know.

Paul Solovyovsky

Sorry to hear that, I've been through plenty of those calls. I imagine you have NetApp support on the line. Is there any way to get a new controller, since the OS is separate from the LUNs? If the NVRAM or the controller itself is corrupted, the data should still be OK, since it should be possible to scrub WAFL and recreate the metadata if needed. Just a thought.

snyderkv

ASKER

We found the issue.

ONTAP 9.0 has a bug that stops it from functioning correctly when one node is in a giveback state for a long period of time during a power outage. We had to rebuild the bad node in order to fix it. They sent me the article that confirmed we had hit the bug.

Paul Solovyovsky

That is strange; typically takeover/giveback is a fairly straightforward process.

snyderkv

ASKER

That's why it's a bug. I think one node must have been down for a long period without us noticing, then the power outage struck, then the bug kicked in, in that order. So the surviving node never fully recovered.