We currently have an issue with iSCSI connectivity on a setup we have inherited from another IT provider.
To summarise, this is a Hyper-V 2K8 R2 cluster using CSVs over iSCSI, stored on an HP MSA 2324i iSCSI SAN. Each host has 2 NICs for iSCSI, and they are on their own subnet. They are split between two switches for redundancy. Nothing had changed prior to the fault.
On Friday evening, all of the VMs went offline and the storage was marked as failed in the cluster. On investigation, I pinged the iSCSI interfaces on the SAN from the Hyper-V hosts. The pings either took 1300ms+ or were dropped, on all controller interfaces, from all hosts.
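To record this ping behaviour over time rather than checking by hand, something like the following Python sketch could be left running on each host. It is only an illustration: the addresses in SAN_PORTS and the 100 ms SLOW_MS threshold are placeholders rather than values from our setup, the function names are made up for the example, and it assumes Python 3.7+ and the English-language output of the Windows ping command.

```python
import re
import subprocess
import time
from datetime import datetime

# Placeholder addresses (assumption): replace with the SAN controller iSCSI port IPs.
SAN_PORTS = ["10.0.0.1", "10.0.0.2"]
# Assumed threshold above which a reply is treated as slow.
SLOW_MS = 100.0

# Matches "time=1300ms" and "time<1ms" as printed by ping for each reply.
_TIME_RE = re.compile(r"time[=<]\s*(\d+(?:\.\d+)?)\s*ms", re.IGNORECASE)


def parse_latency_ms(ping_output):
    """Return the round-trip time in ms from ping output, or None if there was no reply."""
    match = _TIME_RE.search(ping_output)
    return float(match.group(1)) if match else None


def classify(latency_ms, slow_ms=SLOW_MS):
    """Label a single result as DROPPED, SLOW or OK."""
    if latency_ms is None:
        return "DROPPED"
    return "SLOW" if latency_ms >= slow_ms else "OK"


def ping_once(address):
    """Send one echo request; "-n 1" is the Windows form (Linux uses "-c 1")."""
    result = subprocess.run(
        ["ping", "-n", "1", address], capture_output=True, text=True
    )
    return parse_latency_ms(result.stdout)


def monitor(addresses, interval_s=5, rounds=None):
    """Ping every address each interval and print a timestamped line per result.

    rounds=None keeps going until interrupted with Ctrl+C.
    """
    done = 0
    while rounds is None or done < rounds:
        for address in addresses:
            latency = ping_once(address)
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {address} {classify(latency)} {latency}")
        done += 1
        time.sleep(interval_s)
```

Calling monitor(SAN_PORTS) on each host and redirecting the output to a file would give a timeline of when each controller port became slow or unreachable, which could then be lined up against the SAN and switch logs.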
After several hours the storage came back by itself: the cluster storage could be brought online again and the iSCSI initiators on the hosts went from "reconnecting" to "connected". Nothing had been changed to bring it back.
Several hours later the same issue occurred. After 20+ hours it came back up again for one hour, then dropped.
I have tried the following:
- Removed the physical cables and tried them one at a time in each controller
- Restarted the storage and management controllers
- Restarted the hosts
- Changed the IPs on the host interfaces on the SAN
- Enabled loop protection on the switches
Given that all of the hosts are disconnected, I am certain the SAN is the issue. The SAN logs show the port connections going up and down, yet everything is reporting as healthy.
A colleague is on the phone to HP, but I thought I would throw it out to EE in case anyone has had a similar issue that they managed to fix.