Brian B
asked on
No Connection Light on Switch
Hi everyone. We are having a strange VMware problem here and are looking for ideas.
We are running vSAN on two hosts and a witness. For the vSAN portion, each node is connected to a Cisco 3850 via a pair of twinax cables. On one node there are no lights on the switch ports where the twinax cables connect. This was working before. I have tried the cables elsewhere and they work. I have tried other ports on the 3850 and it doesn't work there either. Dell just replaced the network card and it didn't help. On the switch, "sh int status" shows the twinax is plugged in, but not connected. Same for the server.
VMware shows nothing unusual other than the connection is down.
Again, this was working. Problem is only one node. The other one is fine as is the witness.
Anything else?
ASKER
Are ALL the hardware components on the HCL, and is the firmware updated? I would try a replacement switch, NICs and cables.
Yes, all on the HCL. Firmware, yes. I actually had to update the firmware in order to get the switch virtual stacking working on these 3850s. NIC was just replaced, cables I confirmed work elsewhere. Replacement switch? We had enough trouble getting the two that we have. I guess I could swap the switches. Since they are virtually stacked and everything plugs into the same ports on both switches, it *should* be plug and play, but I'll have to schedule an outage.
Going to try a couple more cable/port combinations first though.
Any other ideas still welcome!
Not a solution, and I'm not sure why, but with ours, if we unplug and re-plug the cable, sometimes the LINK is established!
ASKER
So a couple of more items we checked...
Plug both ends of the twinax into the server port. That produced green connection lights. Therefore the NIC is most likely good.
Plug the twinax in between an unused port on the 2960 and the 3850. That didn't work. So the problem is most likely the 3850.
Finally, we took the whole ESX host, cables and all over to the other server room and plugged it into the same switches as our other ESX host. Success! All the lights came on as expected. No geo redundancy of course, but at least we have server redundancy again.
Those facts were enough to convince Cisco to RMA the switch.
ASKER
Thanks for the information Andy. For some reason, I don't seem to be able to get the point slider to work. Hope it works this time.
ASKER
Further information... Replacement switch didn't work either. Tried troubleshooting over several days. The third Cisco tech I spoke to finally noticed a problem with these commands:
Switch01#show redundancy
Redundant System Information :
------------------------------
Available system uptime = 4 weeks, 2 days, 13 hours, 17 minutes
Switchovers system experienced = 0
Standby failures = 1
Last switchover reason = none
Hardware Mode = Simplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = Non-redundant
Maintenance Mode = Disabled
Communications = Down Reason: Failure
Switch01#show switch
Switch/Stack Mac Address : ******* - Local Mac Address
Mac persistency wait time: Indefinite
                                         H/W      Current
Switch#  Role     Mac Address  Priority  Version  State
------------------------------------------------------------------
 1       Standby  *****        1         V02      HA sync in progress
*2       Active   *****        1         V02      Ready
So the standby switch (Switch 1 in the output above) had been stuck at "HA sync in progress" for over a day and never came fully online. The solution was to change which switch was active:
redundancy force-switchover
This caused the switches to reboot, and Switch 1 came up as the master. After a short delay everything started working normally, and all the connected ports showed connections.
So Andy, to your earlier comment, this may be why restarting the problem switch sometimes fixes the problem. Perhaps the master switch in the stack changes?
According to the Cisco tech, she had seen this happen before. So maybe it's documented out there somewhere already, but I never found it. I'm going to try and change this to the solution.
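In case it helps anyone hitting the same thing, here is a rough Python sketch of the check the Cisco tech effectively did by eye: read the "show redundancy" and "show switch" output and flag anything that looks unhealthy. It only parses text that has already been collected; how you get the output off the switch (SSH, console log, etc.) is up to you. The function names are made up for this example, and treating any state other than "Ready" as a problem is my assumption based on the output above, so a member that is still booting would also be flagged.

```python
import re

# Sample text copied from the output posted above (MAC addresses masked).
SHOW_REDUNDANCY = """\
Redundant System Information :
------------------------------
Available system uptime = 4 weeks, 2 days, 13 hours, 17 minutes
Switchovers system experienced = 0
Standby failures = 1
Last switchover reason = none
Hardware Mode = Simplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = Non-redundant
Maintenance Mode = Disabled
Communications = Down Reason: Failure
"""

SHOW_SWITCH = """\
Switch/Stack Mac Address : ******* - Local Mac Address
Mac persistency wait time: Indefinite
H/W Current
Switch# Role Mac Address Priority Version State
--------------------------
1 Standby ***** 1 V02 HA sync in progress
*2 Active ***** 1 V02 Ready
"""

# One stack member row: [*]number role mac priority hw-version state...
MEMBER_ROW = re.compile(r"^\s*(\*?)(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(.+?)\s*$")


def parse_redundancy(text):
    """Collect the 'key = value' lines of `show redundancy` into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    return info


def parse_stack_members(text):
    """Return one dict per stack member row found in `show switch` output."""
    members = []
    for line in text.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            members.append({
                "local": match.group(1) == "*",  # '*' marks the switch you are logged in to
                "number": int(match.group(2)),
                "role": match.group(3),
                "state": match.group(7),
            })
    return members


def stack_problems(redundancy_text, switch_text):
    """List human-readable warnings; an empty list means nothing was flagged."""
    problems = []
    red = parse_redundancy(redundancy_text)
    configured = red.get("Configured Redundancy Mode")
    operating = red.get("Operating Redundancy Mode")
    if configured and operating and configured.lower() != operating.lower():
        problems.append(f"redundancy mode: configured {configured}, operating {operating}")
    comms = red.get("Communications", "")
    if comms.lower().startswith("down"):
        problems.append(f"communications: {comms}")
    # Assumption: any member state other than "Ready" deserves a look.
    for member in parse_stack_members(switch_text):
        if member["state"] != "Ready":
            problems.append(f"switch {member['number']} ({member['role']}): {member['state']}")
    return problems


if __name__ == "__main__":
    for problem in stack_problems(SHOW_REDUNDANCY, SHOW_SWITCH):
        print("WARNING:", problem)
```

Run against the output above, it flags the sso vs Non-redundant mismatch, the Communications = Down line, and Switch 1 sitting at "HA sync in progress". Those were the clues that the port and cable swaps never surfaced.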
ASKER CERTIFIED SOLUTION
and it's something to do with the Network Interface bringing up the connection to the physical port.