Brian B

No Connection Light on Switch

Hi everyone. We are having a strange VMware problem here and are looking for ideas.

vSAN running two hosts and a witness. For the vSAN portion, each node is connected to a Cisco 3850 via a pair of twinax cables. On one node there are no lights on the switch where the twinax connect. This was working before. I have tried the cables elsewhere and they work. I have tried other ports on the 3850 and it doesn't work there either. Dell just replaced the network card and it didn't help. On the switch, "sh int status" shows the twinax is plugged in, but not connected. Same for the server.

VMware shows nothing unusual other than the connection is down.

Again, this was working. Problem is only one node. The other one is fine as is the witness.

Anything else?
* vSAN, Cisco, Networking, VMware


8/22/2022 - Mon
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

We've got a switch here that does the same thing, and it's something to do with the network interface bringing up the connection to the physical port.

Are ALL the hardware components on the HCL, and is the firmware updated?

I would try a replacement switch, NICs and cables.
Brian B

"Are ALL the hardware components on the HCL, and firmware updated? I would try replacement switch, nics and cables"

Yes, all on the HCL. Firmware, yes. I actually had to update the firmware in order to get the switch virtual stacking working on these 3850s. NIC was just replaced, cables I confirmed work elsewhere. Replacement switch? We had enough trouble getting the two that we have. I guess I could swap the switches. Since they are virtually stacked and everything plugs into the same ports on both switches, it *should* be plug and play, but I'll have to schedule an outage.

Going to try a couple more cable/port combinations first though.

Any other ideas still welcome!
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Not a solution, and I'm not sure why, but with ours, if we unplug the cable and plug it back in, sometimes the LINK is established!
Brian B

So, a couple more items we checked...

Plugged both ends of the twinax into the server ports. That produced green connection lights, so the NIC is most likely good.
Plugged the twinax in between an unused port on the 2960 and the 3850. That didn't work, so the problem is most likely the 3850.

Finally, we took the whole ESX host, cables and all, over to the other server room and plugged it into the same switches as our other ESX host. Success! All the lights came on as expected. No geo redundancy of course, but at least we have server redundancy again.

Those facts were enough to convince Cisco to RMA the switch.
Brian B

Thanks for the information Andy. For some reason, I don't seem to be able to get the point slider to work. Hope it works this time.
Brian B

Further information... Replacement switch didn't work either. Tried troubleshooting over several days. The third Cisco tech I spoke to finally noticed a problem with these commands:

Switch01#show redundancy
Redundant System Information :
       Available system uptime = 4 weeks, 2 days, 13 hours, 17 minutes
Switchovers system experienced = 0
              Standby failures = 1
        Last switchover reason = none

                 Hardware Mode = Simplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = Non-redundant
              Maintenance Mode = Disabled
                Communications = Down      Reason: Failure

Switch01#show switch
Switch/Stack Mac Address : ******* - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
1       Standby  *****     1      V02     HA sync in progress
*2       Active   *****     1      V02     Ready

... So the second switch had been stuck syncing for over a day and never came fully online. The solution was to change which switch was active:
redundancy force-switchover
...This caused the switches to reboot and Switch#1 came up as the master and after a short delay everything started working normally. All the connected ports showed connections.
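
For anyone who wants to catch this state without eyeballing the output, here is a minimal Python sketch that parses "show switch" text and lists any stack member whose state is not "Ready". The function names and the regular expression are my own assumptions for illustration, not a Cisco tool, and the sample is just the table pasted above with the MAC addresses still masked.

```python
import re

# Sample copied from the "show switch" table in this post (MACs were masked there).
SHOW_SWITCH = """\
Switch#   Role    Mac Address     Priority Version  State
1       Standby  *****     1      V02     HA sync in progress
*2       Active   *****     1      V02     Ready
"""

# Assumed row layout: optional "*", switch number, role, MAC, priority,
# hardware version, then the state (which may contain spaces).
ROW = re.compile(r"^\s*(\*?)(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(.+?)\s*$")


def parse_show_switch(text):
    """Return one dict per stack member found in the pasted output."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and banner lines do not start with a number, so they are skipped
            members.append({
                "local": match.group(1) == "*",   # "*" marks the switch you are logged in to
                "number": int(match.group(2)),
                "role": match.group(3),
                "state": match.group(7),
            })
    return members


def members_not_ready(text):
    """Return the members whose state is anything other than 'Ready'."""
    return [m for m in parse_show_switch(text) if m["state"] != "Ready"]


if __name__ == "__main__":
    for member in members_not_ready(SHOW_SWITCH):
        print(f"Switch {member['number']} ({member['role']}): {member['state']}")
```

Run against the output above, this reports switch 1 as still in "HA sync in progress". How you collect the text (SSH, console capture, copy and paste) is left out on purpose.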

So Andy, to your earlier comment, this may be why restarting the problem switch sometimes fixes the problem. Perhaps the master switch in the stack changes?

According to the Cisco tech, she had seen this happen before. So maybe it's documented out there somewhere already, but I never found it. I'm going to try and change this to the solution.
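
The "show redundancy" output earlier in this post had the same warning signs, so here is a similar hedged Python sketch for it. It splits each "key = value" line and flags the two things the tech pointed at: the operating mode not matching the configured mode, and communications being down. The function names and the choice of checks are assumptions on my part, based only on the output pasted in this thread.

```python
# Lines copied from the "show redundancy" output in this post.
SHOW_REDUNDANCY = """\
                 Hardware Mode = Simplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = Non-redundant
              Maintenance Mode = Disabled
                Communications = Down      Reason: Failure
"""


def parse_key_values(text):
    """Turn 'key = value' lines into a dict; lines without '=' are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def redundancy_warnings(text):
    """Return human-readable warnings for the two symptoms seen in this thread."""
    fields = parse_key_values(text)
    warnings = []

    configured = fields.get("Configured Redundancy Mode", "")
    operating = fields.get("Operating Redundancy Mode", "")
    if configured.lower() != operating.lower():
        warnings.append(f"configured mode is '{configured}' but operating mode is '{operating}'")

    communications = fields.get("Communications", "")
    if communications.startswith("Down"):
        warnings.append(f"communications: {communications}")

    return warnings


if __name__ == "__main__":
    for warning in redundancy_warnings(SHOW_REDUNDANCY):
        print("WARNING:", warning)
```

This only checks the symptoms seen here; a stack could be unhealthy in other ways that it would not notice.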