No Connection Light on Switch

Brian B
Hi everyone. We are having a strange VMware problem here and are looking for ideas.

We have vSAN running on two hosts and a witness. For the vSAN portion, each node is connected to a Cisco 3850 via a pair of twinax cables. On one node there are no link lights on the switch ports where the twinax cables connect. This was working before. I have tried the cables elsewhere and they work. I have tried other ports on the 3850 and it doesn't work there either. Dell just replaced the network card and it didn't help. On the switch, "sh int status" shows the twinax is plugged in, but not connected. Same for the server.

VMware shows nothing unusual other than the connection is down.

Again, this was working. Problem is only one node. The other one is fine as is the witness.

Anything else?
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
We've got a switch here that does the same thing.

It's something to do with the network interface bringing up the connection to the physical port.

Are ALL the hardware components on the HCL, and is the firmware updated?

I would try a replacement switch, NICs and cables.
Brian B, EE Topic Advisor, Independent Technology Professional

Author

Commented:
"Are ALL the hardware components on the HCL, and firmware updated? I would try replacement switch, nics and cables"

Yes, all on the HCL. Firmware, yes. I actually had to update the firmware in order to get the switch virtual stacking working on these 3850s. NIC was just replaced, cables I confirmed work elsewhere. Replacement switch? We had enough trouble getting the two that we have. I guess I could swap the switches. Since they are virtually stacked and everything plugs into the same ports on both switches, it *should* be plug and play, but I'll have to schedule an outage.

Going to try a couple more cable/port combinations first though.

Any other ideas still welcome!
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
Not a solution, and I'm not sure why, but with ours, if we unplug the cable and plug it back in, the link is sometimes established!
Brian B, EE Topic Advisor, Independent Technology Professional

Author

Commented:
So, a couple more items we checked...

Plugged both ends of the twinax into the server ports. That produced green connection lights. Therefore the NIC is most likely good.
Plugged the twinax in between an unused port on the 2960 and the 3850. That didn't work. So the problem is most likely the 3850.

Finally, we took the whole ESX host, cables and all, over to the other server room and plugged it into the same switches as our other ESX host. Success! All the lights came on as expected. No geo redundancy of course, but at least we have server redundancy again.

Those facts were enough to convince Cisco to RMA the switch.
Brian B, EE Topic Advisor, Independent Technology Professional

Author

Commented:
Thanks for the information Andy. For some reason, I don't seem to be able to get the point slider to work. Hope it works this time.
Brian B, EE Topic Advisor, Independent Technology Professional

Author

Commented:
Further information... Replacement switch didn't work either. Tried troubleshooting over several days. The third Cisco tech I spoke to finally noticed a problem with these commands:

Switch01#show redundancy
Redundant System Information :
------------------------------
       Available system uptime = 4 weeks, 2 days, 13 hours, 17 minutes
Switchovers system experienced = 0
              Standby failures = 1
        Last switchover reason = none

                 Hardware Mode = Simplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = Non-redundant
              Maintenance Mode = Disabled
                Communications = Down      Reason: Failure

Switch01#show switch
Switch/Stack Mac Address : ******* - Local Mac Address
Mac persistency wait time: Indefinite
                                         H/W      Current
Switch#  Role     Mac Address  Priority  Version  State
----------------------------------------------------------------------
 1       Standby  *****        1         V02      HA sync in progress
*2       Active   *****        1         V02      Ready

... So the standby switch (Switch 1) had been stuck syncing for over a day and never came fully online. The solution was to change which switch was active:
redundancy force-switchover
... This caused the switches to reboot. Switch 1 came up as the master, and after a short delay everything started working normally. All the connected ports showed connections.

So Andy, to your earlier comment, this may be why restarting the problem switch sometimes fixes the problem. Perhaps the master switch in the stack changes?

According to the Cisco tech, she had seen this happen before. So maybe it's documented out there somewhere already, but I never found it. I'm going to try and change this to the solution.
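In case it helps anyone else spot this sooner, here is a rough Python sketch of the same check the Cisco tech did by eye. It only parses text pasted from "show switch" and "show redundancy" and does not connect to the switch. The function names and the sample strings are my own illustration, and the parsing assumes the output layout shown above, which may differ between IOS versions. Treating "configured mode differs from operating mode" as a warning sign is likewise an assumption based on this one incident.

```python
import re

# Sample text modelled on the outputs above (MAC addresses masked as in the post).
SHOW_SWITCH = """\
Switch#   Role    Mac Address     Priority Version  State
---------------------------------------------------------
1       Standby  *****     1      V02     HA sync in progress
*2       Active   *****     1      V02     Ready
"""

SHOW_REDUNDANCY = """\
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = Non-redundant
                Communications = Down      Reason: Failure
"""

# One stack member row: optional '*', switch number, role, MAC,
# priority, hardware version, then the state as the rest of the line.
MEMBER_ROW = re.compile(
    r"^\s*\*?\s*(?P<num>\d+)\s+(?P<role>\S+)\s+\S+\s+\d+\s+\S+\s+(?P<state>.+?)\s*$"
)


def parse_show_switch(text):
    """Return a list of dicts (number, role, state) for each member row."""
    members = []
    for line in text.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            members.append({
                "number": int(match.group("num")),
                "role": match.group("role"),
                "state": match.group("state"),
            })
    return members


def members_not_ready(text):
    """Members whose state is anything other than 'Ready'."""
    return [m for m in parse_show_switch(text) if m["state"] != "Ready"]


def parse_show_redundancy(text):
    """Collect the 'key = value' lines of 'show redundancy' into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def redundancy_degraded(text):
    """True when the operating mode does not match the configured mode."""
    fields = parse_show_redundancy(text)
    configured = fields.get("Configured Redundancy Mode", "").lower()
    operating = fields.get("Operating Redundancy Mode", "").lower()
    return configured != operating


if __name__ == "__main__":
    for member in members_not_ready(SHOW_SWITCH):
        print("Switch", member["number"], member["role"], "is in state:", member["state"])
    if redundancy_degraded(SHOW_REDUNDANCY):
        print("Operating redundancy mode does not match the configured mode")
```

A member that sits in a state other than "Ready" for a long time, together with a mode mismatch, is the pattern that pointed to the stuck standby here. The script only reports it; whether "redundancy force-switchover" is appropriate is still a call for whoever owns the stack, since it reboots the switches.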
EE Topic Advisor, Independent Technology Professional
Commented:
Sorry for the confusion. Further information shows the above was the solution.
