Sid_F

VMware NIC redundancy

My managed service company had an issue recently. A number of VMs were inaccessible. Traffic on the physical NICs seemed to have stopped getting through to the physical switch. The management network kept working, so HA was not initiated. A reboot brought the NICs back; it may be a network card issue, but they are still investigating. There are two NICs assigned to the vSwitch at present.
What is the best setup for redundancy in this situation? Should HA be migrating the machines to a working ESX host once they lose contact with the production network? Should a VMkernel port be added to make the vSwitch part of the management network? (I may be wrong here.) Thanks
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

VMware HA monitors hosts over the Management Network, and by default the isolation address is the Management Network's default gateway (HA before v5 also relies on DNS to resolve host names). If a host loses its HA heartbeats and also cannot reach the isolation address, it is treated as isolated or failed, and VMware HA is initiated.

As you stated, the Management Network was still functional, so everything looked fine with the host, and service continued.

In v5 this is slightly different, because the heartbeat is maintained between the Master and Slave hosts across the Management Network, and HA is less dependent upon the gateway and DNS.

VMware HA will only commence if the host fails; in this case your hosts did not fail.
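
To make that concrete, here is a rough Python sketch of the decision as I have described it. It is a toy model, not VMware's actual algorithm, and every name in it (`HostState`, `ha_would_respond`, `esx01`) is made up for illustration. The point is simply that the state of the VM network never enters the decision.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical snapshot of one ESX host's network reachability."""
    name: str
    management_heartbeat_ok: bool  # HA heartbeats seen on the Management Network
    isolation_address_ok: bool     # isolation address (e.g. default gateway) answers
    vm_network_ok: bool            # VM port group uplinks are passing traffic


def ha_would_respond(host: HostState) -> bool:
    """Toy rule: HA reacts only when the Management Network checks fail.

    vm_network_ok is deliberately ignored, mirroring the behaviour
    described above.
    """
    return not host.management_heartbeat_ok and not host.isolation_address_ok


# The incident from the question: management fine, VM network dead.
incident = HostState("esx01", True, True, False)
print(ha_would_respond(incident))  # False: HA sees a healthy host
```

So from HA's point of view nothing was wrong with the host, even though the VMs on it could not be reached.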

I assume that the Virtual Machine network is connected to a different vSwitch, and the NICs connecting this vSwitch somehow failed?

I would check the physical network, and the teaming and load balancing configuration around this vSwitch and physical switch arrangement.
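
One teaming setting worth looking at is network failover detection. With "Link status only", an uplink counts as healthy as long as the link is up, so a NIC that still has link but passes no traffic would never be failed over; beacon probing is the vSwitch option intended to catch that kind of failure. Whether this is what happened here is only a guess. The Python sketch below is a simplified model of the difference, not how the vSwitch implements it, and the names (`Uplink`, `vmnic2`, `vmnic3`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """Hypothetical view of one physical NIC in a vSwitch team."""
    name: str
    link_up: bool         # physical link signal is present
    passes_traffic: bool  # frames actually reach the physical switch


def usable_by_link_status(team):
    """Uplinks considered healthy when only the link state is checked."""
    return [u.name for u in team if u.link_up]


def usable_by_traffic_probe(team):
    """Uplinks considered healthy when traffic flow is also verified."""
    return [u.name for u in team if u.link_up and u.passes_traffic]


# Assumed scenario: both NICs keep link but stop forwarding traffic.
team = [Uplink("vmnic2", True, False), Uplink("vmnic3", True, False)]
print(usable_by_link_status(team))    # ['vmnic2', 'vmnic3']
print(usable_by_traffic_probe(team))  # []
```

Note that if both NICs in the team fail in the same way, no detection method leaves a working uplink, which is why the physical side (separate switches, separate cards) matters as much as the teaming policy.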
Sid_F

ASKER

Thanks. Yes, it seems the vSwitch had a number of VMs connected, and they were linked to two physical network cards which, for whatever reason, stopped allowing traffic through (the info is third hand at present, but it doesn't add up to me!).
Let's suppose it did have a driver issue. Am I wrong in assuming there should be something from a VMware point of view to see a failure on the production LAN and vMotion the machines elsewhere (as the management LAN was still working), or something similar?
Secondly, I presume I should be able to get detailed logs on this failure from the vCenter client?
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)