Network Disconnection with all VMs

This is what happens:
I have 2 VMware ESXi 5.5 hosts, with 2 VMs on the 1st host and 2 VMs on the 2nd. I lose the connection to all VMs at random, for about 10 seconds, roughly every 2 days.
The same physical switch connects several other devices, and those show no packet loss.
I changed the NIC on one VM only from E1000E to VMXNET3, but that didn't help.
Any suggestions would be appreciated.
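With short, intermittent drops like these, it can help to record exactly when each outage starts and ends so it can be matched against host and switch logs. The following is a minimal sketch, not part of the original thread: it assumes you already have a list of timestamped probe results (for example, one ping per second), and the names `find_outages` and `min_failures` as well as the sample data are made up for illustration.

```python
from datetime import datetime, timedelta


def find_outages(samples, min_failures=2):
    """Group consecutive failed probes into (first_fail, last_fail, count) windows.

    samples: iterable of (timestamp, ok) pairs in chronological order.
    min_failures: shortest run of failures that counts as an outage
                  (hypothetical threshold, chosen to ignore single lost probes).
    """
    outages = []
    run = []  # timestamps of the current run of failed probes
    for ts, ok in samples:
        if not ok:
            run.append(ts)
            continue
        if len(run) >= min_failures:
            outages.append((run[0], run[-1], len(run)))
        run = []
    # Close a run that is still open at the end of the data.
    if len(run) >= min_failures:
        outages.append((run[0], run[-1], len(run)))
    return outages


# Synthetic data: one probe per second for a minute, failing from second 20 to 29.
start = datetime(2000, 1, 1, 12, 0, 0)
samples = [(start + timedelta(seconds=i), not (20 <= i < 30)) for i in range(60)]

for first_fail, last_fail, count in find_outages(samples):
    print(f"outage from {first_fail.time()} to {last_fail.time()} ({count} probes lost)")
```

Running the same probe against every VM, the hosts, and another device on the switch would show whether all of them drop at the same moment.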
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
What does the networking look like on your ESXi hosts?
strivoli (Author) commented:
I'm not 100% sure I understood your question. I've attached 2 screenshots that could help. Both belong to one of the 2 hosts (the 2nd is nearly identical). If the attachments aren't enough, please ask for more. Thanks.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Is this a new installation, or has it only just started causing issues?

If you use just a single network interface on your hosts, does the issue resolve itself?

What is the physical network configuration on the switches?

What are the servers?

Have you checked that the firmware on your network interfaces is up to date, and that you are using the latest patches for ESXi?
strivoli (Author) commented:
I'll reply one by one:

1. It's a new installation, set up by a DELL Partner early this year. The issue has been there since the beginning.
2. I haven't tried that yet. The NIC ports of the two physical hosts are connected to 1 physical switch with 2 VLANs. That change would require some work and carry some risk. If you think it's worth it, I could start thinking about it.
3. There's only 1 physical switch upstream of the hosts. Ports use 1 VLAN. All other ports use the default VLAN, which allows communication between the VMs and the rest of the network.
4. DELL PowerEdge R730 with 64GB RAM.
5. I must split this answer in two:
5a. No, I haven't checked the NIC firmware. Note that there's NO packet loss on the host itself.
5b. No, but I know I should consider it.

I'll be working on 5b. Please ask for more or post any further suggestions. Thanks.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
So this does not sound like it's ever been set up correctly!

Can you move ALL VMs to a single host so you can redo the host configuration?

We've just taken delivery of new R730s. The first thing we do, before we install any hypervisor, is make sure the firmware is up to date.

We then install the Dell (OEM) version of ESXi to ensure the drivers match the host OS, patch ESXi to the latest version, and then test in production.

Often the issue is a network driver or firmware problem, or an incorrect VLAN or physical switch configuration.

So this is why I suggest going back to basics, even to a single NIC, to rule out teaming/port trunking/bonding and VLANs.
strivoli (Author) commented:
This is exactly what I thought too, and this is why I didn't ask the DELL Partner to fix it. I've noticed and reported several "worst practices" applied by the "professional". I asked them to update before installing, but they didn't.

I've taken one of the hosts. It's on VMware ESXi 5.5.0, build 1746018 (ESXi 5.5 Update 1a), and I'm bringing it to build 2068190 (ESXi 5.5 Update 2) to see what happens.
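When several hosts are being brought to the same level, the build number is the quickest thing to compare. Below is a minimal sketch, not from the original thread, that pulls the build number out of a version line and looks it up among the two builds named above. The `build-<number>` string format is an assumption about what the host reports, and `parse_build` and `KNOWN_BUILDS` are hypothetical names.

```python
import re

# Only the builds named in this thread; extend as needed.
KNOWN_BUILDS = {
    1746018: "ESXi 5.5 Update 1a",
    2068190: "ESXi 5.5 Update 2",
}


def parse_build(version_line):
    """Extract the integer build number from a line such as
    'VMware ESXi 5.5.0 build-1746018' (format assumed, not confirmed by the thread)."""
    match = re.search(r"build-(\d+)", version_line)
    if match is None:
        raise ValueError(f"no build number found in: {version_line!r}")
    return int(match.group(1))


line = "VMware ESXi 5.5.0 build-1746018"  # sample input for illustration
build = parse_build(line)
label = KNOWN_BUILDS.get(build, "unknown build")
needs_update = build < 2068190  # older than Update 2

print(f"build {build}: {label}, needs update to U2: {needs_update}")
```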
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Are you using the OEM ESXi 5.5 version from DELL?

1. Apply firmware updates.

2. Install OEM DELL ESXi 5.5 GA.

3. Then update.

4. Single NIC on the vSwitch, single NIC config on the physical switch...

5. TEST TEST TEST and TEST again...

and then move on...
strivoli (Author) commented:
I suspect they didn't use the OEM DELL ESXi. I downloaded U2 from VMware's site, and the upgrade to U2 was successful. Answers to your points:
1. Will do if U2 alone doesn't help.
2. I don't know if I'm eligible. The upgrade on one of the hosts was done with the ISO from VMware. I need to investigate this point further.
3. Already done a few minutes ago. 2 VMs are already running on the upgraded host.
4. OK. I'll think about it if the upgrade doesn't help.
5. The tests will run for about 2 days and I'll report back.

Thank you, Andrew.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
1. Okay, you must do this!

2. It's a download from DELL: enter the server's tag number and follow the downloads.

3. You'll need to update again when using the Dell GA version.

4. I would start with a basic config.

5. Test!!!
strivoli (Author) commented:
Hi Andrew, the 1st host, updated to U2 (2068190) from VMware, seems to work fine. I've pinged it 200K times with no packets lost. Past tests always lost 6-8 packets over the same number of pings. I've now started a 400K ping test to see what happens.
I've checked the DELL site and they supply U2 (2068190), but VMware is up to build 2718055.
Do you still think it's better to update the 2nd host using the DELL image (2068190) instead of updating to the latest available build (2718055) from VMware?
Thank you for your help. Regards.

P.S.: Based on the tests I'm running and on your opinion, I'll consider updating the 1st host once again.
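To put the ping figures in perspective, here is a small worked calculation, added as a sketch rather than taken from the thread. The packet counts come from the post; the one-second ping interval is an assumption (the post does not state it), and `loss_percent` and `run_duration_days` are hypothetical helper names.

```python
def loss_percent(lost, sent):
    """Packet loss as a percentage of probes sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent


def run_duration_days(sent, interval_s=1.0):
    """Approximate run length in days, assuming one probe every interval_s seconds."""
    return sent * interval_s / 86400.0


SENT = 200_000  # ping count from the post

for lost in (6, 8):  # lost-packet range reported before the upgrade
    print(f"{lost} lost of {SENT}: {loss_percent(lost, SENT):.4f}% loss")

# Assumed interval of 1 second per ping.
print(f"run length at 1 ping/s: {run_duration_days(SENT):.2f} days")
```

Under that assumption, a 200K-ping run lasts a little over 2 days, so 6-8 lost packets would be consistent with a single outage of roughly 10 seconds falling inside the test window. A loss percentage this small is easy to overlook, which is why counting consecutive losses is more telling than the average.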
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Use the Dell OEM original ESXi U2 build 2068190, and then apply the update from VMware.

You should always use an OEM version of ESXi if available.

strivoli (Author) commented:
Andrew, updating from U1a (1746018) to U2 (2068190) fixed the network connectivity issue. Thank you for the help.