ESXi 4.0 U1 - Management network becomes unstable after a few days

I’ve installed ESXi 4.0 Update 1 on two identical machines that reside in the same network segment. On each server, I’ve created two virtual machines: one runs Red Hat Enterprise Linux 5.4 and the other runs a small load balancer appliance (Hercules).

Hardware:
Dell PowerEdge R210
Intel Xeon X3450 2.66GHz HT
8GB RAM
2x500 GB in RAID 1
Using ONE port of the internal Broadcom NetXtreme II BCM5716 NIC (this port is shared between the management network and the VMs).
(all hardware is marked as ‘supported’ by VMware)

We applied all available patches, including the recent April 1st patch; we’re at build 244038 now.

The Problem
After a few days, the vSphere client can no longer establish a connection to the ESXi hosts. The virtual machines keep running without any problem, however. Only a full reset (applied through the remote power cycle) restores connectivity to the management network. We experience this issue on both servers: about three days after power-on/reset, the vSphere client cannot connect anymore.

Observations:
• Only the management network suffers from connectivity problems.
• Restarting the management network (agents) via the physical console doesn’t restore service.
• The physical console offers some basic diagnostics like ‘testing the management network’. The PING tests fail intermittently: about half of the PINGs to the gateway or DNS servers fail. The hardware and the network config MUST be correct, since the management network works for a few days before failing and the VMs keep running without any problem.
• Using a packet sniffer, we’ve investigated the network traffic from a remote vSphere client that is trying to connect to the ESXi server. The remote ESXi host resets the connection after initial contact, so there IS packet interchange.
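
To pin down when the packet loss starts (rather than noticing it only once the vSphere client fails), ping loss against the management IP could be logged from another machine on the same segment. Below is a minimal sketch in Python, assuming a Unix-style `ping` that accepts `-c` and prints the usual "packets transmitted" summary line. The function names and the address 192.0.2.10 are placeholders for illustration, not part of our setup.

```python
import re
import subprocess
import time

# Matches the summary line printed by ping on Linux and macOS, e.g.
# "10 packets transmitted, 5 received, 50% packet loss, time 9012ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_loss(ping_output):
    """Return the fraction of lost packets (0.0 to 1.0), or None if no summary is found."""
    match = SUMMARY.search(ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None
    return (sent - received) / sent


def probe(host, count=10):
    """Run the system ping once and return the loss fraction (assumes ping supports -c)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


def monitor(host, interval_s=60):
    """Print one timestamped loss sample per interval until interrupted."""
    while True:
        loss = probe(host)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {host} loss={loss}", flush=True)
        time.sleep(interval_s)
```

Calling `monitor("192.0.2.10")` with the real management address and redirecting the output to a file would give a timeline showing whether the loss creeps up gradually or appears suddenly around the three-day mark.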

Given the above, I strongly suspect a problem in the network driver in ESXi, but I don’t know how to diagnose the issue any further. I’ve exhausted all options on the physical ESXi console. I know how to access the (unsupported) command-line console, but don’t know what to look for. Could the problem be that the management network shares the same NIC as the VMs?

I’ve been struggling with this issue for several weeks now. Any help or suggestions are highly appreciated.
