Intermittent network connection

Posted on 2009-05-16
Last Modified: 2012-06-27
I have the following network setup:

4 hosts (Debian, Ubuntu, 2x FreeBSD) on a Gbit Ethernet switch, uplinked to ->
a 24-port Fast Ethernet switch with a variety of heterogeneous hosts on the local LAN.

All of the hosts are in the same class C network and use the same gateway and DNS server.

I recently added an ESXi server with 2 Gbit Ethernet cards to the network. One card is configured as a dedicated management interface and is connected to the Fast Ethernet switch with an IP in the local network. The other card, used by the VM networks, is connected to the Gbit Ethernet switch. (I have also tried it the other way around.)

The problem: the management network is intermittently unreachable from all but ONE machine, a FreeBSD machine on the same switch. Really weird. I can log in to any other host on the same switch and sometimes get a good connection, sometimes a "no route to host" when I try to ping the management interface. But the connection from the one FreeBSD box on the same switch is rock stable.

I have, of course, tried replacing cables and every possible switch <-> host combination.

Any ideas on what is going on or how to troubleshoot this?

Question by:alpha-lemming
LVL 10

Accepted Solution

lanboyo earned 250 total points
ID: 24402896
Is the ESXi server's gigabit interface trunked? Make sure the management VLAN is removed from the gigabit link on the switch side.

Is the loss of the management VLAN limited to the ESXi interface, or is it the loss of a whole VLAN used for management of things that include the ESXi host?

Off the cuff, I notice that the default ARP cache timeout for FreeBSD is 20 minutes, while it has a maximum of 10 minutes in Windows. One possibility is that something prevents the ESXi box from responding to ARP who-has requests, or prevents the rest of the network from hearing the responses. Or, for some reason, the ESXi host has decided its management IP is better on the other interface, so the IP needs to move to a different MAC address, and the more robust code on the BSD device is somehow able to notice this and adapt. Perhaps it handles gratuitous ARPs better.


When the problem is not occurring, go to a Windows box and run

arp -a

Find and note the MAC address that corresponds to the IP address of the ESXi host. For instance, this is the entry for an HP printer on my home network:           00-17-08-87-44-84     dynamic

Its MAC address is 00-17-08-87-44-84. Check on the BSD device with the command "arp -an"; the n skips reverse DNS lookups, which usually speeds things up. The response looks a little different:

? ( at 00:17:08:87:44:84 [ether] on eth0

Although the separators differ (: versus -), it is the same MAC.
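
To take the guesswork out of comparing the two notations, the addresses can be normalized before comparing. This is only a minimal sketch in Python; the function name normalize_mac is made up for illustration, and it assumes the address is written as six hex groups separated by : or - (some arp implementations drop leading zeros, which it pads back).

```python
import re

def normalize_mac(mac: str) -> str:
    """Return a MAC address as six lowercase, zero-padded hex pairs joined by ':'."""
    parts = re.split(r"[:-]", mac.strip())
    if len(parts) != 6 or not all(re.fullmatch(r"[0-9A-Fa-f]{1,2}", p) for p in parts):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Pad groups such as "8" to "08" so both notations compare equal.
    return ":".join(p.lower().zfill(2) for p in parts)

windows_style = "00-17-08-87-44-84"   # as shown by "arp -a" on Windows
unix_style = "00:17:08:87:44:84"      # as shown by "arp -an"
print(normalize_mac(windows_style) == normalize_mac(unix_style))  # prints True
```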

1st, make sure the MACs are the same. 2nd, do the same thing while the problem is occurring, from a workstation that has the problem and from the working BSD box.

Do they both have ARP entries? Do they match? Do they match the previous address?
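
The three questions above can be written down as a small decision helper. This is a rough sketch, not a diagnostic tool: the function name classify_arp and the four result labels are invented here, and the MAC values are expected to be copied by hand from the arp output (None when a host has no entry).

```python
from typing import Optional

def _norm(mac: Optional[str]) -> Optional[str]:
    # Treat "00-17-..." and "00:17:..." as the same address; None means "no entry".
    return mac.lower().replace("-", ":") if mac else None

def classify_arp(baseline: Optional[str],
                 failing_host: Optional[str],
                 working_host: Optional[str]) -> str:
    """Compare the MAC noted earlier with what two hosts see during the outage."""
    baseline, failing_host, working_host = map(_norm, (baseline, failing_host, working_host))
    if failing_host is None:
        return "missing"      # failing host never resolved the IP: ARP requests or replies are lost
    if working_host is not None and failing_host != working_host:
        return "disagree"     # two hosts see different MACs for the same IP
    if baseline is not None and failing_host != baseline:
        return "changed"      # the IP moved to another MAC since the baseline
    return "consistent"       # ARP looks fine; look elsewhere

print(classify_arp("00-17-08-87-44-84", None, "00:17:08:87:44:84"))  # prints missing
```

A "missing" result points at lost ARP traffic, while "disagree" or "changed" would suggest that the IP is being answered from more than one MAC address.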

Also, the "no route to host" error usually means that the local router is unable to get an ARP response and sends that error back. Where is the local router? Is the BSD box or any of the other boxes dual-homed? Is the management VLAN in the same class C? How do the boxes try to connect? So many questions.

LVL 10

Expert Comment

ID: 24423798
Any updates?

Author Comment

ID: 24436595
Sorry for the delay, had to go out of town.

No, the hosts that cannot connect do not have or get a MAC address for the management NIC.

arp <host> prints the IP address, then "no entry" for the MAC.

The management interface is not in a VLAN, and failover/load balancing with the other NIC is turned off.

It's just weird that this one BSD box has a rock-solid connection while all the others are flaky.


Assisted Solution

ENCOSE earned 250 total points
ID: 24490237
Sounds like a speed/duplex mismatch.
Try checking EVERY device port and switch port to make sure they are all on Auto/Auto.

A common misconception is that one side can be set manually while the other is left on Auto/Auto, which does not work properly.

Josh Kwok, MCSE, CCNP
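
The forced-versus-auto point can be illustrated with a simplified model of 10/100 Ethernet behaviour: a port left on auto that gets no negotiation from a forced peer can still detect the speed, but it cannot learn the duplex and falls back to half. The sketch below assumes exactly that rule and nothing more; the function names and the "auto"/"full"/"half" labels are made up for illustration.

```python
def effective_duplex(local: str, peer: str) -> str:
    """Duplex a port ends up using. Each setting is "auto", "full" or "half"."""
    if local != "auto":
        return local      # a forced port uses whatever it was told
    if peer == "auto":
        return "full"     # both negotiate; assume both ends support full duplex
    return "half"         # peer is forced and silent: the auto side falls back to half

def has_duplex_mismatch(port_a: str, port_b: str) -> bool:
    """True when the two ends of a link end up with different duplex settings."""
    return effective_duplex(port_a, port_b) != effective_duplex(port_b, port_a)

for pair in [("auto", "auto"), ("auto", "full"), ("auto", "half"), ("full", "full")]:
    print(pair, "mismatch" if has_duplex_mismatch(*pair) else "ok")
```

Under this model the troublesome combination is auto on one side with forced full duplex on the other, which is the usual way a duplex mismatch arises.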

Author Comment

ID: 24533771
All the NICs are in autonegotiate mode.
I found the culprit, although I don't know the exact cause yet.
Shutting down one of the other hosts, which is running VMware Server, makes everything work right. Maybe I had duplicate MACs or something...
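
One way to test the duplicate-MAC guess would be to collect the IP/MAC pairs from the ARP tables of a few hosts and look for a MAC that shows up under more than one IP. The following is a minimal sketch of that check; the function name find_shared_macs is hypothetical and the sample addresses are placeholders, not values from this network. A shared MAC is only a hint, since a single host with several IPs would look the same.

```python
from collections import defaultdict

def find_shared_macs(entries):
    """Map each MAC that appears with more than one IP to the sorted list of those IPs."""
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        # Normalize separators and case so Windows and Unix notation compare equal.
        ips_by_mac[mac.lower().replace("-", ":")].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}

# Placeholder data for illustration only.
sample = [
    ("192.0.2.10", "00-17-08-87-44-84"),
    ("192.0.2.11", "00:17:08:87:44:84"),
    ("192.0.2.12", "00:17:08:87:44:99"),
]
print(find_shared_macs(sample))
```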


Question has a verified solution.
