Solved

Intermittent network connection

Posted on 2009-05-16
Last Modified: 2012-06-27
I have the following network setup:

4 hosts (Debian, Ubuntu, 2x FreeBSD) on a Gbit Ethernet switch, uplinked to ->
a 24-port Fast Ethernet switch with a variety of heterogeneous hosts in a local LAN.

All of the hosts are in the same class C network and use the same gateway and dns server.

I recently added an ESXi server with 2 Gbit Ethernet cards to the network. One card is configured as a dedicated management interface and is connected to the Fast Ethernet switch with an IP in the local network. The other card, used by the VM networks, is connected to the Gbit Ethernet switch. (I have also tried it the other way around.)

The problem: the management network is intermittently unreachable from all but ONE machine, a FreeBSD machine on the same switch. Really weird. I can log in to any other host on the same switch and sometimes get a good connection, sometimes a "no route to host" when I try to ping the management interface. But the connection from the one FreeBSD box on the same switch is rock stable.

I have, of course, tried replacing cables and every possible switch <-> host combination.

Any ideas on what is going on or how to troubleshoot this?

Thanks!
Question by:alpha-lemming
5 Comments
 
LVL 10

Accepted Solution

by:
lanboyo earned 250 total points
ID: 24402896
Is the ESXi server's gigabit interface trunked? Make sure the management VLAN is removed from the gigabit link on the switch side.

Is it only the ESXi interface that becomes unreachable, or the whole VLAN used for managing things, including the ESXi?


Off the cuff, I notice that the default ARP cache timeout for FreeBSD is 20 minutes, while it has a maximum of 10 minutes in Windows. One possibility is that something prevents the ESXi box from responding to ARP who-has requests, or prevents the rest of the network from hearing the responses. Or, for some reason, the ESXi has decided its management interface is better placed on the other NIC, so the IP needs to move to a different MAC address, and the more robust code on the BSD device is somehow able to notice this and adapt. Perhaps it handles gratuitous ARPs better.

Anyway...

When the problem is not occurring, go to a Windows box and run

arp -a

Find and note the MAC address that corresponds to the IP address of the ESXi. For instance, this is the entry for an HP printer on my home network:

  192.168.1.7           00-17-08-87-44-84     dynamic

Its MAC address is 00-17-08-87-44-84. Check on the BSD device with the command "arp -an"; the n skips reverse DNS lookups, which usually speeds things up. The output looks a little different:

? (192.168.1.7) at 00:17:08:87:44:84 [ether] on eth0

Although the separators differ (- versus :), it is the same MAC.
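The comparison above can be automated. This is only a minimal sketch in Python, assuming the arp output has been saved or pasted as text; the names normalize_mac and parse_arp_line are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re

# Six hex pairs separated by ':' or '-' (both styles appear in arp output).
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5})\b")
# A dotted-quad IPv4 address (no range checking, this is only a sketch).
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def normalize_mac(mac):
    """Return the MAC in lowercase, colon-separated form."""
    return mac.replace("-", ":").lower()


def parse_arp_line(line):
    """Return (ip, mac) from one line of arp output, or None if either is missing."""
    ip = IP_RE.search(line)
    mac = MAC_RE.search(line)
    if not ip or not mac:
        return None
    return ip.group(1), normalize_mac(mac.group(1))


windows_line = "  192.168.1.7           00-17-08-87-44-84     dynamic"
unix_line = "? (192.168.1.7) at 00:17:08:87:44:84 [ether] on eth0"

# Both lines describe the same entry once the separators are normalized.
print(parse_arp_line(windows_line) == parse_arp_line(unix_line))
```

A line without a resolved MAC yields None, which makes missing entries easy to spot.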

1st, make sure the MACs are the same. 2nd, do the same thing when the problem is occurring, from both a workstation that has the problem and the working BSD box.

Do they both have ARP entries? Do they match each other? Do they match the previously noted address?
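Those three questions can be written down as a small check. The following is a rough sketch, assuming each host's ARP table has already been collected into a dict of IP to MAC (lowercase, colon-separated); compare_arp, the host labels, and the addresses are all hypothetical.

```python
def compare_arp(ip, baseline_mac, tables):
    """Classify each host's ARP entry for `ip` against the known-good MAC.

    `tables` maps a host label to that host's {ip: mac} dict.
    """
    report = {}
    for host, table in tables.items():
        mac = table.get(ip)
        if mac is None:
            report[host] = "no entry"
        elif mac == baseline_mac:
            report[host] = "matches baseline"
        else:
            report[host] = "different MAC: " + mac
    return report


# Hypothetical data: the baseline was noted while everything worked.
baseline = "00:17:08:87:44:84"
during_problem = {
    "bsd-box": {"192.168.1.7": "00:17:08:87:44:84"},
    "workstation": {},  # no ARP entry at all
    "laptop": {"192.168.1.7": "00:0c:29:aa:bb:cc"},
}

for host, verdict in compare_arp("192.168.1.7", baseline, during_problem).items():
    print(host, "->", verdict)
```

Roughly speaking, "no entry" points toward ARP replies not arriving, while a different MAC points toward some other device answering for that IP.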

Also, the "no route to host" error usually means that the local router is unable to get an ARP response and sends that error back. Where is the local router? Is the BSD box or any of the other boxes dual-homed? Is the management VLAN in the same class C? How do the boxes try to connect? So many questions.








 
LVL 10

Expert Comment

by:lanboyo
ID: 24423798
Any updates?
 

Author Comment

by:alpha-lemming
ID: 24436595
Sorry for the delay, had to go out of town..

No, the hosts that cannot connect do not have or get a MAC address for the management NIC.

arp <host> spits out the IP address, then "no entry" for the MAC...

The management interface is not in a VLAN, and failover/load balancing with the other NIC is turned off.

It's just weird that this one BSD box has a rock solid connection while all the others are flaky..

 
LVL 2

Assisted Solution

by:ENCOSE
ENCOSE earned 250 total points
ID: 24490237
Sounds like a speed/duplex mismatch...
Try checking EVERY device port and switch port to make sure they are all on Auto/Auto.

A common misconception is that one side can be manually set with the other on Auto/Auto. That does not work properly: the auto side cannot learn the duplex setting of a forced port and typically falls back to half duplex.
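To illustrate why mixing forced and auto settings goes wrong, here is a deliberately simplified model in Python. It assumes 10/100 Ethernet ports that both support 100 Mbit/s full duplex, and it assumes an autonegotiating port facing a forced port detects the speed but defaults to half duplex; resolve_link and has_duplex_mismatch are illustrative names, not real tools.

```python
# Assumption: both ports are capable of 100 Mbit/s full duplex.
BEST_COMMON = (100, "full")


def resolve_link(side_a, side_b):
    """Each side is "auto" or a forced (speed_mbps, duplex) tuple.

    Returns the (speed, duplex) each side ends up using.
    """
    if side_a == "auto" and side_b == "auto":
        return BEST_COMMON, BEST_COMMON
    if side_a == "auto":
        # The auto side sees the speed but has to guess half duplex.
        return (side_b[0], "half"), side_b
    if side_b == "auto":
        return side_a, (side_a[0], "half")
    # Both forced: each side simply uses what it was told.
    return side_a, side_b


def has_duplex_mismatch(side_a, side_b):
    a, b = resolve_link(side_a, side_b)
    return a[1] != b[1]


print(has_duplex_mismatch("auto", "auto"))
print(has_duplex_mismatch("auto", (100, "full")))
```

In this model the only safe combinations are Auto/Auto or both sides forced to identical values. Real hardware and drivers can behave differently, so treat it as a mental aid rather than a diagnostic.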


Josh Kwok, MCSE, CCNP
ENCOSE
 

Author Comment

by:alpha-lemming
ID: 24533771
All the NICs are in autonegotiate mode.
I found the culprit, although I don't know the exact cause yet.
Shutting down one of the other hosts, which is running vmware-server, makes everything work right. Maybe I had duplicate MACs or something...
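The duplicate-MAC guess could be tested by pooling ARP entries from several hosts and looking for conflicts. This is only a sketch in Python under that assumption; find_conflicts and the sample addresses are invented for illustration, and nothing here confirms what the actual cause was.

```python
from collections import defaultdict


def find_conflicts(observations):
    """observations: iterable of (ip, mac) pairs gathered from several ARP tables.

    Returns (MACs seen for more than one IP, IPs seen with more than one MAC).
    """
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        mac = mac.replace("-", ":").lower()  # accept both separator styles
        ips_by_mac[mac].add(ip)
        macs_by_ip[ip].add(mac)
    shared_macs = {m: sorted(i) for m, i in ips_by_mac.items() if len(i) > 1}
    flapping_ips = {i: sorted(m) for i, m in macs_by_ip.items() if len(m) > 1}
    return shared_macs, flapping_ips


# Hypothetical observations pooled from a few hosts.
observations = [
    ("10.0.0.5", "00:0c:29:aa:bb:cc"),
    ("10.0.0.9", "00-0C-29-AA-BB-CC"),
    ("10.0.0.7", "00:17:08:87:44:84"),
    ("10.0.0.7", "00:17:08:87:44:85"),
]
shared_macs, flapping_ips = find_conflicts(observations)
print(shared_macs)
print(flapping_ips)
```

One MAC answering for several IPs can be legitimate (IP aliases, for example), so a hit is a hint to investigate, not proof of a duplicate.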
