Solved

Intermittent network connection

Posted on 2009-05-16
Last Modified: 2012-06-27
I have the following network setup:

4 hosts (Debian, Ubuntu, 2x FreeBSD) on a Gbit Ethernet switch, uplinked to
a 24-port Fast Ethernet switch with a variety of heterogeneous hosts in a local LAN.

All of the hosts are in the same class C network and use the same gateway and DNS server.

I recently added an ESXi server with 2 Gbit Ethernet cards to the network. One card is configured as a dedicated management interface and is connected to the Fast Ethernet switch with an IP in the local network. The other card, used by the VM networks, is connected to the Gbit Ethernet switch. (I have also tried it the other way around.)

The problem: the management network is intermittently unreachable from all but ONE machine, a FreeBSD machine on the same switch. Really weird. I can log into any other host on the same switch and sometimes have a good connection, sometimes get a "no route to host" when I try to ping the management interface. But the connection from the one FreeBSD box on the same switch is rock stable.

I have, of course, tried replacing cables and every possible switch <-> host combination.

Any ideas on what is going on or how to troubleshoot this?

Thanks!
Question by:alpha-lemming
5 Comments
 
LVL 10

Accepted Solution

by:
lanboyo earned 250 total points
ID: 24402896
Is the ESXi server's gigabit interface trunked? Make sure the management VLAN is removed from the gigabit link on the switch side.

Is it just the ESXi interface that becomes unreachable, or the whole VLAN used for management of things that include the ESXi?


Off the cuff, I notice that the default ARP cache timeout on FreeBSD is 20 minutes, while it has a maximum of 10 minutes on Windows. One possibility is that something prevents the ESXi box from responding to ARP who-has requests, or prevents the rest of the network from hearing the responses. Or, for some reason, the ESXi has decided its management interface is better on the other NIC, so the IP needs to change MAC addresses, and somehow the more robust code on the BSD device is able to notice this and adapt. Perhaps it sees gratuitous ARPs better.

Anyway...

When the problem is not occurring, go to a Windows box and run

arp -a

Find and note the MAC address that corresponds to the IP address of the ESXi management interface. For instance, this is an HP printer on my home network:

  192.168.1.7           00-17-08-87-44-84     dynamic

Its MAC address is 00-17-08-87-44-84. Then check on the BSD device with the command "arp -an"; the n skips reverse DNS lookups, which usually speeds things up. The output looks a little different:

? (192.168.1.7) at 00:17:08:87:44:84 [ether] on eth0

Although the separators differ (":" versus "-"), it is the same MAC.
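
To avoid comparing the two notations by eye, here is a minimal Python sketch that normalizes a MAC address before comparing. The helper names normalize_mac and same_mac are made up for illustration and are not part of any arp tool.

```python
import re

def normalize_mac(mac: str) -> str:
    """Return a MAC as lowercase, colon-separated, zero-padded hex pairs."""
    parts = re.split(r"[:-]", mac.strip())
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets in {mac!r}")
    octets = [int(p, 16) for p in parts]  # raises ValueError on non-hex input
    if any(o > 0xFF for o in octets):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{o:02x}" for o in octets)

def same_mac(a: str, b: str) -> bool:
    """True when two MAC strings name the same address, whatever the notation."""
    return normalize_mac(a) == normalize_mac(b)
```

Calling same_mac("00-17-08-87-44-84", "00:17:08:87:44:84") compares the Windows and Unix spellings of the printer's address from the example above.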

First, make sure the MACs are the same. Second, do the same thing while the problem is occurring, from a workstation that has the problem and from the working BSD box.

Do they both have ARP entries? Do they match? Do they match the previous address?
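
If you save the arp output from each box to a text file, the three questions above can be checked mechanically. This is only a rough sketch: parse_arp and compare are hypothetical helpers, and the regular expression assumes each line carries one IPv4 address followed by one MAC, as in the two sample lines above. Real arp output may contain other lines, which are simply skipped.

```python
import re

# An IPv4 address, then (later on the same line) a MAC written with ':' or '-'.
ENTRY = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})\D.*?"
    r"((?:[0-9A-Fa-f]{1,2}[:-]){5}[0-9A-Fa-f]{1,2})"
)

def parse_arp(text: str) -> dict[str, str]:
    """Map IP -> normalized MAC for every line that has both."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.search(line)
        if m:
            ip, mac = m.groups()
            octets = re.split(r"[:-]", mac)
            table[ip] = ":".join(f"{int(o, 16):02x}" for o in octets)
    return table

def compare(ip, baseline, failing_host, working_host):
    """Answer the three questions for one IP address.

    baseline:     table taken while everything worked
    failing_host: table from a box that currently cannot reach the IP
    working_host: table from the box that still can
    """
    bad = failing_host.get(ip)
    good = working_host.get(ip)
    return {
        "both_have_entry": bad is not None and good is not None,
        "hosts_match": bad is not None and bad == good,
        "working_matches_baseline": good is not None and good == baseline.get(ip),
    }
```

A missing or incomplete entry on the failing host shows up as both_have_entry being False, which points at ARP resolution rather than at routing.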

Also, the "no route to host" error usually means that the local router (or the sending host itself, when the target is on the same subnet) is unable to get an ARP response and reports that error back. Where is the local router? Is the BSD box or any of the other boxes dual-homed? Is the management VLAN in the same class C? How do the boxes try to connect? So many questions.








 
LVL 10

Expert Comment

by:lanboyo
ID: 24423798
Any updates?
 

Author Comment

by:alpha-lemming
ID: 24436595
Sorry for the delay, I had to go out of town.

No, the hosts that cannot connect do not have or get a MAC address for the management NIC.

arp <host> prints the IP address, then "no entry" for the MAC.

The management interface is not in a VLAN, and failover/load balancing with the other NIC is turned off.

It's just weird that this one BSD box has a rock solid connection while all the others are flaky.

 
LVL 2

Assisted Solution

by:ENCOSE
ENCOSE earned 250 total points
ID: 24490237
Sounds like a speed/duplex mismatch...
Try checking EVERY device port and switch port to make sure they are all on Auto/Auto.

A common misconception is that one side can be manually set with the other on Auto/Auto, which does not work properly.


Josh Kwok, MCSE, CCNP
ENCOSE
 

Author Comment

by:alpha-lemming
ID: 24533771
All the NICs are in autonegotiate mode.
I found the culprit, although I don't know the exact cause yet.
Shutting down one of the other hosts, which is running VMware Server, makes everything work right. Maybe I had duplicate MACs or something...
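
One way to test the duplicate-MAC guess is to collect (IP, MAC) pairs from the ARP tables of several hosts and look for clashes. The sketch below is only an illustration: find_conflicts is a made-up helper and the input list is something you would assemble yourself. Note that one MAC answering for several IPs can be legitimate (a host with IP aliases), so treat the result as a list of things to look at, not as proof.

```python
from collections import defaultdict

def find_conflicts(observations):
    """Group (ip, mac) observations and report suspicious overlaps.

    Returns two dicts:
      shared_macs:  MAC -> sorted IPs, for MACs seen behind more than one IP
      flapping_ips: IP -> sorted MACs, for IPs seen with more than one MAC
    """
    ips_by_mac = defaultdict(set)
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        mac = mac.lower().replace("-", ":")  # accept Windows or Unix notation
        ips_by_mac[mac].add(ip)
        macs_by_ip[ip].add(mac)
    shared_macs = {m: sorted(ips) for m, ips in ips_by_mac.items() if len(ips) > 1}
    flapping_ips = {i: sorted(macs) for i, macs in macs_by_ip.items() if len(macs) > 1}
    return shared_macs, flapping_ips
```

An IP that shows up with two different MACs over time is the pattern you would expect if another machine, such as a VM on the VMware Server host, were answering ARP for the management address.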