stevefNYC

Cisco 3845 / ARP issue

Okay, so this is a tough one, and I'm going to award the maximum number of points for it. By the same token, I probably won't disclose all of the information required to debug this issue the first time around; it will take multiple posts, which is fine.

This is potentially an ARP issue or a VLAN misconfiguration. I'm dealing with a Cisco 3845 router that sits in our datacenter and serves as a DS3 cross-connect to our office. We have two VLANs:

VLAN 50 - web servers (10.50/16)
VLAN 51 - databases (10.51/16)

When we try to access VLAN 51 interfaces from our office subnet, it can take a considerable amount of time for the destination box to cache an ARP entry for 10.50.0.5 (the DS3 router). Until then the router is unreachable from that box, and arp -n shows the following for it:

[root@cc70-5 ~]# arp -n |grep 10.50.0.5
10.50.0.5                        (incomplete)                              eth1
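With around 200 boxes involved, it can help to scan for unresolved entries like this one programmatically. Below is a minimal sketch that parses the net-tools arp -n layout shown above and lists entries still marked (incomplete). The helper names parse_arp_n and incomplete are hypothetical, and the column layout is assumed from the output in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArpEntry:
    address: str
    hwaddr: Optional[str]  # None while ARP resolution is still pending
    iface: str


def parse_arp_n(output: str) -> list[ArpEntry]:
    """Parse text printed by Linux `arp -n` (net-tools column layout assumed)."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Address HWtype HWaddress ..." header.
        if not fields or fields[0] == "Address":
            continue
        if "(incomplete)" in fields:
            # Unresolved: "<ip> (incomplete) <iface>"
            entries.append(ArpEntry(fields[0], None, fields[-1]))
        elif len(fields) >= 4:
            # Resolved: "<ip> ether <mac> <flags> [mask] <iface>"
            entries.append(ArpEntry(fields[0], fields[2], fields[-1]))
    return entries


def incomplete(entries: list[ArpEntry]) -> list[ArpEntry]:
    """Return only the entries that have no hardware address yet."""
    return [e for e in entries if e.hwaddr is None]
```

For the output above, incomplete(parse_arp_n(text)) returns a single entry for 10.50.0.5 on eth1.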

You can also see the latency of the first ping reply below (traceroutes are broken until the ARP entry is cached after a ping):

tag1349:~ sfinkelstein$ ping 10.51.5.70
PING 10.51.5.70 (10.51.5.70): 56 data bytes
64 bytes from 10.51.5.70: icmp_seq=248 ttl=62 time=1002.861 ms <--- bad
64 bytes from 10.51.5.70: icmp_seq=249 ttl=62 time=2.827 ms
64 bytes from 10.51.5.70: icmp_seq=250 ttl=62 time=1.930 ms
64 bytes from 10.51.5.70: icmp_seq=251 ttl=62 time=1.979 ms
64 bytes from 10.51.5.70: icmp_seq=252 ttl=62 time=1.824 ms
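To spot slow first replies like icmp_seq=248 across repeated test runs, a small parser can compare each round-trip time against the median. This is a hypothetical helper that assumes the macOS/BSD reply format shown above; the 10x threshold is an arbitrary choice, not a known property of the problem.

```python
import re
import statistics

# Matches reply lines such as
# "64 bytes from 10.51.5.70: icmp_seq=249 ttl=62 time=2.827 ms"
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def rtts(output: str) -> dict[int, float]:
    """Map icmp_seq -> round-trip time in milliseconds."""
    return {int(m.group(1)): float(m.group(2)) for m in REPLY.finditer(output)}


def slow_replies(output: str, factor: float = 10.0) -> list[int]:
    """Return sequence numbers whose RTT exceeds `factor` times the median RTT."""
    times = rtts(output)
    if not times:
        return []
    median = statistics.median(times.values())
    return sorted(seq for seq, t in times.items() if t > factor * median)
```

Against the office ping output above, slow_replies returns [248].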

If I set the ARP entry manually, I never see this issue. As a temporary workaround I do something like the following:

arp -s 10.50.0.5 00:15:F9:0C:65:A1 dev eth0

One more note: if I ping a 10.51 interface from a 10.50 interface, the ARP entry is created right away, and the office subnet can then reach the host. Otherwise, an office ping or TCP connection only gets the ARP entry created 30 seconds to 5 minutes after the initial attempt.

Thanks again for any assistance, and please let me know if I can provide any other information about my network topology, router versions, configs, etc. to help fix this problem.
Don Johnston

So you're saying that if you try to ping a device on VLAN 50 from a device on VLAN 51, you experience this problem, but if you ping the router itself you don't have the delay?

How is the 3845 connected to the VLANs? Are you trunking to a switch, or are you using two separate interfaces on the 3845?

Do the workstations have a default gateway set or are you using Proxy ARP?

stevefNYC

ASKER

I'm trying to ping a device on VLAN 51 from the office subnet. If I ping from VLAN 50, it properly creates the ARP entry on the Linux box and allows me to ping. Also, for boxes that have both a VLAN 50 and a VLAN 51 interface, if I ping the VLAN 50 interface first, I can then make subsequent requests to VLAN 51 without a problem.

Here's some results directly from the router:

nap2gbxds3#ping 10.51.5.66

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.51.5.66, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 1/1/1 ms
nap2gbxds3#ping 10.51.5.70

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.51.5.70, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
nap2gbxds3#ping 10.51.5.61

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.51.5.61, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 1/1/1 ms
nap2gbxds3#

Some are 80 percent, some are 100.

The 3845 is connected to the VLANs with two separate interfaces: one RJ45 copper cable runs into a VLAN 50 port on one of our two edge 6509s, and the same for VLAN 51.

I hope this answers your questions, donjohnston. Thanks a bunch.

It is typical for the first ping to fail while the ARP entries are populated on the end and intermediate devices. If that's the only problem you're experiencing, I wouldn't worry about it.

Only the first ping from the router fails. The much bigger issue is that boxes on the office subnet cannot reach hosts on 10.51/16 (VLAN 51) for as long as five minutes, unless a prior TCP connection has been made or a static ARP entry for the DS3 router has been added on the destination box.

tag1349:~ sfinkelstein$ time ping 10.51.5.70
PING 10.51.5.70 (10.51.5.70): 56 data bytes
^C
--- 10.51.5.70 ping statistics ---
423 packets transmitted, 0 packets received, 100% packet loss

real    7m2.833s
user    0m0.004s
sys     0m0.022s

I had to ^C out of there after 7 minutes of waiting. Sometimes it takes up to 30 minutes for the ARP lookup to complete.

ASKER CERTIFIED SOLUTION
Don Johnston

I invoked that arp command on any of my 200 boxes that have a VLAN 51 subnet address aliased to one of their interfaces (e.g. 10.51.3.250). Yes, I am setting the gateway on each end station, as follows:

[root@cc60-5 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.100.0   10.50.0.5       255.255.254.0   UG        0 0          0 eth1
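The table above holds one static route: 192.168.100.0/23 (assumed here to be the office subnet) via 10.50.0.5 on eth1. The sketch below parses that netstat -rn output and performs a longest-prefix match to show which gateway and interface a reply to a given address would use. It is a simplification that ignores metrics and policy routing, and the helper names are hypothetical. With this route present, a reply to an office host is handed to 10.50.0.5 on eth1, the same neighbor whose ARP entry showed as (incomplete) earlier.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: str
    iface: str


def parse_netstat_rn(output: str) -> list[Route]:
    """Parse data rows of Linux `netstat -rn` (8-column layout assumed)."""
    routes = []
    for line in output.splitlines():
        f = line.split()
        # Data rows have 8 columns and start with a dotted-quad destination.
        if len(f) == 8 and f[0][0].isdigit():
            net = ipaddress.IPv4Network(f"{f[0]}/{f[2]}", strict=False)
            routes.append(Route(net, f[1], f[7]))
    return routes


def lookup(routes: list[Route], dst: str) -> Optional[Route]:
    """Simplified longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.IPv4Address(dst)
    matches = [r for r in routes if addr in r.network]
    return max(matches, key=lambda r: r.network.prefixlen, default=None)
```

For example, lookup(routes, "192.168.101.20"), an arbitrary office address, returns the 192.168.100.0/23 route via 10.50.0.5 on eth1. With that route removed, the decision falls to whatever other routes the box has (such as a default route, not shown above).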

You know, now that you mention it... I removed the static route from the routing table on one Linux box (we also have a static route set for this network on our Netscalers). There is now no timeout at all. The box never caches an ARP entry for the DS3 router that the packets traverse, but I *think* I can ping VLAN 51 interfaces without any issues now that the static route is gone.

That is totally weird! Any idea why? Let me confirm, and I'll award you the points for your generous help, donjohnston.

Steve

Feel free to put this in the cleanup area, keith.

Thank you.

S.