Okay so this is a tough one, and as such I'm going to reward the highest amount of points for it. On the same token, I probably will not disclose all of the information the first time around required to help debug this issue. It will require multiple posts which is fine.
This is potentially an arp issue or a VLAN misconfiguration issue. I'm dealing with a 3845 router which sits at our datacenter and serves as a ds3 cross connect to our office. We have two VLANs:
VLAN 50 - web servers (10.50/16)
VLAN 51 - databases (10.51/16)
When trying to access VLAN 51 interfaces from our office subnet, it can take a considerable amount of time for arp to properly cache on the destination box for 10.50.0.5. It is unreachable and arp -n shows the following for it:
[root@cc70-5 ~]# arp -n |grep 10.50.0.5
10.50.0.5 (incomplete) eth1
You can see the initial ping latency below also (and traceroutes are broken until the arp cache is set after a ping)
tag1349:~ sfinkelstein$ ping 10.51.5.70
PING 10.51.5.70 (10.51.5.70): 56 data bytes
64 bytes from 10.51.5.70: icmp_seq=248 ttl=62 time=1002.861 ms <--- bad
64 bytes from 10.51.5.70: icmp_seq=249 ttl=62 time=2.827 ms
64 bytes from 10.51.5.70: icmp_seq=250 ttl=62 time=1.930 ms
64 bytes from 10.51.5.70: icmp_seq=251 ttl=62 time=1.979 ms
64 bytes from 10.51.5.70: icmp_seq=252 ttl=62 time=1.824 ms
If I set the arp entry manually then I never see this issue. I do something like the following as a temporary work around:
arp -s 10.50.0.5 00:15:F9:0C:65:A1 dev eth0
Just another note. If I ping a 10.51 interface from a 10.50 interface, this creates the arp entry right away alleviating the issue from the office subnet not being able to ping it. It'll create the arp cache for an office ping/tcp socket request after 30 seconds to 5 minutes after the initial try.
Thanks again for any assistance and please let me know if there's any other information I can provide with my network topology, router versions/configs etc to help fix this problem.