Local Network Traffic Fails Until External Connection

This is one of the strangest things I've seen.

I have two machines running Windows Server 2003 at our colocation facility, connected through a brand-new Netgear switch. (I just changed the switch because I suspected it was responsible for the behavior I am about to describe.) One machine is our web server; the other is our database server. Both are accessible through RDP, and both sit behind a firewall that is doing NAT.

I started investigating this problem when the code running on the web server began having intermittent problems connecting to the database server.

Here is the scenario: I log on to both machines using RDP. On each one, I open a command prompt and start a continuous ping to the other machine's local address: ping -t 10.2.2.2 on one machine and ping -t 10.2.2.3 on the other.

Everything looks fine; the pings reply with zero packet loss. I log off both machines, leaving the ping commands running, and continue testing my app. After a while, the app on the web server can no longer connect to the database. When I log back on to the database machine, the command prompt shows that the pings are failing. However, as soon as I have logged back on, the pings start succeeding again and everything works.
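To pin down exactly when the pings start failing relative to logging off, a logger that timestamps only the up/down transitions is easier to read than a wall of ping -t output. This is a minimal Python sketch, not something used in the thread; `ping_once` and `monitor` are hypothetical helpers, and the ping flags assume either Windows (`-n` count, `-w` timeout in ms) or Linux iputils (`-c` count, `-W` timeout in s).

```python
import platform
import subprocess
import time
from datetime import datetime
from typing import Callable, List, Tuple


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the OS ping command; True if ping exited with 0."""
    if platform.system() == "Windows":
        # Windows ping: -n is the count, -w is the timeout in milliseconds.
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux (iputils) ping: -c is the count, -W is the timeout in seconds.
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # Caveat: Windows ping can exit with 0 on some "unreachable" replies,
    # so treat this as a rough up/down signal.
    return result.returncode == 0


def monitor(
    host: str,
    count: int,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = ping_once,
) -> List[Tuple[str, bool]]:
    """Probe `host` `count` times and record only the up/down transitions."""
    transitions: List[Tuple[str, bool]] = []
    last = None
    for _ in range(count):
        ok = probe(host)
        if ok != last:  # first result, or the state changed since the last probe
            stamp = datetime.now().isoformat(timespec="seconds")
            print(stamp, "UP" if ok else "DOWN", host)
            transitions.append((stamp, ok))
            last = ok
        time.sleep(interval_s)
    return transitions
```

On the web server, something like monitor("10.2.2.3", count=3600) would record an hour of transitions; the `probe` parameter exists so the logic can be exercised with a scripted probe instead of real pings.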

So it looks like connecting to the machine from the outside world somehow solves the local network problem. I have no clue why this would be, but those are the results as I see them.

Do you have any idea what is going on? Obviously this is an unacceptable situation and needs to be resolved, but I don't know how to proceed.

I have replaced the switch. Now I am considering replacing the network card, but I don't have any evidence that the card is the culprit.

Thanks for your help.

--Matt



Asked by molson8472
ruddg commented:
I am not an expert on the PIX proxy ARP implementation, but basically the PIX sends its own MAC address in an ARP reply when a host broadcasts for address resolution on the attached segment. What I have typically seen in this case is two ARP replies coming back to the broadcasting host: one from the actual host and one from the PIX. The problem you experienced occurs when the PIX's reply arrives after the real host's reply and overwrites the actual MAC address with the PIX's MAC address. The broadcasting host then sends its packets to the PIX, which drops them. There are legitimate reasons to use proxy ARP, such as when a PIX sits between nodes on a common subnet, but I can't speak intelligently about it beyond that.
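The "last reply wins" race described above can be modeled in a few lines. This is a simplified Python sketch that assumes the requesting host overwrites its cache entry with every reply it accepts (real TCP/IP stacks differ in the details); the MAC addresses and the `ArpReply` type are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ArpReply:
    sender_ip: str   # IP address the reply claims to answer for
    sender_mac: str  # MAC address the requester will use for that IP
    source: str      # who actually sent the reply (for readability only)


def apply_replies(cache: Dict[str, str], replies: List[ArpReply]) -> Dict[str, str]:
    """Process replies in arrival order; each one overwrites the cached MAC."""
    for reply in replies:
        cache[reply.sender_ip] = reply.sender_mac
    return cache


# Hypothetical MAC addresses, for illustration only.
DB_MAC = "00-11-22-33-44-55"
PIX_MAC = "00-aa-bb-cc-dd-ee"

# The web server broadcasts "who has 10.2.2.3?" and two replies come back.
replies = [
    ArpReply("10.2.2.3", DB_MAC, "database server"),
    ArpReply("10.2.2.3", PIX_MAC, "PIX (proxy ARP)"),
]

print(apply_replies({}, replies)["10.2.2.3"])        # PIX reply arrived last
print(apply_replies({}, replies[::-1])["10.2.2.3"])  # real host's reply arrived last
```

The first print shows 00-aa-bb-cc-dd-ee and the second 00-11-22-33-44-55: the same two replies produce a working or a broken cache entry depending only on arrival order, which fits the intermittent symptoms.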
 
ruddg commented:
Do both machines have the same subnet mask (i.e., 255.255.255.0)? Does it match the subnet mask on the gateway device (the firewall)? What type of firewall/router is in place? Are any proxy ARP features enabled? Are there any other systems on that network segment?

This sounds like an ARP problem. On each server, try setting up a static ARP entry pointing to the other server:

(on server 10.2.2.2)
arp -s 10.2.2.3 <mac-address>

and

(on server 10.2.2.3)
arp -s 10.2.2.2 <mac-address>

You can get each machine's MAC address from the output of 'ipconfig /all'.
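If you would rather not copy MAC addresses by hand, the lookup can be scripted. This is a minimal Python sketch that pulls the "Physical Address" fields out of saved ipconfig /all text and builds the arp -s command shown above; the sample output is abbreviated with a made-up MAC, and the exact field labels vary between Windows versions, so the regular expression is an assumption.

```python
import re
from typing import List

# Matches e.g. "Physical Address. . . . . . . . . : 00-11-22-33-44-55".
MAC_RE = re.compile(r"Physical Address[ .]*:\s*([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})")


def physical_addresses(ipconfig_all: str) -> List[str]:
    """Return every MAC listed as a 'Physical Address' in ipconfig /all output."""
    return [mac.upper() for mac in MAC_RE.findall(ipconfig_all)]


def static_arp_command(peer_ip: str, peer_mac: str) -> str:
    """Build the Windows command that pins peer_ip to peer_mac in the ARP cache."""
    return f"arp -s {peer_ip} {peer_mac}"


# Abbreviated, hypothetical output captured on 10.2.2.3.
SAMPLE = """
Ethernet adapter Local Area Connection:

        Description . . . . . . . . . . . : Example NIC
        Physical Address. . . . . . . . . : 00-11-22-33-44-55
        IP Address. . . . . . . . . . . . : 10.2.2.3
"""

macs = physical_addresses(SAMPLE)
print(static_arp_command("10.2.2.3", macs[0]))
```

This prints arp -s 10.2.2.3 00-11-22-33-44-55, the command to run on 10.2.2.2. A machine with several adapters yields several MACs, so pick the one belonging to the 10.2.2.x interface.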
 
molson8472 (author) commented:
Thanks for your reply ruddg.

Both machines are on the same subnet -- 255.255.255.0. They both have the same default gateway -- 10.2.2.1, which is the firewall, a Cisco PIX 501.

To be honest, I don't know whether any proxy ARP features are enabled. I didn't change any ARP settings on the PIX, so it is using the defaults in that respect.

There are three other machines on the same subnet. I haven't done a detailed investigation into whether they are having the same issue. So far the only two machines I've needed to talk to each other are the ones in question.

At first, I had these machines plugged directly into the integrated switch on the firewall. My theory was that the PIX was dumb and needed an xlate to keep track of the IPs and MAC addresses on its switch, so out of desperation I put in a small Netgear switch, to which these machines are now attached. I'm still having the same problem.

I've never had this kind of trouble before. Why would I need a static ARP entry? Why wouldn't the switch know the MAC addresses of the attached devices? The traffic shouldn't even be reaching the firewall at this point, right? The switch should be sending packets directly to the other machine without involving the firewall at all. (I think.)
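Matt's reasoning holds as far as routing goes: with a 255.255.255.0 mask, each server treats the other as on-link and never hands the packet to the gateway. What that decision does not control is which MAC address ARP returns for the on-link peer. The switch forwards frames by destination MAC, so if 10.2.2.3 has been resolved to the PIX's MAC, the switch will faithfully deliver the frame to the PIX. A simplified Python sketch of the standard on-link test (it ignores host routes and multiple interfaces; `next_hop` is a hypothetical helper):

```python
import ipaddress


def next_hop(src_ip: str, netmask: str, dst_ip: str, gateway: str) -> str:
    """Return the IP address a host must ARP for to reach dst_ip."""
    local_net = ipaddress.IPv4Network(f"{src_ip}/{netmask}", strict=False)
    if ipaddress.IPv4Address(dst_ip) in local_net:
        return dst_ip   # on-link: ARP for the destination itself
    return gateway      # off-link: ARP for the default gateway instead


print(next_hop("10.2.2.2", "255.255.255.0", "10.2.2.3", "10.2.2.1"))
print(next_hop("10.2.2.2", "255.255.255.0", "192.0.2.10", "10.2.2.1"))
```

The first call prints 10.2.2.3 and the second 10.2.2.1. For server-to-server traffic, the firewall is only involved if it answers the ARP request for 10.2.2.3, which is exactly what proxy ARP allows.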

I'll try your suggestion and set up static ARP entries.

--Matt

ruddg commented:
If possible, please post your (sanitized) PIX config. Proxy ARP is enabled by default on the PIX; you may have a translation that is causing the PIX to send ARP replies to your inside subnet.
 
molson8472 (author) commented:
OK, acting on your tip, I checked the PIX config and found that proxy ARP was enabled on both the outside and inside interfaces. I ran the show arp command and found entries for all the servers in question, with the correct MAC addresses. Still, just as a test, I disabled proxy ARP on both interfaces and cleared the ARP cache with the clear arp command. That seems to have done the trick: I'm no longer seeing the same behavior. Very nice, thanks!

I'm still confused about why this happened, though. I haven't changed the network card in any of the machines in question, and the ARP cache on the firewall showed the correct information. So even with proxy ARP enabled, it should have replied with the correct information, and there shouldn't have been an issue.

One thing I did notice when looking at the output of show arp statistics:

      Dropped blocks in ARP: 59
      Maximum Queued blocks: 2
      Queued blocks: 0
      Interface collision ARPs Received: 0
      ARP-defense Gratuitous ARPS sent: 0
      Total ARP retries: 207
      Unresolved hosts: 0
      Maximum Unresolved hosts: 1

Is it possible that the PIX failed to respond to ARP requests because it was too busy processing other traffic? We have been experiencing some heavy traffic lately. That's the only explanation I can come up with.
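One way to test the "too busy" theory is to capture show arp statistics twice, once before and once after a period of heavy traffic, and compare. This Python sketch parses the "Label: number" lines shown above and computes the change per field. The second snapshot is invented for illustration, and which fields are cumulative counters (such as "Dropped blocks in ARP") versus current values (such as "Queued blocks") is an assumption based on their names. A jump in dropped blocks during the busy period would be consistent with the theory, not proof of it.

```python
import re
from typing import Dict

# Matches lines such as "      Dropped blocks in ARP: 59".
STAT_LINE = re.compile(r"^\s*(.+?):\s*(\d+)\s*$")


def parse_arp_statistics(text: str) -> Dict[str, int]:
    """Turn the 'Label: number' lines of show arp statistics into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Change in each field between two snapshots (meaningful for counters only)."""
    return {name: after[name] - before.get(name, 0) for name in after}


BEFORE = """\
      Dropped blocks in ARP: 59
      Maximum Queued blocks: 2
      Queued blocks: 0
      Interface collision ARPs Received: 0
      ARP-defense Gratuitous ARPS sent: 0
      Total ARP retries: 207
      Unresolved hosts: 0
      Maximum Unresolved hosts: 1
"""

# Hypothetical second snapshot, taken after a busy period.
AFTER = BEFORE.replace("Dropped blocks in ARP: 59", "Dropped blocks in ARP: 71").replace(
    "Total ARP retries: 207", "Total ARP retries: 240"
)

change = deltas(parse_arp_statistics(BEFORE), parse_arp_statistics(AFTER))
print(change["Dropped blocks in ARP"], change["Total ARP retries"])  # 12 33
```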

If you can explain this further, I'd greatly appreciate it.

Also, one other question: should I disable proxy ARP on both the inside and outside interfaces (as I've done), or just on the inside?

--Matt



molson8472 (author) commented:
Well, clearing the ARP cache and disabling proxy ARP on the PIX did in fact fix the problem. Thanks, ruddg!