njpetrucelli

Domain Computers Randomly Lose Internet Connection

I am running Windows SBS 2011 w/ Exchange on 1 server. This server provides DHCP for about 50 nodes. Random computers at random times will lose internet access (yellow exclamation point on the network icon). All intranet services will work, but the user cannot access the internet.
1. This happens randomly
2. After a while the computer will reconnect to internet
3. I think it is a DNS issue
4. Server DHCP does not show anything abnormal
5. User can ping internal IPs EXCEPT the gateway (gateway is a cloud firewall - which has been tested with no issues)
6. Release and Renew IP does nothing to fix it
7. Grabbing a new IP does not fix it

I need some help please.
Thanks,
Nick
asavener

This sounds like a problem that I used to see on firewalls configured with a 10 user limit.  In order to enforce the 10 user limit, the vendor simply limited the number of ARP entries to 10.

This doesn't sound like an IP conflict to me, or else you'd be having trouble reaching things other than just the gateway.

Run arp -a on a machine experiencing the issue, and on a machine not experiencing the issue and see whether they both have the same entry for the gateway.  If you can check the arp table on the gateway at the same time, that would be ideal.
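
If comparing the two outputs by eye is awkward, the check can be scripted. Below is a minimal Python sketch, assuming the Windows `arp -a` row layout (IP, dash-separated MAC, type). The function names and the 192.168.1.1 address in the usage note are made up for illustration and are not taken from this network.

```python
import re
import subprocess

# Matches Windows-style rows such as "  192.168.1.1   00-11-22-33-44-55   dynamic".
ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})\s+(\w+)\s*$"
)

def parse_arp_table(text):
    """Return {ip: (mac, entry_type)} for every row found in `arp -a` output.

    If the same IP appears under several interfaces, the last row wins.
    """
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ip, mac, kind = match.groups()
            table[ip] = (mac.lower(), kind.lower())
    return table

def gateway_entry(text, gateway_ip):
    """Return (mac, entry_type) for the gateway, or None if it is missing."""
    return parse_arp_table(text).get(gateway_ip)

def read_local_arp_table():
    """Run `arp -a` on this machine and return its raw text output."""
    result = subprocess.run(["arp", "-a"], capture_output=True, text=True, check=True)
    return result.stdout
```

Run `gateway_entry(read_local_arp_table(), "192.168.1.1")` on both machines, substituting the real gateway IP. A `None` result means the gateway is missing from that machine's ARP cache; different MACs on the two machines would suggest something else is answering for the gateway's IP.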
njpetrucelli

ASKER

Hi asavener,

Thank you for the quick response.
I did as you asked and one node (experiencing the problem) did not list the gateway IP in the arp -a.

The gateway is a cloud firewall (managed by the ISP). I have been through the runaround with them, but they state traffic is going in and out properly. I would not expect the nodes (experiencing the problem) to ping or contact the gateway when this issue is happening.

I am really not thinking this is an issue with the router (gateway).

It is acting like the name is not resolving properly (DNS issue).
UPDATE:

I found another post on a similar issue having to do with WiFi users.
I disconnected both my APs and, lo and behold, the problem went away.

Perhaps an Android or iPhone user is causing an IP or MAC duplication?

I need these APs working. How do I go about figuring out which phone is causing the issue?
The APs do not hand out DHCP, just WiFi to users. Both APs have a static IP and are excluded from DHCP on the server. Could I have a rogue phone causing this issue?

Thanks,
Never mind: it is still happening even with the APs unplugged. Back to the drawing board.

It is really strange. One node I am working on had internet access without a problem, then randomly just stopped and threw an exclamation point on the network icon.

DNS has both a forward (A) and a reverse (PTR) record for the IP. Weird.
It's not a name resolution/DNS issue.  The client has no need to resolve the name of the gateway to an IP address.

Try adding a static ARP entry to an affected machine.  That MIGHT be a workaround for the problem.

I suppose it is possible that another machine is getting assigned the IP address of the gateway, but that would result in it having a different result in the ARP table.
Thank you for the suggestion, interestingly enough, I reset my switches this morning and everything cleared up.
I do not think I have a bad switch (all 3 are brand new), but I had them daisy-chained. Now each is plugged directly into the Adtran; maybe this will help me localize the issue (if it is in fact a switch).

Could a bad switch cause random issues like this? Wouldn't you expect it to work or not work, not be random? (these are unmanaged)
It doesn't sound like a bad switch.  Typically what you would see is a bad port or bad bank of ports, where there's no traffic and/or no links.

Connecting the switches in the wrong way would typically result in a broadcast storm, and little or no traffic would make it through.

Really looks like an ARP resolution issue.
When I try the arp -s workaround, am I adding the ip and mac of the router (gateway)?

Is this a router issue, do you think?
Yes, add the IP and MAC of the gateway.
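
As a rough sketch of what that looks like, the Python helper below validates the two values and builds the `arp -s` command in the dash-separated MAC format that Windows expects. The helper name and the example addresses are hypothetical; use the gateway's real IP and MAC.

```python
import ipaddress
import re

def static_arp_command(gateway_ip, gateway_mac):
    """Build the `arp -s` argument list for a static gateway entry.

    Accepts MACs written with dashes, colons, or dots and normalises them
    to the dash-separated form used by the Windows arp command.
    """
    ipaddress.IPv4Address(gateway_ip)  # raises ValueError on a malformed IP
    digits = re.sub(r"[-:.]", "", gateway_mac)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError("not a valid MAC address: %r" % gateway_mac)
    mac = "-".join(digits[i:i + 2] for i in range(0, 12, 2)).lower()
    return ["arp", "-s", gateway_ip, mac]

if __name__ == "__main__":
    # Example values only; prints: arp -s 192.168.1.1 00-11-22-aa-bb-cc
    print(" ".join(static_arp_command("192.168.1.1", "00:11:22:AA:BB:CC")))
```

The printed command would normally be run from an elevated command prompt on the affected machine. Treat it as a diagnostic workaround: if the machine stays online with the static entry in place, that points at ARP resolution rather than DNS.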


My guess is that it's an issue with the router, although there could be something else causing the problem on your network.  

Are you familiar with how ARP works?  

If not, here's an overview:  http://www.omnisecu.com/tcpip/address-resolution-protocol-arp.php
I just got off the phone with the ISP; they also think it is an ARP issue (kudos asavener).

The Adtran has a limit of 100 ARP MACs, and there is a chance I am exceeding that limit. The Adtran, according to the ISP, has a MAX limit of 100 that cannot be increased. However, I have a hard time believing that I am hitting that 100 max mark that often, if at all. I could see it if the device held the addresses for extended periods of time, but the leases are 10 min.
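
One rough way to test that hunch is to count how many distinct MACs are active inside one aging window and compare the peak against the 100-entry limit. The Python sketch below assumes a simple model in which an entry ages out a fixed time after the device was last seen; how the Adtran really ages entries is not known here, and the observation list (for example, timestamps and source MACs taken from a packet capture) is hypothetical input.

```python
def peak_distinct_macs(observations, window_seconds):
    """Return the largest number of distinct MACs seen within any one window.

    observations: iterable of (timestamp_seconds, mac) pairs.
    window_seconds: assumed aging time of a table entry (must be > 0).
    """
    events = sorted(observations)
    counts = {}   # mac -> number of its events inside the current window
    start = 0     # index of the oldest event still inside the window
    peak = 0
    for timestamp, mac in events:
        counts[mac] = counts.get(mac, 0) + 1
        # Drop events that fell out of the window (timestamp - window, timestamp].
        while events[start][0] <= timestamp - window_seconds:
            old_mac = events[start][1]
            counts[old_mac] -= 1
            if counts[old_mac] == 0:
                del counts[old_mac]
            start += 1
        peak = max(peak, len(counts))
    return peak
```

With a 10 minute window this would be called as `peak_distinct_macs(observations, 600)`. A peak near or above 100 would support the table-limit theory; a peak well below it would suggest looking elsewhere. Phones, printers, APs, and switches count as well as the roughly 50 PCs.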

Why do you think restarting the switches fixes the problem temporarily?
arp -a on down devices shows the correct IP and MAC for the gateway (I checked that last night)
ASKER CERTIFIED SOLUTION
asavener
Thank you, this has resolved the issue!