Solved

Windows 2003 server loses network connectivity

Posted on 2011-10-05
Last Modified: 2012-06-27
Hi,

Our company uses a server (which is also a DC and the ONLY DNS server) that, for some reason, loses network connectivity randomly throughout the day. Users are disconnected from the networked drives and you cannot connect to the server via RDC.
I don't know if it is a total loss of connection, but it seems that way. One thing that throws me is that it is only ever down for 30 seconds or so, and every time I ping the server it responds straight away and is then connectable again. So I am not sure whether pinging it resolves the issue or whether it resolves itself quickly enough that the ping works anyway.

The task manager is no help; I have been going through that and cannot find any errors that seem remotely connected to it.

Any help would be much appreciated; this has been going on for months!

thanks
Question by:catomax
48 Comments
 
LVL 7

Expert Comment

by:frajico
ID: 36918521
Which brand is the server? Is any network teaming software installed, in case it has two network cards? Is it connected directly to a switch? If you disconnect the server and ping its IP address, does something else answer the ping?

0
 

Author Comment

by:catomax
ID: 36918531
It has Broadcom Advanced Control Suite controlling its 4 NICs in a team. It has always been that way and has not caused problems in the past, so I did not want to change that. If the server is disconnected, I will respond to my own ping.
0
 
LVL 7

Expert Comment

by:frajico
ID: 36918631
Is the power saving feature enabled in the teaming software or in any network card's properties?
0

 

Author Comment

by:catomax
ID: 36918640
I will have a look... one sec!
0
 

Author Comment

by:catomax
ID: 36918673
AHA!! Yes, it is on in the NICs' Power Management settings. Shall I switch it off?
0
 
LVL 7

Expert Comment

by:frajico
ID: 36918680
When you ping the server, is the TTL value increasing or stable?

Is it possible that you're getting some kind of IP conflict between the server and a switch/router? Is the IP fixed on the server, and does it have the same network mask as the DHCP server? Is your DHCP scope set to exclude the addresses that your servers and/or switches are using?

Can you try changing the server's IP to another fixed IP that is not currently used on the same subnet and is either outside or excluded from the DHCP scope?
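
The DHCP question above can be checked on paper before touching anything. Below is a minimal Python sketch, assuming you know the DHCP pool boundaries and exclusion ranges; the addresses in the usage note are made-up examples (the thread does not state the real scope), and `in_dhcp_pool` is a hypothetical helper name.

```python
import ipaddress


def in_dhcp_pool(ip, pool_start, pool_end, exclusions=()):
    """Return True if `ip` could be leased by DHCP.

    That is: it lies inside the pool and is not covered by any
    (start, end) exclusion range.
    """
    addr = ipaddress.ip_address(ip)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    if not (start <= addr <= end):
        return False  # outside the pool, DHCP will never hand it out
    for ex_start, ex_end in exclusions:
        if ipaddress.ip_address(ex_start) <= addr <= ipaddress.ip_address(ex_end):
            return False  # inside the pool but explicitly excluded
    return True
```

For example, `in_dhcp_pool("192.168.1.50", "192.168.1.10", "192.168.1.200")` returns True, which would mean a server statically configured with that address is at risk of a conflict; adding `exclusions=[("192.168.1.40", "192.168.1.60")]` makes it return False.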
0
 
LVL 7

Expert Comment

by:frajico
ID: 36918684
Yes, turn it off for servers .....
0
 

Author Comment

by:catomax
ID: 36918688
TTL is consistently 128.
The IP is very much static; I cannot change it, as that would cause massive network death!
0
 
LVL 1

Expert Comment

by:knollebolle
ID: 36919264
Have you checked your switch connection?

Maybe your switch port is broken.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36920167
Power save features would have been my first guess.

Now, I would like to know what Service pack you are using on the server. There was a problem with a couple service packs.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36920184
One more thing:

How many PCs do you have on the network that you really need NIC teaming for? If fewer than 250, you should consider going to ONE NIC. NIC teaming can periodically cause problems with communications with the switch (especially a managed switch).
0
 

Author Comment

by:catomax
ID: 36922826
OK, interesting point about the teaming thing. We only have about 30-40 computers, so maybe 4 NICs is a bit overkill. Would you definitely recommend going to just one?

BTW: I switched off the power saving thing last night and got a call in the morning saying the shared drive had already gone down before I got in, so no luck there!
0
 

Author Comment

by:catomax
ID: 36923436
OK, so I tried to disable the TEAM and have it going through just one NIC. MASSIVE MISTAKE!!
Most people lost all network connectivity with the server. What confuses me is that everyone could ping it and it could ping everyone, but connections via RDC or network shares were gone.

I re-enabled the team and am back to square one.
0
 

Author Comment

by:catomax
ID: 36923518
Also, it is Service Pack 2
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36924563
Your network teaming software isn't working right. Have the network team look at that server's ARP cache, which maps IP addresses to MAC addresses. Here's the problem: by design, a computer with the same IP as another will have network shares disabled. Upon going to a ONE-NIC configuration, the client computers wouldn't work. They are attempting to communicate with the computer, but then the computer can't reply.

Now, the only computer I know of with four built-in NICs is a Dell 710 server. If that is what you have, the first two NICs are paired and the second two NICs are paired as teams by default, and you can't break up that pairing. So if you are using all four, this could be the IP conflict that is causing you to lose access to the shares periodically. In addition, on a managed switch, you should look up the multicast and unicast configuration. I think TechNet has a good article on this.
0
 

Author Comment

by:catomax
ID: 36929572
Thanks Chief!

It is a 710 server, but when we bought it there was no teaming done at all; we put the BACS software on there and created the teams ourselves.

The main thing that confuses me is that it is only Windows 7 machines that completely lose connection, bar mine (no idea why). XP machines only lose connection for 10 seconds or so while it is changing to a single NIC and then just connect again, whereas all W7 machines (bar mine) never connect again. It doesn't seem to matter whether I restart them or flush the DNS on them; nothing helps.

Also, I am the network team! :)
0
 

Author Comment

by:catomax
ID: 36929749
AHA!!!

I have been looking at the ARP cache (on my computer). When I lost connection to the server, the ARP cache showed the MAC address of the server as being that of the firewall (?????).
Clearing the ARP cache and reconnecting (via ping) does not solve the problem; only when I click 'Diagnose the problem' in Windows Explorer does the MAC address correct itself.

weird!
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36935494
It's funny, because I had this argument with my boss about buying servers with multiple conjoined NICs. I have seen too many problems with network teaming and multihomed computers. But they went out and bought R710s anyway. My arguments were based on how NetBIOS handles its traffic and prevents spoofing as a security measure. NetBIOS is used for the RPC locator service, the Netlogon service, and the browser services, so it is used with network shares. The problem is that a while back (to prevent spoofing) NetBIOS was made to shut down automatically if a computer with the same name or the same IP is detected on the network. This means a multihomed computer will shut down NetBIOS, and possibly a teamed NIC configuration could shut you down too. An example of what you will see is provided below.

NOTE: NetBIOS binds to a single NIC: the first NIC that is used when it first comes up. To see which NIC it binds to, go to the command prompt and type "net config redir". There you will see the MAC address that is handling your NetBIOS traffic, and you can compare it to the output of "ipconfig /all".
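
If the comparison has to be done on several machines, it can be scripted. The following is a rough Python sketch, assuming the "net config redir" output lists each transport followed by a 12-digit hex MAC in parentheses (as in the "NetbiosSmb (000000000000)" line quoted later in this thread). The function names and the sample values in the usage note are illustrative, not taken from any real server.

```python
import re


def normalize_mac(mac):
    """Reduce a MAC such as '00-11-22-AA-BB-CC' to '001122aabbcc'."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def netbt_macs(net_config_text):
    """Return the distinct non-zero MACs listed in 'net config redir' output."""
    macs = re.findall(r"\(([0-9A-Fa-f]{12})\)", net_config_text)
    # The all-zero entry belongs to NetbiosSmb, not to a physical NIC.
    return sorted({m.lower() for m in macs if m != "000000000000"})
```

Usage: paste the command output into a string, call `netbt_macs(text)`, and compare each result with `normalize_mac(...)` of the "Physical Address" shown by "ipconfig /all" for the NIC you expect NetBIOS to be bound to.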

When you break out in a teaming environment, or route over the server, NetBIOS has problems because of the anti-spoofing that is built into NetBIOS security. When routing over the server, you have to disable NetBIOS on the outside NIC. When teaming, you have to ignore the bind order. That is set by a registry hack, as seen below.

How to ignore bind order:
http://support.microsoft.com/kb/166159

Here's an example of multiple computers with the same IP: In this case, it was a networked printer that shared the same IP as the server causing intermittent network connectivity to the shares (as in your case)...

http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Windows/Windows_7/Q_27105115.html?sfQueryTermInfo=1+10+30+chiefit+red+share+x

Now, that link also recommends disabling autodisconnect and power save features. This is something I recommend on W7 computers. Autodisconnect after idle time is standard behavior in Windows 7; as a workaround you can turn it off with the following command:

net config server /autodisconnect:-1


You are on a Dell R710. What that means is you have TWO pairs of conjoined NICs. In other words, the first two are teamed and the second two are teamed automatically. By using teaming software and joining all four, you are actually joining the two sets, and I don't know if this was done right. Remember when I told you to go with ONE NIC? In your case, I would go with TWO (the first two) because of how they are conjoined and teamed. But be aware of which NICs you are bound to via NetBIOS by looking at the output of net config redir. It might be the second two.

Suggestions:
1-Break the team and go with two nics
2-Stop Win7 autodisconnect and make sure there are no power settings
3-Reg hack to ignore bind order

--------------------------------------------------------------
The next suggestion is just a warning because I don't think this is your problem:
NetBIOS is often blocked by firewalls because it was so highly targeted. Many software firewalls are actually "System State" firewalls. This means that if the traffic did not originate on this machine, the firewall will NOT accept it. But NetBIOS IS a broadcast protocol: computers broadcast NetBIOS name resolution requests, and those are picked up by the NetBIOS Name Server (much like a DNS server). However, the NBNS is a set of cached records. So if neither the NBNS nor the clients accept broadcasts, you will not see computers in "My Network Places" ("Network" on a W7 computer). There is ONE exception to this: you may see them for about the first 10-15 minutes after logging on. This is because that system state firewall allows a NetBIOS broadcast BEFORE the firewall services are started. This happens at the splash screen where you see "checking Network connections" on boot-up.

Suggestions:
4-Enable a firewall exception to all software client firewalls for "File and Print Sharing" or Netbios
5- Also give yourself an exception for ICMP echo (ping).

Let's see how these suggestions pan out, then get back to me for further troubleshooting. I also think you may have a problem with W7 computers wanting to use SMB2 and not SMB1. But let's not go there now.
0
 

Author Comment

by:catomax
ID: 36942530
OK, so here's what we did over the weekend (thanks for the huge response BTW, much appreciated).

I removed the teaming and went to just one NIC (before reading your post). On Sunday night I added a static ARP entry on all Win7 machines: "netsh interface ipv4 add neighbors "Local Area Connection" [Server Name] [mac address]"

This seems to do the trick with the Win7 machines, but we are having some other trouble now (like BlackBerry not being able to activate).
I ran the Net Config Redir command and strangely it came back with:
NetbiosSmb (000000000000)
then 4 different entries for the same MAC address. I have run the same command on other servers and they just come back with one MAC for one NETBT address.

Does this mean anything? Can I change the settings here? It confuses me greatly!

thanks.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36945913
Please read:

http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm

Usually BlackBerrys use a scaled-down version of XP, meaning they use SMB 1.0. I wonder if your computers are having problems with SMB 2.0.

To remove additional bindings: (I SERIOUSLY DON'T RECOMMEND IT; if you do, back up the registry first)

Path:   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage

Registry value:         Bind

Cannot Change the Binding Order for Remote Access Connections

A less invasive approach:
Go to the run line and type:
NCPA.CPL
Right click on the adapter you want to change and select Advanced settings>
Put your nic as the top of the bind order.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36945918
Here is a Microsoft article on bind order (it should apply to a 2003 server, much like an XP computer):

http://support.microsoft.com/kb/894564
0
 

Author Comment

by:catomax
ID: 36947861
HI Chief,

I managed to stumble across that Linkage registry key inadvertently and basically cleaned it all up to contain just the info for the one NIC. Now the result of Net Config Redir is correct, but the MAC address being published by the server still seems to be wrong (i.e., when I check the ARP cache of a computer that has lost connection to the network share, the MAC is the MAC address of the firewall, not the server's NIC).

This is confusing me greatly. I keep finding new information and you have provided me with loads (thank you), but the problem just keeps recurring. Maybe the NIC thing is a red herring and the problem is elsewhere in the setup. I just know that it is the ARP table that is causing the disconnection, so why is the server giving out the firewall's MAC?

I tried the SMB change to no avail.
0
 
LVL 7

Expert Comment

by:frajico
ID: 36948051
Are the server and the PCs on different networks, with the firewall between them? If yes, then on the PCs the MAC address of the "server" should be the firewall's MAC, because it is doing proxy ARP, like a router.

If they are not in different VLANs and the firewall is NOT between the server and the PCs, you should check the firewall for an interface misconfiguration (the same IP assigned as the server's IP, or a proxy ARP configuration). Is the firewall serving DHCP?

Is proxy ARP enabled on the firewall?
0
 

Author Comment

by:catomax
ID: 36948083
Hi Frajico,

The firewall is configured correctly as far as I am aware; we spent a long time planning the IPs so that none would be similar.
The server and computers are on the same network and there is no firewall between them.

I don't know about proxy ARP. I will check, but it is not something I have enabled myself.
0
 
LVL 7

Expert Comment

by:frajico
ID: 36949592
Is the default gateway IP the same in the server's and the PCs' IP configuration? Is the firewall the default gateway of the network? Are there any static routes on the server or on the PCs?
0
 

Author Comment

by:catomax
ID: 36949610
The default gateway is correct on the server. There are static routes around the place, but none that refer to the server or the firewall.
The firewall is the default gateway.
0
 

Author Comment

by:catomax
ID: 36954682
OK, this has progressed slightly. I can safely say this server has connectivity issues, but I cannot say what or why.
Outlook is now displaying a message on a user's machine that says

"Network Problems are preventing connection to Exchange"

and in the mail server's Event Viewer I can see countless errors like this:

"Process MsFTEFD.exe (PID=17460).  Exchange Active Directory Provider lost contact with domain controller .  Error was 0x51 (ServerDown) (Active directory response: The LDAP server is unavailable.).  Exchange Active Directory Provider will attempt to reconnect with this domain controller when it is reachable."

But there is nothing stating this in the problematic server's Event Viewer, and nothing that hints at any errors. It is just as if other servers cannot connect to really important services on it such as GC and DC, while DNS seems to function fine.

This is becoming very difficult to figure out! :)
0
 

Author Comment

by:catomax
ID: 36954721
I have just run ARP -a on my machine and it is clear that on ALL Win7 machines, the main server is somehow listed with the SAME MAC address as our firewall. This is not the case with XP machines; they somehow manage to pick up the right MAC address. This cannot be right. I have no idea why it is doing this. I have read all the posts, but the main problem I am having is understanding what caused it in the first place. As far as I remember nothing has been changed (and I am the only one who would have changed it!), but these network problems just seem to have started occurring a couple of months ago for no reason, and it seems every fix I have tried has just made the problem worse.

What was happening before was intermittent network/share loss to the server from Win7 machines. Then, after:
- removing teaming
- switching to just one NIC
- removing settings in the registry from the linkage, routing and bind tables
- changing the bind order in NCPA.CPL

we now have NO access to this server from Win7 machines, as the MAC is incorrect. It seems XP is unaffected, but as the mail server is running Server 2008 it is having problems contacting the main server too.
Any help?
I seem to have lots of bits of data and yet no real understanding of what is happening :)
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36956669
Why would you say that the server is registering itself with the same MAC as the firewall? The only way this would happen is if the server has the same IP as the firewall. These ARP messages are sent out via broadcast, and the MAC address would of course be different. Yes, if your server shares an IP with another node, you will have problems.

In addition to allowing NetBIOS, Win 7 needs additional configuration to allow file/print sharing and NetBIOS. Go into your advanced network connection settings and allow NetBIOS.

OPEN: Network and Sharing Center >> SELECT: Change advanced sharing settings

It stands to reason that you are having problems with Win 7 because of its special security features. If XP clients are fine, then Win 7 has an additional security feature that XP does not.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36956688
For domain server diagnostics, I encourage you to download the 2003 server support tools and run these three commands at the command prompt:

DCdiag /v      
DCdiag /test:DNS
Netdiag /V

Look for errors and let us know what they are.
0
 

Author Comment

by:catomax
ID: 36960605
OK, I will run these, but the reason I said the MAC address is being broadcast wrong is because it is. All W7 machines without the static ARP entry for the server (entered manually using the netsh command) display the firewall's MAC address next to the server's IP address when you run ARP -a on the machine.

I know it sounds bizarre but it is the case. Our firewall is 10.10.0.1, and the server is 10.10.0.60.
- On an XP machine running ARP -a  will show 10.10.0.60 [Correct MAC address]
- On W7 running ARP -a will show 10.10.0.60 [Same MAC as the 10.10.0.1]
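
To spot this symptom quickly on each client, the "arp -a" output can be scanned for one MAC that appears against more than one IP. Here is a minimal Python sketch, assuming the usual Windows table layout (IP, dash-separated MAC, type); the helper name and any sample values are illustrative only.

```python
import re
from collections import defaultdict

# One table row of Windows "arp -a": IP, dash-separated MAC, entry type.
ROW = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\s+((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)\s*$"
)


def shared_macs(arp_output):
    """Map each MAC that is listed for more than one IP to those IPs."""
    by_mac = defaultdict(list)
    for line in arp_output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # interface headers, column titles, blank lines
        ip, mac, _entry_type = match.groups()
        mac = mac.lower()
        # Broadcast and IPv4 multicast MACs are legitimately shared.
        if mac == "ff-ff-ff-ff-ff-ff" or mac.startswith("01-00-5e"):
            continue
        by_mac[mac].append(ip)
    return {mac: ips for mac, ips in by_mac.items() if len(ips) > 1}
```

On an affected W7 machine the result would be expected to contain one entry listing both 10.10.0.1 and 10.10.0.60; on an XP machine it should come back empty.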

What confuses me most is that these W7 machines are not new. They have been around for a while, but the problems only started occurring a couple of months ago. Maybe it was a Windows update, but I cannot find any other people having the same issue, so most likely it is a fault with our setup. It is very tricky to troubleshoot, as I don't know what changed.

i will run those commands and post the results in the next post,
thanks Chief.
0
 

Author Comment

by:catomax
ID: 36960638
DCDIAG -V: passes everything except the following:
 Starting test: systemlog
    * The System Event log test
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:50
       (Event String could not be retrieved)
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:50
       (Event String could not be retrieved)
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:50
       (Event String could not be retrieved)
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:51
       (Event String could not be retrieved)
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:51
       (Event String could not be retrieved)
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:51
       (Event String could not be retrieved)
    An Error Event occured.  EventID: 0x00000457
       Time Generated: 10/13/2011   08:58:51
       (Event String could not be retrieved)
    ......................... [SERVER] failed test systemlog
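
DCDIAG prints event IDs in hexadecimal, while Event Viewer lists them in decimal, so 0x00000457 corresponds to event ID 1111 there. A small Python sketch for tallying the IDs in saved DCDIAG output follows; the function name is made up for illustration.

```python
import re
from collections import Counter


def event_id_counts(dcdiag_text):
    """Count 'EventID: 0x...' occurrences, keyed by decimal event ID."""
    hex_ids = re.findall(r"EventID:\s*0x([0-9A-Fa-f]+)", dcdiag_text)
    return Counter(int(h, 16) for h in hex_ids)
```

Fed the systemlog section above, this would report event ID 1111 seven times, which is the number to search for in the System log.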
0
 

Author Comment

by:catomax
ID: 36960646
DCDIAG /test:dns:
Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\[Server]
      Starting test: Connectivity
         ......................... [Server] passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\[Server]

DNS Tests are running and not hung. Please wait a few minutes...

   Running partition tests on : ForestDnsZones

   Running partition tests on : DomainDnsZones

   Running partition tests on : Schema

   Running partition tests on : Configuration

   Running partition tests on : [domain]

   Running enterprise tests on : [domain].local
      Starting test: DNS
         Test results for domain controllers:

            DC: [Server].[domain].local
            Domain: [domain].local


               TEST: Forwarders/Root hints (Forw)
                  Error: Root hints list has invalid root hint server: a.root-servers.net. (198.41.0.4)
                  Error: Root hints list has invalid root hint server: b.root-servers.net. (192.228.79.201)
                  Error: Root hints list has invalid root hint server: c.root-servers.net. (192.33.4.12)
                  Error: Root hints list has invalid root hint server: d.root-servers.net. (128.8.10.90)
                  Error: Root hints list has invalid root hint server: e.root-servers.net. (192.203.230.10)
                  Error: Root hints list has invalid root hint server: f.root-servers.net. (192.5.5.241)
                  Error: Root hints list has invalid root hint server: g.root-servers.net. (192.112.36.4)
                  Error: Root hints list has invalid root hint server: h.root-servers.net. (128.63.2.53)
                  Error: Root hints list has invalid root hint server: i.root-servers.net. (192.36.148.17)
                  Error: Root hints list has invalid root hint server: j.root-servers.net. (192.58.128.30)
                  Error: Root hints list has invalid root hint server: k.root-servers.net. (193.0.14.129)
                  Error: Root hints list has invalid root hint server: l.root-servers.net. (199.7.83.42)
                  Error: Root hints list has invalid root hint server: m.root-servers.net. (202.12.27.33)

         Summary of test results for DNS servers used by the above domain controllers:

            DNS server: 128.63.2.53 (h.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 128.63.2.53

            DNS server: 128.8.10.90 (d.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 128.8.10.90

            DNS server: 192.112.36.4 (g.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.112.36.4

            DNS server: 192.203.230.10 (e.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.203.230.10

            DNS server: 192.228.79.201 (b.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.228.79.201

            DNS server: 192.33.4.12 (c.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.33.4.12

            DNS server: 192.36.148.17 (i.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.36.148.17

            DNS server: 192.5.5.241 (f.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.5.5.241

            DNS server: 192.58.128.30 (j.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 192.58.128.30

            DNS server: 193.0.14.129 (k.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 193.0.14.129

            DNS server: 198.41.0.4 (a.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 198.41.0.4

            DNS server: 199.7.83.42 (l.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 199.7.83.42

            DNS server: 202.12.27.33 (m.root-servers.net.)
               1 test failure on this DNS server
               This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 202.12.27.33

         Summary of DNS test results:

                                            Auth Basc Forw Del  Dyn  RReg Ext
               ________________________________________________________________
            Domain: [domain].local
               [Server]                        PASS PASS FAIL PASS PASS PASS n/a

         ......................... [domain].local failed test DNS
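
The summary table at the end of that output is the quickest thing to read: each column is one DNS sub-test, and here only "Forw" (forwarders/root hints) failed. A short Python sketch for pulling the failed columns out of the header and result rows is below; `failed_dns_tests` is a hypothetical helper, and it assumes the two rows are whitespace-separated as shown.

```python
def failed_dns_tests(header_line, result_line):
    """Return (server, [names of sub-tests marked FAIL]) for one summary row."""
    tests = header_line.split()
    fields = result_line.split()
    # The last len(tests) fields are results; whatever precedes them is the server.
    results = fields[-len(tests):]
    server = " ".join(fields[:-len(tests)])
    failed = [name for name, res in zip(tests, results) if res.upper() == "FAIL"]
    return server, failed
```

With the header "Auth Basc Forw Del  Dyn  RReg Ext" and the "[Server]" row above, this yields the server name and a single failed test, "Forw".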
0
 

Author Comment

by:catomax
ID: 36960682
netdiag /v comes up with too much info to even see the whole thing, but I notice this at the beginning:

    Testing DNS
    [FATAL] Could not open file C:\WINDOWS\system32\config\netlogon.dns for reading.
        [FATAL] No DNS servers have the DNS records for this DC registered.
    Testing redirector and browser... Failed
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 36965814
You do have a few problems within DNS:

Forwarders Versus Root Hints:

It appears you are using root hints. Under the DNS snap-in >> right-click a server >> Forwarders option >> Forwarders tab, are you listing a couple of outside DNS servers as forwarders? If so, is recursion disabled? If recursive lookups are disabled, you will default to the root hints servers (which no longer exist under those IP addresses).

---------------------------------------------------

Net Diag Error:

Looks like there is something wrong with your SRV records within DNS on the server, or something wrong with Netlogon.

As a test, please try this command: DCdiag /test:Netlogons

and as a potential fix, type this command: Netdiag /fix

----ALSO: Check your DNS snap-in to see if you have ANY greyed-out ...MSDCS folders within your forward lookup zone.

______________________________________________

With that said, I am wondering if you are hosting IPv6. IPv6 is a tunneling protocol that will REQUIRE the router to be the middle man for all communications. The router must host IPv6. Your ARP cache may be a wee bit confusing for that reason. It knows that in order to get to the server's MAC address, it must go through the router on an IPv6 connection. THAT DOES NOT MEAN IT WILL DO SO.

For enhanced performance: if your router and LAN are not explicitly configured to support IPv6, then I would disable it on the W7 computers. It is enabled by default. This will only improve performance, because the computer will not go to the binding and try anything if it's disabled.

To host IPv6 on a LAN, you must configure DHCP, DNS, and your router to support it. It's a lot of configuration management.

Bottom line: I am not convinced your ARP table is jacked on all these computers and misleading the W7 computers. It sounds to me more like a security setting on all W7 machines, OR they are using SMB v1 or SMB v2 incorrectly. These DNS errors may also be a problem, but they should be a problem for the entire domain.

 
0
 

Author Comment

by:catomax
ID: 36984429
OK, I have researched what you have written. It seems very helpful, but the same problem still remains. None of the computers experiencing the problem are using IPv6, and SMB has been changed to 1.0 on the other DC (Server 2008). NIC drivers updated, registry changed, but still the same problem: no static ARP entry means W7 machines cannot connect to this computer properly.

stumped is the word!
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 37036193
OK, I just found out a bit about Windows connection sharing features. A Win 7 computer, I believe, has Windows connection sharing by default. The question is, will these computers contact the one computer they are sharing a connection with?

The idea behind Windows connection sharing is that it uses PORT ADDRESS TRANSLATION (PAT). So the computers use a common IP, and the ports are translated to create the Winsock connection. This could explain the MAC and the messed-up ARP table.

Another thing that could explain it is ARP poisoning. But that shouldn't affect only the W7 clients. ARP broadcasts are sent between routers/switches through a Neighbor Discovery Protocol (NDP)... ARP poisoning can cause all kinds of havoc.

Check and make sure all W7 clients do NOT use Internet Connection Sharing. Now, this may knock off some computers; all they have to do after disabling ICS is renew the IP, and the DHCP server should allow it.

Another thing that can prevent clients from accessing the internet is a Blue Coat proxy. This ensures that clients are up to date BEFORE they are allowed on the internet. It goes in and evaluates computers for service packs, patches, AV definitions, etc. XP computers (if imaged) may pass the proxy, while the W7 computers are not completely up to date. Blue Coat will only allow access to update sites, so that the software can be updated to come into compliance.
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 37036196
By the way, if your enterprise hosts update and AV repositories on site, you may not be able to go to the update sites. Instead, you should be able to contact the on-site update servers directly. It depends on how things are configured...
0
 

Author Comment

by:catomax
ID: 37036554
That is interesting about the connection sharing. I will have a look into this now and see what I can find. I don't think the Blue Coat proxy applies, for 2 reasons:

1 - I have never heard of that before :)
2 - I set them all up! :)

I will have a look into it though. I am not sure what you mean by AV repositories; we use ESET NOD32 based on a server, and that updates automatically and filters it out to all the computers.

Thanks again Chief!!
0
 
LVL 39

Expert Comment

by:ChiefIT
ID: 37038591
To clarify a few thoughts:

Blue Coat was just an example of what's called Network Access Control (NAC). NAC features are showing up on more and more AV software and other security software. McAfee is an example of having a NAC addon.

A repository is simply a database with updates that you can download for the software package. Microsoft has update servers, but they have a repository of updates to choose from.

I still think Network Access Control may be an issue...
 

Author Comment

by:catomax
ID: 37043413
Ok, thanks,

I just had a thought: we have ESET NOD32, but I don't think it is that, as I set up a guy's W7 laptop here (it just had McAfee basic edition) and that experienced the same problems.

I will look into it though and just see what the effects are.

thanks.
 

Author Comment

by:catomax
ID: 37291356
Hi Chief,

Sorry for the extended radio silence. I will of course be awarding you the points for this, as you have been very helpful.
But once again I find myself banging my head against the wall. There doesn't seem to be any complicated technical reason for this to be happening, which makes me think it must be something very UNCOMPLICATED!
I am thinking this has not always been a problem; it is only in the last 6 months that it has started causing problems (I think), and as such I think it must be an option or setting somewhere that has been disabled/enabled.

It ONLY affects Windows 7 machines, and it is now completely unusable with them. The only way to use a Windows 7 machine in our domain is to run:

"netsh interface ipv4 add neighbors "Local Area Connection" 10.10.0.60 00-xx-xx-xx-xx-d5"

This adds the correct MAC to the ARP table and it works fine from then on, but if you don't add that entry, as soon as the user tries to connect to 10.10.0.60 (it is the DC and app server) it fails, and the ARP record shows the MAC address of our firewall!
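
For anyone wanting to confirm this symptom on a client without reading the table by eye, here is a minimal Python sketch. It is an illustration rather than anything used in this thread: it assumes the usual Windows `arp -a` layout (IP address, dash-separated MAC, entry type), and the helper names, sample output and MAC addresses are all hypothetical.

```python
import re

# Matches data rows of Windows `arp -a` output: IP, dash-separated MAC, entry type.
ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s+(\w+)\s*$"
)

def parse_arp(text):
    """Return {ip: (mac, entry_type)}; a later row for the same IP overwrites an earlier one."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ip, mac, kind = match.groups()
            table[ip] = (mac.lower(), kind)
    return table

def check_mac(text, ip, expected_mac):
    """Return 'ok', 'mismatch' or 'missing' for one IP address."""
    entry = parse_arp(text).get(ip)
    if entry is None:
        return "missing"
    return "ok" if entry[0] == expected_mac.lower() else "mismatch"

# Hypothetical capture: the server's IP has picked up the gateway's MAC.
SAMPLE = """
Interface: 10.10.0.25 --- 0xb
  Internet Address      Physical Address      Type
  10.10.0.1             00-11-22-33-44-55     dynamic
  10.10.0.60            00-11-22-33-44-55     dynamic
"""
print(check_mac(SAMPLE, "10.10.0.60", "00-aa-bb-cc-dd-d5"))
```

On a real client the text would come from running `arp -a` and capturing its output; the expected MAC is whatever the server's NIC team actually reports.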

Is there any setting you can think of that might have been switched on/off? As you know, we have been through IPv4/6, NIC teaming... well, it's all above!

Any help from anyone would be great!
 
LVL 39

Expert Comment

by:ChiefIT
ID: 37295804
Cisco has a proprietary neighbor discovery protocol whose messages are sent between switches. Being proprietary, I don't know how well this will go over with Win machines to date.

http://en.wikipedia.org/wiki/Cisco_Discovery_Protocol

If Vista or later rely upon the neighbor cache, I don't see how this will work unless these NICs are compatible with CDP (Cisco Discovery Protocol) neighbors. The link below explains how Microsoft deals with the ARP cache and neighbors on Vista and later machines.

Now, remember, you are putting static ARP entries into the neighbor cache. Maybe going back to the original state and then lengthening the basereachable time and/or the neighbor cache limit will solve your problems, as outlined in this article. Use a test machine with no static neighbor entries.
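
To give a feel for what lengthening the base reachable time changes, here is a small Python sketch. It assumes Windows follows the RFC 4861 rule that the actual reachable time is the base value multiplied by a random factor between 0.5 and 1.5; that is a reading of the standard and the KB article, not something verified in this thread, and the 60,000 ms figure is only a hypothetical raised value.

```python
# RFC 4861 constants for randomising the neighbor reachable time.
MIN_RANDOM_FACTOR = 0.5
MAX_RANDOM_FACTOR = 1.5

def reachable_time_range_ms(base_reachable_ms):
    """Return (shortest, longest) time in ms an entry can stay REACHABLE."""
    return (
        base_reachable_ms * MIN_RANDOM_FACTOR,
        base_reachable_ms * MAX_RANDOM_FACTOR,
    )

# 30,000 ms is the RFC default; 60,000 ms is a hypothetical raised setting.
for base in (30_000, 60_000):
    low, high = reachable_time_range_ms(base)
    print(f"base={base} ms -> reachable time between {low:.0f} and {high:.0f} ms")
```

Under that assumption, doubling the base value only doubles the window before an entry goes stale and is re-resolved; it does not change which MAC address comes back when the entry is refreshed.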

http://support.microsoft.com/kb/949589
 

Author Comment

by:catomax
ID: 37296810
Thanks Chief,

I was just reading the Wiki article and saw this: "Hewlett-Packard removed support for transmitting CDP from HP Procurve products shipped after February 2006 and all future software upgrades"

All our switches are HP ProCurve, and I am pretty sure they were purchased after 2006. Also, I think I have tried that basereachable time before, as it looks very familiar, but I will give it a go!

thanks.
 

Author Comment

by:catomax
ID: 37296883
Yeah, unfortunately no dice, same reaction: as soon as I remove the static ARP entry for 10.10.0.60 it becomes uncontactable, and it then adds itself to the ARP table dynamically with the firewall's MAC. I tried changing the basereachable time, with the same result.

I wonder if the firewall is actually some part of this; I just have no idea why it would be doing it now.
What I am trying to get my head around is a feature that I could have enabled/disabled that would have this effect. For instance, I had never tried to change the basereachable time before this problem, so why would it suddenly start causing a problem? Could IIS have some part in this? I feel in the back of my mind that an IIS setting was changed months ago that maybe preceded this, but I could not be sure. The firewall is updated quite regularly, so I could not pinpoint what was changed to cause the problem.
 
LVL 39

Accepted Solution

by:
ChiefIT earned 2000 total points
ID: 37298687
IIS is out of the equation. You verified that your ARP cache is not updating and that, as a result, you are not able to communicate. You also verified a workaround by adding a static ARP entry.

IIS is a web server application (LAYER 7)... Your problem is within the data link layer (layer 2).

What I would do is log on to a switch and check the ARP entries there. If those exist dynamically, THEN you have a problem isolated to the W7 computers accepting ARP broadcasts through your switches.

NOW, there can be problems with a network intrusion detection system (NIDS) incorporated in a firewall. Firewalls can dynamically shut down some services IF and ONLY IF you have an ACTIVE (not passive) NIDS monitoring for out-of-the-norm traffic. If you get excessive traffic, the NIDS could tell systems to BLOCK it.

STICK with layer 2 troubleshooting. This particular incident may require a call first to your switch/router manufacturer (HP), then to Microsoft. AND you should tell them that the ARP neighbors are not populating on W7 computers. I'll be willing to bet that they will know of this incompatibility issue; if not, being two large companies that need compatibility with each other, they will find a fix. I can't emphasise this enough: STICK WITH LAYER 2 (data link layer) troubleshooting and fixes.
 

Author Comment

by:catomax
ID: 37298713
Cool,
thanks Chief, that's very helpful. I will get onto the switches, then MS and HP, and see what I can find!
Glad you cleared that one up too (IIS)!

thanks.
 

Author Comment

by:catomax
ID: 37324984
OK, HP say the switches I have DO NOT support dynamic ARP entries, and the ARP tables CANNOT be edited, so that rules them out.

Microsoft want to charge £199.99 +VAT for a phone call which is robbery.

Just phoning SonicWall now to see what they say about NIDS.

Question has a verified solution.
