joelbav asked:

XP clients that have been joined to the Server2k3 domain are sluggish when connected to other networks

Hi, all...

Our network once consisted of NT4 Servers with Win98SE, Win2k and XPPro clients. All was well in both sites (main and remote buildings, joined by a rented T1.)

When we switched over to Server2k3 (do I HAVE to admit how long ago that was?!) we removed all Win98SE clients from the LAN & disjoined the Win2k/XP clients from the NT domain. The NT servers were replaced by new machines running Server2k3 Standard. Clients had their static IPs replaced with DHCP and were joined to the new domain. The primary server handles DHCP and DNS.

All was well in the main building (where the servers live) but two of the four XP machines in the remote building (on the side of the T1 opposite of the servers) were very sluggish as soon as they joined the Win2K3 domain. Dropping their NICs to 10/100 (they had been set to Gigabit and interoperated just fine in the NT4 domain) took care of most of the issues.

Note that no Win2K machines have displayed any problematic behavior.

The persisting problem is this: an XP client joined to the domain works beautifully while on the LAN. If removed from the LAN entirely (for example, a laptop taken from the LAN and used in the parking lot), it also works fine. But if the machine is allowed to touch another LAN, it may run very slowly until it is removed from the non-company LAN. (No rhyme or reason as to which machines will/won't misbehave: we have two identical machines deployed from a common image. One works fine on other LANs, the other doesn't.)

Once an affected client is introduced to a network other than the LAN that our domain lives on (whether the strange LAN is wired, Wi-Fi or wireless broadband), the machine becomes very sluggish. Although the mouse moves as expected and CTRL-ALT-DELETE will bring up the normal menu, clicks with the mouse or requests for Task Manager may take literally a minute or more to respond. Windows Explorer will be sluggish. A web page accessed by Internet Explorer will take long enough to load that sometimes the (Not Responding) message will show up in the window's title. During this time, Task Manager will usually show the System Idle Process at 98% or more, without anything else seeming to monopolize the CPU time. Also, there is no large amount of data going through the NIC or switch. One would think the machine was perfectly happy.

Wait for a couple minutes and the page will usually finish loading. A minute or so later, the stubbornness returns.

Remove the non-domain LAN's connection, and within seconds the client will return to normal operation.

I characterize this as the client waiting for something to happen via the non-domain LAN. Once it gives up or the non-domain LAN goes away, the client goes back to normal (tho network-free) operation.

What other details can I give ya to start with? When connected to the company LAN, the machines seem to get the full functionality of DHCP and DNS; however, I have added lines to the HOSTS file to make sure each server's static IP is unquestioned. I've confirmed that this behavior happens without the installation of anti-virus and with group policies off (when we did our migration and the issue first surfaced, there were no policies set yet.)

Sure would like to solve this issue, since a salesman can't be asked to use his brand new laptop in the office but have to rely on the beaten-up Win2K laptop on the road (especially when that salesman also happens to be my boss's boss!)

Thanks in advance for the braintime on this...!

:)
dhoffman_98

One thing that you didn't mention was the use of group policies. You said that you could have two identical machines with identical images. That doesn't mean that the same policy would automatically apply to them both since policies could be filtered based on what OU or group the machines might be a member of.

Are you using any policies that force specific network settings that might not be applied to one machine or the other?
glebn

"I characterize this as the client waiting for something to happen via the non-domain LAN. Once it gives up or the non-domain LAN goes away, the client goes back to normal (tho network-free) operation."

Yes, this must be what's happening. My guess is that your XP clients are looking for a domain resource. They first attempt to find what they need by querying DNS; when the DNS lookup fails (because the foreign LAN's DNS knows nothing about your LAN's resources), they fall back to NetBIOS broadcasts to find the resource. The delays are caused by waiting for the lookups to time out. When there is no network connection everything is fine, because the XP client knows not to bother looking, hence there is nothing to time out. Exactly what the affected XP clients are looking for is hard to say.

My suggestion is to set up a test network and replicate the problem, then try disabling NetBIOS over TCP/IP (TCP/IP advanced properties, WINS tab) on the XP clients that don't work. This will prevent your XP clients from sending broadcasts looking for domain resources. In your pure Win2K3/XP network you should not need NetBIOS, so it might be an acceptable solution to leave this disabled.

If you want to get to the bottom of what exactly your XP client is looking for, I would again set up a test network to replicate the problem, and then install Wireshark (http://www.wireshark.org/) and sniff the traffic to see if you can tell what your XP client is looking for.
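
For a rough feel of how long a failed name lookup stalls before giving up, something like the following Python sketch could be run on the foreign network. It is only an illustration of the timeout idea above, not a diagnosis of what XP itself does internally; the function names `time_lookup` and `report` are made up for this example, and the resolver is passed in as a parameter so it can be swapped out.

```python
import socket
import time


def time_lookup(name, resolver=socket.gethostbyname):
    """Time one name lookup. Returns (seconds_taken, address_or_None)."""
    start = time.perf_counter()
    try:
        address = resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        address = None
    return time.perf_counter() - start, address


def report(names):
    """Print how long each lookup took and whether it resolved."""
    for name in names:
        elapsed, address = time_lookup(name)
        print("{}: {} after {:.1f}s".format(
            name, address or "not resolved", elapsed))
```

Calling `report()` with the short and fully qualified names of the domain servers would show whether lookups fail quickly or hang for many seconds before failing. Long waits on names that only exist on the company LAN would support the timeout theory.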
joelbav

ASKER

Dhoffman_98: I hadn't thought of the OU angle, but we had only the default groups for the machines and users when we first deployed the new domain. Everything was as out of the box as possible to start with. Since then, our people are either in the default unit or the "No LDAP listing" OU (and the second group doesn't get to touch the laptops.) I think we're safe there. Thanks for the great thoughts tho. Any more? The next one may be the one (I sure hope!)

Glebn: I've been wondering if there was some deeper issue in DNS that might be persisting in the clients even tho another DHCP server on another network has stepped in on a non-domain LAN. Sadly, that's beyond my knowledge and WAY beyond the time I have to dedicate to this issue (I can take as many weeks as I need to fix it, as long as it's done by Sunday night.)

To remove all the excuses I can think of for the clients to want the domain server, I've removed the network printer mappings (tho I haven't gone into the registry and made sure everything was removed...that's probably worth it for the sake of being thorough), confirmed that there are no persistent drive mappings, experimented with switching NetBIOS off/on/default, killed the DNS Client Service and a few other items that I'm too tired to remember at this point (the notes are at work, of course.)

I can't set up a test domain, but am doing a clean install of XP right now on the machine that brought all this to a head. Will give it the minimum install of stuff and see if it breaks once it's joined to the domain. The Wireshark idea is a good one. Process Monitor didn't reveal anything meaningful to me, so that's a good next move.

Thanks both for your great suggestions. Anything else?

:)
glebn

Ummmmmm, that's a head scratcher.

One misunderstanding: when I referred to setting up a test network I didn't mean to imply setting up a whole domain. I just meant to simulate the scenario of a laptop connected off campus. I do this with a simple broadband router with DHCP enabled. Of course DNS servers have to point to Internet DNS servers, etc. etc.

Reading your reply above, you've done all the obvious things such as removing persistent mappings, etc. As you said, you could spend days turning off services, killing processes, etc. trying to isolate the cause of the problem. If it were me, I would give Wireshark a few hours' worth of time. Of course looking at packet captures can be a black hole for time too, but I think it is worth a couple of hours. While I certainly think it is a Windows problem, I've chased enough red herrings to be cautious. Who knows, it could be some crapware that got installed, which Wireshark would probably reveal pretty easily.

Two other Sysinternals tools you might find helpful: TCPView & AutoRuns. Since you've tried Process Monitor you probably already have tried these, but I list them in case you haven't. TCPView is for looking at network connections and AutoRuns is for troubleshooting by easily disabling/enabling autorunning processes.

When you solve the problem I would appreciate it if you post a short note here, even if you never get to the bottom of the problem (e.g. a fresh XP image solves the problem but you never determine the root cause). I'm very curious, and I've found that problems like these have a way of showing up on my network!
joelbav

ASKER

Ok...using TCPView, I see that the sluggishness seems to match up with when I see traffic on port 445. A typical example would be:

Process: System:4
Protocol: TCP
Local Address: (MachineName):1072
Remote Address: (DomainServerName):445
State: syn_sent

While this is displayed, the machine remains sluggish. Once it goes away (whether it does so on its own or I end the process) things work nicely.

I confirmed that NetBIOS is turned off on each interface. I've seen it mentioned (Google is our friend, BTW!) that SMB functions over port 445 regardless of the state of NetBIOS. Am not sure if trying to kill off SMB is the right way to go, but maybe it will shed some light, if nothing else...

BTW: TCPView is one of a million tools that I've never stuck my nose into, until now. That seems to have been a big help! Thanks for the recommendation...!


:)
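
A connection stuck in SYN_SENT is one that never got an answer from the far end. As a hedged illustration of how long such an attempt can hang, this Python sketch tries a TCP connect and times it. The function name `probe_tcp` and the 5-second default timeout are assumptions for the example; it only mimics the outgoing connect and says nothing about why the System process wants the server in the first place.

```python
import socket
import time


def probe_tcp(host, port=445, timeout=5.0):
    """Try a TCP connect. Returns (outcome, seconds_taken)."""
    start = time.perf_counter()
    try:
        # While this call waits for a reply, the socket sits in SYN_SENT.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        outcome = "timed out"
    except OSError as exc:
        # Covers refused connections, unreachable networks, failed lookups.
        outcome = "failed: {}".format(exc)
    return outcome, time.perf_counter() - start
```

Pointing `probe_tcp()` at the domain server's name or private IP from the foreign network should show either a quick failure or a wait that runs the full timeout. The second case would line up with the stalls seen in TCPView.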
glebn

Now that you know what to look for, Wireshark should be very helpful. Sniff for a while and then create a filter to look at the 445 traffic. You'll see what exactly your clients are looking for.

Does your Windows domain exist on the Internet? In other words, if your Windows domain is yourcompany.com and I did an nslookup on the Internet, would I be able to find an authoritative DNS server for yourcompany.com with a real IP? If so, is your client trying to connect to this authoritative server on 445?
joelbav

ASKER

Well, now I'm back to being confused again! Wireshark shows me that even tho the machine is rebooted off the company LAN, IE traffic starts out with a call to one of the domain servers' static IPs. An example:

Source: (ClientIPAddressViaDHCP)
Destination: (DomainController)
Protocol: TCP
Info: olsv > microsoft-ds [SYN] Seq=0 Win=64240 Len=0 MSS=1460

The source is the IP of the client machine, which is set by DHCP from the ISP. The destination is the static IP of a server on our domain. The IP is the correct, private address space value for the server. Turning on the DNS Client service and doing an IPCONFIG /flushdns seems to make no difference.

Perhaps this is what I should normally see? Just seems strange that a machine that hasn't seen the domain since the reboot would still be insisting on going there.

Also, killing SMB had the nasty side effect of killing access to share points such as networked drives and \\domainserver\NETLOGON, so that isn't going to fly.

:)
joelbav

ASKER

Glebn: Sorry, but somehow I missed your comment above, and then got a message double-posted. Not gonna try to figure any of that out now... :)

Our Windows domain does not exist on the Internet. We don't host Internet email, provide web or ftp service to the outside world, etc. (Our company's web presence is hosted for us professionally, so I don't have Exchange to worry about. Life is good that way!)

My company's domain on the LAN (radiant.local) will not be found thru nslookup on the Internet.

I've noted that when a domain-aware machine is shut down and later restarted on a LAN other than the company one, and then IE is started, Wireshark shows that the machine is trying to contact one of our domain servers. It tries to resolve the name of the machine that ISN'T our DNS server (I'm assuming that choice is because it wants to talk to the server it got its last dose of group policies or a previous Kerberos ticket from? Just a wild guess...) Clearly the client considers this machine an authority in some regard, tho I don't know why, as it's not pretending to be a PDC.

The attempt to contact is made using the FQDN (server02.radiant.local). The HOSTS file already had an entry for the friendly name (server02) in the 192.168.x.x address space. I've added a line for the FQDN. With that, instead of trying to resolve the FQDN, the client tries to PING the IP of the server in question. It tries three times, during which time I get the sluggish symptoms. Once it gives up after the third PING, the client starts to respond properly.

I've done enough messing about with services on this machine that I can no longer know for sure that it represents the same client configuration that I had problems with. Tonight I'll be putting a clean XP client install (domain aware but lacking all our usual software) into testing and make sure the symptoms are the same.

BTW: I did find that cutting that big HOSTS file (courtesy of Spybot S&D) down from over 320K to about 7K did make a bit of a difference, tho it is at best only part of a bigger solution.

Thanks again for your time. It's a big help when others share their knowledge and gut feelings!

:)
ASKER CERTIFIED SOLUTION
glebn

joelbav

ASKER

Glebn,

I think we're golden now. Here's a summary of the problem and resolution for posterity and anyone else who might encounter this issue someday (assuming ya read the last chapter of the novel first!):

When we first switched to Server2k3, the Gigabit-capable clients in the remote building had problems dealing with the lag introduced by the T1 that connects the remote building to our main site, which houses all servers. Turns out this is a known MS issue, and the solution was to force those particular machines down to 100Mbps (and if I recall right, turn off the "allow users to logon before XP networking is stable" feature, but don't take that to the bank.) Also, I added the IPs of our main building servers to the HOSTS file, in hopes that it would take some stress off of the authentication-over-T1/timing issues.

Overall, things have worked just fine on the wired LAN since those days, and I've seen no problem with the HOSTS entries for the servers, so it's just been a part of our environment on all clients.

Now, we have a road warrior who, when not connected via the wired LAN to our domain, will be using the wireless broadband card in his spiffy new laptop. When testing that connection prior to deploying, the machine seemed sluggish.

Using Wireshark, I saw that the client would stumble when it wanted to talk to one of our servers. Removing the IP for that server from the HOSTS file seems to have taken care of the issue, as an attempt to look up the server is made, fails, and the client moves on (instead of stubbornly trying to talk to the unavailable server, as it would when it was given an IP to reference.)

From that standpoint, it would appear that the problem is resolved (I won't say "fixed", since that implies an absolute state that NO sober man, woman, child or farm animal should EVER express in this line of work...!)

I don't understand why the client attempts to contact the server in the first place, since the server in question is not our DNS server, doesn't handle DHCP, and is generally not the head honcho of servers on the network. But I'm going to assume that's just a hole in my knowledge about the finer workings of MS networking between domain controllers, DNS, LDAP and so forth.

Regarding Spybot S&D, I can understand your feelings about wanting the HOSTS file to be empty, but we do have to depend on it. Spybot offers the option of poking into the HOSTS file a listing of all sorts of places that we don't want a browser to touch, to which I add such things as CWS sites, the latest Storm Worm sites that come and go, bandwidth/productivity drains like YouTube/Myspace, etc. Our perimeter device doesn't allow for blocking URLs, so this is the next best thing. Plus it has the happy advantage of protecting against access when a road warrior is beyond the limits of our LAN. That's pretty much reason enough to use it.

Anyway, what you suggested about Wireshark was critical to reaching this point, so I will award the points and my gratitude for having helped a stranger. Thanks, my friend...!
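
For anyone wanting to check their own clients for the same kind of HOSTS entries, here is a minimal Python sketch that lists entries pointing into private (RFC 1918) IPv4 address space, since those are the ones that can't be reached from a foreign network. The function name `private_host_entries` is invented for this example, and it only reads text handed to it; it changes nothing on disk.

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def private_host_entries(hosts_text):
    """Return (ip, [names]) pairs for HOSTS lines with a private IPv4 address."""
    entries = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        try:
            ip = ipaddress.IPv4Address(fields[0])
        except ValueError:
            continue  # not an IPv4 address (e.g. an IPv6 entry)
        if any(ip in net for net in PRIVATE_NETS):
            entries.append((str(ip), fields[1:]))
    return entries
```

On XP the file normally lives at %SystemRoot%\System32\drivers\etc\hosts, so reading that file and passing its text to the function would list the server entries while skipping the 127.0.0.1 block-list lines that Spybot adds.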
joelbav

ASKER

Again, my thanks for spending so much time like this. I hope to repay you by helping someone else someday. What goes around comes around... - Joel :)
glebn

Thanks for the follow up and the points :)

Glad to hear your problem was resolved and that my comments helped a little.

Re the hosts file: definitely a lesser-of-evils choice, and you make a good case for your decision, especially in the case of your sales force, which you have to protect when they're on the road too.