NSLOOKUP normal, but certain machines not resolving anyway

I am in the process of migrating my users from an NT network to a Win2003 network. Those users who are still on the NT network point to three corporate DNS servers outside our facility and system, and all of them have private IPs. Recently, users on the NT network started calling me to report they were offline. When I sit down at their machines, I can see they are online but not resolving: I can ping by number but not by name, and can browse by number but not by name. If I run NSLOOKUP, I get responses from the DNS servers, but I still cannot ping by name or load web pages by name. Rebooting does not change anything. The DNS folks seem to think it is at my end, but I am having trouble finding what is unique to the particular users having the problem (about 8 out of 75), and why it started when it did, as nothing had changed. The rest of my users are not having this problem even though they all point to the same DNS servers.

No events in the logs point to any DNS problems on the computers. How often it happens varies from machine to machine: a couple are off nearly all the time, others do it every week or two. They pop back online at odd times; no discernible time or activity causes them to start resolving.

I installed Ethereal on one machine and ran pings and NSLOOKUPs while it was resolving and again while it was not. Here is the result:

Resolving:  I see queries go out and responses come back for pings (Standard query A and Standard response from the DNS with the IP), then the four ICMP ping requests and responses, and I see NSLOOKUPs that look normal (Standard query A, Standard response with the IP).  I also see a ton of Standard query PTR packets for internal private IPs, which appear to go on all the time, with responses of "No such name".

Not resolving:  When I ping, I see the queries go out (Standard query A), but nothing comes back. It tries each DNS in the list and gets no response, so no ICMPs follow.  NSLOOKUPs look as though everything is peachy: I see Standard query A go out and Standard responses come back with the correct IP.  I also still see the Standard query PTR packets for internal private IPs and Standard responses of "No such name".

It turns out that locations all over the country are having the exact experience I am having, just a few users at each, and their descriptions are identical to my problem. NSLOOKUPs always work, even when there is clearly no resolving going on for pings or web browsing.  I added a registry setting to increase the DNS query timeout and an adapter timeout, but it has made no difference.  If I migrate one of these users to the Win2003 network, where they point to our local DNS that uses the corporate DNS servers as forwarders, the client no longer has a problem.  However, some of these machines have to stay on the NT network for now and I am getting nowhere.

All the clients are Windows XP, all patched and with current virus definitions.  The DNS folks seem to be hanging us out to dry: not their problem.  I am plumb out of ideas.  I have tried IPCONFIG to renew their DHCP numbers, set fixed IPs on them, flushed their DNS caches and reinstalled networking on them.  I have found nothing that sets these machines apart from the 60+ that are working normally.

I know this is going to sound strange, but I am guessing that you are not running WINS.  If so, make sure that your DHCP is set to NOT use NetBIOS.  By default, DNS in an AD environment will try to use NetBIOS (WINS) first because that is how it authenticates to the domain.  Give that a try and let us know.

Good Luck

Try appending the Windows domain name to the end of the host name.

For example: ping host.domain.local instead of just plain old ping host
SusanSBAuthor Commented:
You are correct that I am not running WINS, but none of the affected machines are in the Active Directory either.  They are logging on to an NT network that is not part of the Active Directory, and the DHCP is on NT.  When I do log one of these onto the AD and point it to the local DNS (with the off-site DNS servers as forwarders), the problem goes away.  The DHCP is set to not use NetBIOS.  Am I understanding you correctly?
SusanSBAuthor Commented:
The pinging is to any external site, say, time.nist.gov.  If the machine is not resolving and I ping time.nist.gov, the ping fails after a long pause.  If I ping by number, I get replies.  I determined from the packet capture that the long pause before the fail seems to be because it is querying the DNS servers in order and getting no DNS responses.  Am I answering the question you are asking?
Flush the DNS cache on your local servers and on the workstations having issues.
You may also want to try stopping and restarting the "DNS Client" service, or stopping and disabling it. It's just a local DNS client cache, and I often find it causes more problems than it solves. Another idea: why not just use the internal DNS for these machines anyway? You can even create another zone that covers the namespace of the NT domain, if you like. Then all queries against these DNS servers will return valid responses, and you will have worked around the problem. Just an idea...
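The stop/restart and stop/disable options suggested above can be sketched as a small script. This is only a sketch under assumptions: it assumes the "DNS Client" service has the short name `dnscache` (its usual name on Windows XP), that the script runs with administrator rights, and the helper names `dns_client_commands` and `run` are made up for illustration. By default it only prints the commands rather than running them.

```python
import subprocess

def dns_client_commands(disable=False):
    """Return the commands to restart (or stop and disable) the DNS Client service.

    Assumes the service short name is 'dnscache'.
    """
    commands = [["net", "stop", "dnscache"]]
    if disable:
        # 'sc config' expects the value as a separate argument after 'start='
        commands.append(["sc", "config", "dnscache", "start=", "disabled"])
    else:
        commands.append(["net", "start", "dnscache"])
    return commands

def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)

# Show what a plain restart would do, without touching the machine
run(dns_client_commands())
```

Calling `run(dns_client_commands(disable=True), dry_run=False)` on an affected client would stop the service and keep it from starting at the next boot.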

Are you restricting access to the 2K3 DNS to ONLY members of that domain?  Otherwise, why not just set the DHCP server to hand out the address for your internal DNS (with forwarders)?

Are there possibly any restrictions on the DNS servers that won't resolve for you?
SusanSBAuthor Commented:
Since I am committed to trying to fix this at the client level using the external DNS servers, I tried conradie's suggestion first to restart the DNS Client Service.  I had two machines drop this week, and the result was the same on both:  Restarting the DNS Client Service made them immediately start resolving.  I set both machines to disable the service, and now have to wait to see if they continue to resolve -- excruciating when the problem happens so sporadically. With either of these clients, there could be two weeks that pass before I get the dreaded call.  Or not.

A second location has done the same with two clients, and had the same experience, so we both are waiting with bated breath and, in the interim, also researching the implications of disabling the service.  What is the downside of disabling the DNS Client Service in WinXP?  MS says "Note The overall performance of the client computer decreases and the network traffic for DNS queries increases if the DNS resolver cache is deactivated." (http://support.microsoft.com/kb/318803/en-us).  That certainly sounds ominous...
The only downside is that you will no longer cache lookups.  Every name that needs resolution will make a call to DNS.  So if you browse to EE, close your browser and then go to EE again, you will make a DNS call both times, rather than only the first.

I wouldn't go so far as to say "ominous". It'll add a little delay while you wait for each DNS resolution, plus the added network traffic of more DNS calls.
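To put a number on that trade-off, here is a toy model of a client-side cache. It is a sketch, not how the Windows resolver is actually implemented: the `CountingResolver` class, the 60-second TTL, and the placeholder address 192.0.2.10 are all assumptions for illustration. It only counts how many lookups would have to go out to a DNS server with the cache on versus off.

```python
import time

class CountingResolver:
    """Toy resolver that counts how many lookups reach the (simulated) DNS server."""

    def __init__(self, cache_enabled, ttl=60.0, clock=time.monotonic):
        self.cache_enabled = cache_enabled
        self.ttl = ttl              # assumed lifetime of a cached answer, in seconds
        self.clock = clock
        self.cache = {}             # name -> (address, expiry time)
        self.server_queries = 0

    def _query_server(self, name):
        # Stand-in for a real network query; returns a placeholder address
        self.server_queries += 1
        return "192.0.2.10"

    def resolve(self, name):
        now = self.clock()
        if self.cache_enabled:
            hit = self.cache.get(name)
            if hit is not None and hit[1] > now:
                return hit[0]       # served from cache, no server query
        address = self._query_server(name)
        if self.cache_enabled:
            self.cache[name] = (address, now + self.ttl)
        return address

cached = CountingResolver(cache_enabled=True)
uncached = CountingResolver(cache_enabled=False)
for _ in range(2):                  # visit the same site twice
    cached.resolve("www.example.com")
    uncached.resolve("www.example.com")
print(cached.server_queries, uncached.server_queries)
```

With the cache disabled, every repeat visit costs one extra query, which is the "added network traffic" being described.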
Exactly, aseusainc. Working in IT, I often want to make sure that I am using the latest changes in DNS and am not using cached info, so I keep the service disabled on my workstation for this reason. It's pretty far from ominous.
And... nslookup goes directly to the DNS server and ignores the client cache, which is why you see normal behaviour on the clients when using nslookup. With only 75 total users on your network, even if you disabled the service on ALL of them I doubt it would cause any noticeable traffic problems.
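The two lookup paths described above can be compared side by side. The sketch below is an illustration under assumptions, not a reproduction of what nslookup does internally: it hand-builds a minimal "Standard query A" packet and sends it straight to a DNS server over UDP, while `os_resolver_lookup` goes through the operating system's resolver, the path ping and browsers take. The function names and the server address 10.0.0.1 in the usage comment are hypothetical.

```python
import socket
import struct

def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS 'standard query A' packet (RFC 1035 layout)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)

def direct_query_answer_count(hostname, server, timeout=3.0):
    """Query `server` directly over UDP port 53 and return the reply's answer count."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        reply, _ = sock.recvfrom(512)
    # Bytes 6-7 of the DNS header hold ANCOUNT, the number of answer records
    return struct.unpack("!H", reply[6:8])[0]

def os_resolver_lookup(hostname):
    """Resolve through the operating system's resolver instead."""
    return socket.gethostbyname(hostname)

# Usage on an affected client (10.0.0.1 is a hypothetical DNS server address):
#   direct_query_answer_count("time.nist.gov", "10.0.0.1")
#   os_resolver_lookup("time.nist.gov")
```

If the direct query returns answers while the OS resolver call fails on the same machine, that points at the local resolver path rather than at the DNS servers, which matches the packet captures in the question.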
Sorry for the string of posts.... One more "permanent fix" idea and I will shut up... : )
I have seen a TCP reset fix this permanently for at least one machine. Check out the link below for instructions.


SusanSBAuthor Commented:
Hey conradie - so far, so good on stopping the DNS client.  I will save the "permanent fix" for the first machine to come up doing it again.  It has only been a week now, so I may be back later with the stubborn ones.  But so far, so good.  Another location tried it too, with user machines that were off nearly all the time, and so far his clients are on too.

Thanks to all of those who contributed!
Glad to hear it. Thanks for the points!