DNS Issues

Hi. I have a puzzling issue. I manage a network of 250+ computers, about 80% of them wireless. Recently, computers have been unable to connect to the Internet, with errors stating that DNS is the issue. I never have a problem connecting to the wireless network, and I can reach other computers internally; only the Internet is a problem.

The strange thing is that when I reboot the servers and restart the switches (with limited clients logging in), everything operates without a problem. As more computers are powered on and log in, the DNS issues start showing up again, and before you know it everything is down. I have tried to pin this down to a particular switch or group of computers, but there is no rhyme or reason: whichever group of computers, switch, or access point I try, once enough computers join we have DNS issues.

When the issue is occurring, nslookup returns "timeout" messages and does not show the name of the DNS server (which is our domain controller). When I restart the DNS Server service a few times, it kicks back in and works for a while until enough stations are back on, and then we're out again.

I'm looking for any suggestions that could help me identify the problem. I do have a second DC on the network with DNS installed, but I'm not sure it is correctly set up to act as a "backup" DNS server if the main DNS server stops. Getting that working might help as well. Thanks for listening; any help is appreciated. - Bill
William Larkin Asked:
 
giltjr Commented:
Question. You state you lose both DNS and Internet. Do you really lose Internet? Meaning, if you know the IP address of a web site on the Internet (say, one of Google's is 74.125.228.244) and you enter just the IP address, can you get to Google?

If you can NOT get to Google just by IP address, then you can ignore the DNS issue for now. The DNS issue is there because you lose Internet.

If you can get to Google, then the problem is DNS and may not have anything to do with Internet connectivity.
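The test above can be scripted so it is quick to repeat while the lab machines are being powered on. This is only a rough sketch in Python: the function names are made up for illustration, and it assumes the target answers on TCP port 80.

```python
import socket

def can_connect(ip, port=80, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds (no DNS lookup involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def can_resolve(hostname):
    """Return True if the system's configured resolver can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

def diagnose(ip_reachable, name_resolves):
    """Map the two test results onto the cases described in the comment above."""
    if not ip_reachable:
        return "connectivity problem (DNS failures are a side effect)"
    if not name_resolves:
        return "DNS problem (Internet connectivity is fine)"
    return "both working"
```

For example, `diagnose(can_connect("74.125.228.244"), can_resolve("www.google.com"))` uses the address quoted above; that address may no longer answer, so substitute any external IP you know is up.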

Is it possible that one of the LAB computers has a duplicate IP address?

I would suggest that you turn on the LAB computers one by one and check DNS/Internet connectivity after each one is turned on. My guess is that one specific computer is causing the problem.

You may need to run a packet capture (I suggest Wireshark for this) to look for "things."

"Things" would include whether the MAC address for the DNS server or your default router/gateway changed after one of the LAB computers was turned on.
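One way to check for a changed MAC address without a full capture is to save the output of `arp -a` before and after powering on a lab machine and compare the two. The sketch below is an illustration only: the function names are invented, and the parser just assumes each ARP entry has an IPv4 address and a MAC address on the same line, which should be confirmed against your own output.

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[-:]){5}[0-9A-Fa-f]{2}\b")

def parse_arp(text):
    """Build {ip: mac} from saved `arp -a` output (assumed one entry per line)."""
    table = {}
    for line in text.splitlines():
        ip = IP_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:
            # Normalise to lowercase, colon-separated form for comparison.
            table[ip.group()] = mac.group().lower().replace("-", ":")
    return table

def changed_macs(before, after):
    """Return {ip: (old_mac, new_mac)} for addresses whose MAC differs."""
    return {
        ip: (before[ip], after[ip])
        for ip in before
        if ip in after and before[ip] != after[ip]
    }
```

If the entry for the DNS server or the default gateway shows up in the result of `changed_macs`, another device is answering for that IP address, which points to a duplicate address or ARP spoofing.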
 
Kimputer Commented:
Usually, without much configuration, DNS on the second DC should already be up and running. Just use nslookup on the clients, pointing it at this backup DNS server, and see whether it resolves correctly (both internal and external names). If so, add it as a second DNS server in your DHCP settings. With less load on the main DNS server, it may fail less often.
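If you want to script the same check instead of running nslookup by hand, a query can be sent straight to one specific DNS server. This is a minimal sketch using only the Python standard library; the function names are illustrative, it asks only for A records over UDP, and the server address in the usage note is a placeholder.

```python
import random
import socket
import struct

def build_query(name, qid):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Flags 0x0100 = recursion desired; one question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def parse_header(response):
    """Return (id, rcode, answer_count) from the 12-byte DNS header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return qid, flags & 0x000F, ancount

def query_server(server_ip, name, timeout=3.0):
    """Ask one DNS server for name; return (rcode, answer_count) or None on timeout."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server_ip, 53))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    rid, rcode, ancount = parse_header(data)
    if rid != qid:
        return None  # reply does not match our query
    return rcode, ancount
```

Calling `query_server("192.0.2.10", "www.google.com")` (replace the placeholder with the second DC's address) returns `None` on a timeout; an rcode of 0 with at least one answer means that server resolved the name.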
 
giltjr Commented:
Are the host names you can't resolve internal or external to your network?

If external, it sounds like your DNS servers are having problems connecting to the Internet to resolve names.

On your DNS servers, do you configure forwarders or do you rely on the root hints?

Either way, I would double-check your Internet connection.
 
William Larkin (Author) Commented:
Our DNS is working (internal and external), and I can check it with nslookup without fail. BUT when I start turning on our lab computers (it's a school), DNS stops working for everyone. I've thought of viruses, etc., but the crazy thing is that it doesn't matter which lab (we have 3) is started: the whole network loses Internet connectivity. When I turn those lab computers off, the Internet and DNS work again. There are approximately 40 computers other than the lab computers that continue to work properly without issues.

I know this is a lot of reading, but I'm stumped. I'd pay to have someone help me resolve this issue. It's kind of hard to describe everything here, including what I have already done to troubleshoot. Thanks.
 
Akinsd (Network Administrator) Commented:
I think you are creating a DoS issue all by yourself. Denial of Service is what hackers use to shut down or impair a service. In this case it works by sending too many DNS queries to the DNS server, which then gets bogged down and freezes.

You seem to have identified part of the problem being your lab computers.

By the way, the backup DNS server does nothing until the primary DNS server fails or shuts down (i.e., does not respond to queries). To use both at the same time, you need to make one the primary for some PCs and the other the primary for the rest.

In this case, I would make the backup DNS server the primary for the lab PCs.

Also, check Performance Monitor on your DNS server; you may need to increase the memory in addition to other optimizations (cleanup, defragmentation, etc.).
 
skullnobrains Commented:
if the above DoS theory is correct, you may want to check your firewall. firewalls are very frequently configured to handle dns with so-called persistent udp sessions. these sessions do not know about the protocol and last until their timeout (10-60 seconds is common), meaning each dns query blocks a port for that much time.
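to get a feel for how quickly this adds up, the number of sessions held open is roughly the query rate multiplied by the session timeout. the sketch below is back-of-the-envelope arithmetic only: the function names and all the numbers in the usage note are hypothetical, and it assumes every query creates its own session (a fresh source port each time).

```python
def steady_state_sessions(queries_per_second, session_timeout_s):
    """Approximate number of concurrent firewall sessions (arrival rate x lifetime)."""
    return int(queries_per_second * session_timeout_s)

def seconds_until_full(table_size, queries_per_second, session_timeout_s):
    """Seconds until the session table fills, or None if it never fills at this rate."""
    if queries_per_second <= 0:
        return None
    if steady_state_sessions(queries_per_second, session_timeout_s) < table_size:
        return None  # sessions expire faster than the table can fill
    return table_size / queries_per_second
```

with made-up figures of 500 queries per second and a 60 second timeout, `steady_state_sessions(500, 60)` gives 30000 open sessions, and a hypothetical table of 16000 entries would be full after `seconds_until_full(16000, 500, 60)`, i.e. 32 seconds. with a 10 second timeout the same load never fills it.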
 
William Larkin (Author) Commented:
Hi. Thanks for ALL the comments. Everyone was a help in this matter, and I used your suggestions to look into things and troubleshoot the issues.

It looks like the main cause of the problem was in fact persistent UDP sessions (broadcast storm), which I had no clue about until I contacted our firewall vendor and the problem was discovered via a remote session. It seems that many UDP broadcasts are being transmitted and the firewall is dropping the "flood" of packets. When the threshold was raised, the Internet started flowing again.

Also, as was mentioned in the comments above, I never lost connectivity. I just couldn't use a browser to get to a website because DNS wasn't resolving anything. I could always ping an external site, such as our ISP's DNS servers.

Lastly, there is a program that we use here at the school that seems to be the source of these broadcasts. So, on to my next challenge: stopping the storm. Thanks again! - Bill
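For anyone chasing a similar storm, one way to find the noisiest senders is to export the UDP packets from a capture as (source IP, destination IP, destination port) rows and count the ones addressed to a broadcast address. The sketch below assumes that export format; the function name and the subnet in the usage note are placeholders, not details of this network.

```python
from collections import Counter
import ipaddress

def top_broadcasters(packets, network, limit=5):
    """Count broadcast packets per (source IP, destination port).

    packets: iterable of (src_ip, dst_ip, dst_port) tuples, e.g. exported
             from a packet capture (the export format is an assumption).
    network: the local subnet in CIDR form, used to find its broadcast address.
    """
    net = ipaddress.ip_network(network)
    broadcast_targets = {
        ipaddress.ip_address("255.255.255.255"),  # limited broadcast
        net.broadcast_address,                    # directed broadcast for the subnet
    }
    counts = Counter(
        (src, port)
        for src, dst, port in packets
        if ipaddress.ip_address(dst) in broadcast_targets
    )
    return counts.most_common(limit)
```

Called as `top_broadcasters(rows, "10.0.0.0/24")`, it returns the busiest (source, port) pairs first, which should point at the machines and the port the offending program uses.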
 
giltjr Commented:
One simple way (at least in theory it's simple) is to create a small routing-only subnet between your internal network and your firewall using another L3 device.

That way your firewall only sees traffic that is either going to or coming from the Internet.  It will never see traffic that stays on your internal network, which means it will never see the broadcasts.
 
skullnobrains Commented:
"Lastly, there is a program that we use here at the school that seems to be the source for these broadcasts"

if you give details, we may be able to help further



for example, when it comes to dns sessions (not your case but might give you a few hints), common ways to make things better include
- use upstream resolvers (probably not applicable to your app)
- use a fixed source port (at least you have one session per remote host:port tops)
- do not use stateful firewalling for this specific traffic (make it old school: one rule for the outgoing packets and another for the answer)
- change the session duration (unfortunately this is usually applicable for all udp traffic and not rule by rule)
dumber dirty hacky workarounds include
- send a port unreachable from the dns server when you receive the answer (easy in dns because answers are always the last packet of the session)... AND accept the packet nevertheless : dns works and the port-unr instructs the firewall to kill the udp session. most firewalls will recycle the port at once or after a much shorter grace time.

note that if the sessions are actually internal traffic, @giltjr gave you a much better way to work things out.

if the lan segments are mostly open to each other, setting the inter-lan traffic to be accepted by default will (if supported by your firewall) most likely not create sessions for any traffic that you don't allow explicitly... and you can always create a final rule that blocks everything except udp on that specific port back and forth. if you're not a security freak, this is also workable lan-wan in some cases
Question has a verified solution.