New DHCP server, PCs keep losing connectivity

My company has 3 VLANs, Regular, Production, and Wireless.  Previously, our DHCP has been handled by our Sonicwall Pro 4100 firewall.  We want to move DHCP over to a regular server in anticipation of changing our domain.

I have a server running as a domain controller for the new domain, but no nodes have been assigned to it yet.  It has, however, been set up to serve as a DHCP server.

After some trial-and-error, the new DHCP server seems to be working fine except for a problem on the Production subnet.  Note that the DHCP server is giving out 8-hour leases.

The Production subnet is set up so that if a node is plugged in on a Production port, the DHCP server only assigns an IP based on a MAC Reservation.  So it's basically MAC filtering.  Also, devices on the Production subnet are limited at the firewall to two shares on the Normal subnet and one site on the Internet.

Currently, if I switch over to the new DHCP, I can go to a particular machine and do an ipconfig /release and /renew to have it go ahead and pull its IP from the new DHCP server.  Everything goes fine for a while, but eventually, it loses connectivity.  And then it comes back.  And then a couple hours later, it goes away again.  So then I turn off the new DHCP and turn the old back on, and it still happens a little later.  Eventually they have to keep shutting down and restarting to regain connectivity for a while (as I left it tonight).

In a previous experience, when the first device began having problems, I did a release/renew on another machine and it started having the same problems, so I don't think it's the particular machine.

GlennSimpson2 (Author) commented:
I did not mention that at no time do I lose the ability to ping the computer that's having problems.  It stops being able to ping the Normal subnet (including the DHCP server and DNS servers), but can ping its own gateway.  I haven't remembered to check whether it can ping other devices on the Production subnet, but it probably can if it can reach the default gateway.

It was suggested that this might be an issue of duplicate IPs, but I ran Angry IP Scanner and it never showed any duplication.
Leon Fester commented:
DHCP only allocates the IP address.
It has nothing to do with a device losing connectivity to a network.

Is this only happening on one device or on all workstations?
Are all the workstations, and specifically their NICs, the same make/model?

I'd start by updating the NIC drivers.
How old is your switch?
Have you checked it for any errors?
GlennSimpson2 (Author) commented:
I'm certainly baffled about why DHCP would cause this problem, but that's the only significant thing that is changed right before this problem crops up, and it has caused the problem repeatedly.

On Wednesday, I did a release/renew on two different workstations and experienced the problem on both.  On Thursday I only did the release/renew on one machine.  The two machines are different makes/models.  Both are XP Pro, BTW.  I don't want to do much more than that because if the printers go down in our production environment, that's bad.

The switch is a Sonicwall Pro 4100. Not sure how old it is, but as a reference, it cannot handle TCP autotuning.

We might not have adequate auditing going on, but using the Sonicwall ViewPoint program and viewing logs for the one device that was having problems yesterday, it seems to be spending a lot of time trying to visit sites like, and with destination port 0.  Those appear to be Microsoft/Windows sites, possibly Windows Update.  The other machine that was still using the old DHCP did not try to do those things during that same period.

The machine appears to be showing up on the WSUS list just fine, so I don't know why it would be going out to WU.  

Other, possibly relevant info:

The new DHCP is handing out both of the DC's for the old domain and the DC for the new domain as DNS servers.

The new DHCP is only handing out the old domain name.
Leon Fester commented:
Some questions:
Have you always used such a short lease time?
How long are the scavenging intervals?
Are these settings identical on both servers?
Have you verified that the scopes on these servers don't overlap? They shouldn't.

If there is an old DHCP server remove any old leases for your workstations.

It's a bit of a long shot in the dark, but I've recently seen some peculiar behavior on a domain where we recently built new DHCP servers with new scopes after some VLAN re-design.

Also, just verify that the DNS records still point to the correct IP.
If the IP address is changing and there is a cached address for the workstation, that could be why IPCONFIG /release and IPCONFIG /renew are causing issues.
GlennSimpson2 (Author) commented:
The old DHCP uses 24 hours.  I only set the new one to 8 at someone's recommendation, so that if something's going to break, it breaks sooner rather than later and I'll know.

I'm not familiar with scavenging intervals, but based on my brief googling, I believe the answer is that the DatabaseCleanupInterval is 60 (1 hour) on the new DHCP server, which is being given out as a DNS server (it will be a DC on the new domain), which is running 2008.

SIDEBAR: I was told it was OK to include the new DC (which is also the new DHCP), as one of the DNS Servers through DHCP along with the two old DC's.  Let me know if you think that might be an issue.

Just a reminder, the firewall is the old DHCP provider.  The entirety of the settings on it are:
Enable DHCP Server: yes
Enable Conflict Detection: yes
Enable DHCP Server Network Pre-Discovery: yes
DHCP Server Conflict Detect Period: 300 seconds
Number of DHCP resources to discover: 10
Timeout for conflicted resource to be rechecked: 1800 seconds
Timeout for available resource to be rechecked: 600 seconds
Enable DHCP Server Persistence: yes
DHCP Server Persistence Monitoring Interval: 5 minutes

When I'm testing the new DHCP, I uncheck "Enable DHCP Server" on the firewall and then activate the scopes on the new server.  (I also turn on IP Helper on the firewall).  So they are never actively giving out IPs at the same time, although obviously some devices are still playing out their lease from the old DHCP.

The settings are identical.  In the Normal subnet/VLAN, I just have some exclusions to cover some static IP's assigned.  In the Production subnet/VLAN, I have reservations set up and exclusions set up for anything that doesn't have a reservation (so if you hook up to a port on that VLAN, unless I know your MAC, you don't get an IP address).  

So when you say the two servers shouldn't overlap, I think you're thinking I'm running them both at the same time.  I'm actually turning off one and starting up the other.  So they completely overlap one another, because the new is replacing the old.

So you're saying on my next test, in addition to the above, I should manually delete the "Current Leases" on the firewall.  Gotcha.

The only things the workstation is supposed to be trying to go to are the two shares (which have static IPs), the one Web site, and then the DNS servers, DHCP server, and default gateway, all of which have static IPs.  So there's no reason any pointer should be incorrect, should there?  Just to be clear, where would I verify that the DNS records still point to the correct IP?

Another bit of information, just to be clear: when running on the old system, these hosts show (.16 being the production subnet) as both the DHCP server and the default gateway.  .1 does not exist other than as an interface on the Sonicwall.

Thanks so much.  I'm going to test this again on Monday, or possibly over the weekend. If there's anything else I should try along with deleting the old leases, let me know.
GlennSimpson2 (Author) commented:
Addition: For all DNS servers, "Scavenge stale resource records" is NOT checked, the No-refresh interval is set at 7 days, and the Refresh interval is set at 7 days.
GlennSimpson2 (Author) commented:
Update: Tested again today.  The main machine that I performed a release/renew on (.74) lost connectivity within 10 minutes this time.  It could ping some devices within its own subnet, but not outside of it.  Within another 10 minutes, another machine on the same subnet went down.  In another 20 minutes, both had come back up on their own.  At that point, another machine in a much more critical area went down, so I had to end the test.

I had planned to manually delete the firewall's leases when switching over.  However, upon doing so I learned that the firewall automatically deletes all of the current leases when I turn off DHCP.

I also noticed that immediately after the release/renew on #.74, the ipconfig /all data did not change.   But after a few minutes (but before it went down) it did change, at which point it displayed the new DHCP server in the list, for example.  However, the 2nd machine that went down (#.19) never displayed the new information in ipconfig, despite the fact that it also went down.

Also, while both were down, .74 could ping .19, but .19 could not ping .74. (both on same subnet)

Leon Fester commented:
Have you seen any duplicate IP warnings?

You keep mentioning that you switch off DHCP servers.
I always advise people to make sure that the DHCP clients are off when you do the change.

Here are the reasons:
When switching off the old DHCP server, the leases are deleted from the database.
Similarly, when switching on the new DHCP server, there are no leases registered in the database.

However, the DHCP clients already online will keep the IP already issued by the old server.
At 50% of the lease time, the DHCP client will send a request to keep its existing address. Meanwhile, a machine whose lease has already expired, or that was only switched on after the DHCP change, will be issued a lease from the new DHCP server.
Since the new DHCP server doesn't know that another client already holds that IP, this leads to duplicate IP assignments.
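
To put numbers on the renewal timing described above, here is a minimal sketch. It assumes the RFC 2131 default timers (renewal T1 at 50% of the lease, rebinding T2 at 87.5%); a server can override these, and the function name is purely illustrative.

```python
from datetime import timedelta


def lease_timers(lease: timedelta) -> dict:
    """Return the default DHCP renewal (T1) and rebinding (T2) offsets for a lease.

    Assumes the RFC 2131 defaults; a DHCP server may hand out different values.
    """
    return {
        "renew_T1": lease * 0.5,     # client asks its original server to extend the lease
        "rebind_T2": lease * 0.875,  # client starts broadcasting to any DHCP server
        "expires": lease,            # client must stop using the address
    }


# Compare the 8-hour lease on the new server with the 24-hour lease on the old one.
for hours in (8, 24):
    timers = lease_timers(timedelta(hours=hours))
    print(f"{hours}-hour lease:", {name: str(offset) for name, offset in timers.items()})
```

With an 8-hour lease, a client would first try to renew after 4 hours and fall back to broadcasting after 7, so clients that are not restarted can go hours before they ever talk to the new server.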

w.r.t. the scope overlap; I am suggesting that you keep the ranges separate.
e.g. scope on OLD server - and scope on NEW server - or similar.
Obviously this is dependent on the number of nodes on your network.
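
A quick way to sanity-check that two scopes are kept separate is to compare their start and end addresses. This is only a sketch: the addresses below are hypothetical placeholders from the documentation range, not the ranges used on this network, and the helper name is made up for illustration.

```python
from ipaddress import IPv4Address


def ranges_overlap(a_start: str, a_end: str, b_start: str, b_end: str) -> bool:
    """Return True if the two inclusive IPv4 ranges share at least one address."""
    a0, a1 = IPv4Address(a_start), IPv4Address(a_end)
    b0, b1 = IPv4Address(b_start), IPv4Address(b_end)
    # Two inclusive ranges overlap when each one starts before the other ends.
    return a0 <= b1 and b0 <= a1


# Hypothetical scopes, chosen only to illustrate the check.
old_scope = ("192.0.2.10", "192.0.2.99")
new_scope = ("192.0.2.100", "192.0.2.199")

print("Scopes overlap:", ranges_overlap(*old_scope, *new_scope))
```

Separate ranges also make it obvious at a glance which server issued a given lease.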

This will make it easier to spot where machines are getting their leases. Alternatively, double/TRIPLE-check the DHCP Server listed when running IPCONFIG /ALL.

DHCP Servers have the DNS Servers listed in the scope options, so this setting is assigned dynamically, but DNS can also be manually set in the TCP/IP properties of your network card.
GlennSimpson2 (Author) commented:
I have not gotten any duplicate IP warnings.  

Interesting idea of shutting down all dhcp-using machines when making the transfer.  I'll have to find out if there's a time when I can actually shut down all of the machines that are using dhcp.  The reason I didn't think it would be a problem is that most of the machines that seem to be involved in the immediate issue are either static or using reservations, so if something is trying to talk to the machine at IP, that machine is going to have that IP regardless of which DHCP server is handing it out, due to it being a reservation on both systems.  Nothing on the Production subnet actually gets a random address, and when things go bad, it can't get out of the Production subnet to the Normal one (where some things do).  Actually it doesn't even need to get to anything with a random IP - all of the servers and the NAS it needs to contact have static IPs.

Relating to that, I don't think I could do the scope non-overlap.  Many of the machines that have reservations have production batch files that are contacting that IP address specifically.  So those machines have to get the same IP address in both systems or the overall production really will break down.

The troubled machines have listed as DHCP server (which is also their default gateway) which is a static setting set at the Sonicwall firewall.  When they change over following my turning off the old, on the new, and a /release/renew, they are showing as the DHCP server, which is the new DHCP server.

Noting that the new DHCP server is also going to be a DC/DNS server for the new domain once implemented, I have the following question regarding manual DNS: right now the new DHCP is handing out 3 DNS addresses to each machine - the two pre-existing DNS servers and the new DNS server.  But if I were to manually set the DNS servers, I'd only be able to put in two (unless there's a command-line method of putting in more than two).  Could the fact that I'm giving out 3 DNS servers have anything to do with it?  

Also, I have been offered the suggestion that my Cisco switches are not handling the new broadcast DHCP properly and some sort of IP Helper will need to be implemented there as well.
GlennSimpson2 (Author) commented:
Addendum: On this latest test, the machines went bad faster, after only about 10 minutes.  More machines began going bad within a shorter amount of time, and even after switching back to the old DHCP, the machines that were having problems continued to lose connectivity off and on for hours afterward.  I had to leave one machine logged on as administrator and write a batch file to do a release/renew because that was the only thing that would regain connectivity.
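
The batch file itself is not reproduced here. As a rough illustration of the same workaround, the sketch below (in Python rather than batch) pings a target and only runs `ipconfig /release` and `ipconfig /renew` when the ping fails. The function names and the idea of pinging a fixed address are assumptions for illustration; the commands use Windows syntax and need administrator rights.

```python
import subprocess
import time


def is_reachable(host, run=subprocess.run):
    """Send one ping (Windows syntax: -n 1) and report whether it exited cleanly.

    Rough check only: Windows ping can exit 0 on a
    "Destination host unreachable" reply from a router.
    """
    result = run(["ping", "-n", "1", host], capture_output=True)
    return result.returncode == 0


def renew_lease(run=subprocess.run):
    """Release and then renew the DHCP lease on this machine."""
    run(["ipconfig", "/release"], capture_output=True)
    run(["ipconfig", "/renew"], capture_output=True)


def watchdog_step(host, run=subprocess.run):
    """Renew the lease only if the host cannot be reached. Returns True if renewed."""
    if is_reachable(host, run):
        return False
    renew_lease(run)
    return True


def run_watchdog(host, interval_s=60, cycles=None):
    """Repeat the check every interval_s seconds; cycles=None means run until stopped."""
    done = 0
    while cycles is None or done < cycles:
        watchdog_step(host)
        done += 1
        time.sleep(interval_s)
```

Calling `run_watchdog` with the address of a server on the other subnet would repeat the check once a minute. This only masks the symptom; it does nothing about the underlying cause.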
GlennSimpson2 (Author) commented:
Apparently my testing is causing some larger disruptions than I anticipated. I've been asked to hold off on further testing till after June 6.  I have a vendor coming in to assist, but I will continue to seek and share information on this problem after the 6th.
Leon Fester commented:
You don't need 3 DNS servers; you don't even need 2.
The second is only there for redundancy.

If you can do it after hours, then please try switching the DHCP servers while the workstations are offline.
The alternative would be to run IPCONFIG /registerDNS on each workstation after the DHCP switchover is made.
GlennSimpson2 (Author) commented:
(just a little bump to keep the thread fresh - waiting till after June 6).
GlennSimpson2 (Author) commented:
We tested on June 6, and nothing was readily apparent. My Cisco vendor is looking over my configs to see what he can see.  Tested again with a Sonicwall representative watching my firewall, and he says that when the connectivity is lost, he cannot see the PC any more.  So it's not that the firewall is refusing traffic from the affected PC, it's not getting any.  So that puts it back on the switching.

One small bit of weirdness - when I turn my firewall DHCP back on and the server DHCP back off, there's usually a period of an hour or two where I still have intermittent losses of connectivity on that VLAN.  I've discovered that if I regularly ping the nodes on that VLAN from outside the VLAN, they stay connected.  But this doesn't work when we're actively on the new DHCP.

Will update when further testing is executed.
GlennSimpson2 (Author) commented:
We seem to have a solution.  We have a Riverbed Steelhead device in the mix.  When we took it offline, and switched over the DHCP, everything worked fine.  Rebooted it and put it back in, and so far (2 hours later) we haven't had any outages (other than a DNS issue but that's something else).

GlennSimpson2 (Author) commented:
I was able to work with a local contractor to help me solve the problem, but I wanted to share the results for future users.
