ottobock asked:

VMware ESXi 4 Server; Win2k3 DHCP Failures

Hello Experts,
This problem is killing me. :-)
Recently, we deployed a new HP DL360 G6, installed ESXi v4, and set up 4 guest OSes running Windows Server 2003 R2 SP2. One of the guests runs the DHCP service.

The problem usually shows up in the mornings: DHCP stops working for LANs outside the server's own subnet (3 outside offices use DHCP from this server). DHCP keeps working fine on the LAN the server is physically connected to.

The temporary fix is usually to restart the guest OS that runs DHCP, but sometimes this doesn't work either. Further tinkering (e.g. stopping and starting the DHCP service) will usually get things working properly again.

We have Cisco routers with DHCP relays/helpers enabled, and I'm told these are all configured properly. Since DHCP starts to work again after tinkering with the server (e.g. a server restart or a DHCP service restart), I am thinking the Cisco routers are likely configured properly.

So I would like to ask for some troubleshooting steps which could help me find the cause of the issue.

I have looked at the Mon-Fri DHCP logs, but I cannot tell whether they look normal. There are a ton of the following messages:
31,12/21/09,11:55:08,DNS Update Failed,192.168.64.143,USPLY1107.domain.int,-1
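
To judge how common these entries are compared with everything else in the log, the rows can be counted per event. The following is only a rough Python sketch: it assumes every row uses the same comma-separated column order as the sample line above (event ID, date, time, description, IP address, host name), and the names `DhcpLogEntry`, `parse_line` and `count_events` are made up for illustration.

```python
from collections import Counter
from typing import NamedTuple, Optional


class DhcpLogEntry(NamedTuple):
    event_id: int
    date: str
    time: str
    description: str
    ip: str
    host: str


def parse_line(line: str) -> Optional[DhcpLogEntry]:
    """Parse one log row; return None for header text and blank lines."""
    fields = line.strip().split(",")
    # Assumption: data rows start with a numeric event ID and have >= 6 columns.
    if len(fields) < 6 or not fields[0].isdigit():
        return None
    return DhcpLogEntry(int(fields[0]), *fields[1:6])


def count_events(lines):
    """Count rows per (event ID, description) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry:
            counts[(entry.event_id, entry.description)] += 1
    return counts
```

Feeding it the lines of one daily log file and printing `count_events(...).most_common()` would show whether the "DNS Update Failed" rows dominate or are just noise next to normal lease activity.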

If you have some ideas - please help! Any ESX changes I can try? I can provide much more information if needed. Thanks!
ryder0707

Try enabling PortFast on the switch ports connected to the ESXi host.
ASKER CERTIFIED SOLUTION
Paul Solovyovsky
ASKER

Hello - thanks for the ideas!
I have forwarded the inquiries about PortFast, as well as the potential for different routes to the source, to our network team. We're not running STP on switches, but there are some additional routers to our other buildings for redundancy. Could be something there.

It's just strange that it will work for some time, and then I get called nice and early in the morning after DHCP crashes again a day or so later... I'm still trying to verify whether the "DNS Update Failed" message has anything to do with it, and/or whether it is associated with the ESX network settings.

We made a few changes yesterday - and so far this morning it's working OK, though the DNS Updates are still failing. Hmm.

Thanks again!
Forgot to answer paulsolov's first couple questions: the server is not a DC, but an application server. The server was also not P2V'd, but was converted to the newest vm version 7 during the change from 3.5 to 4. When it was 3.5, there was no DHCP on this system, this was just added recently.
Thanks!
SOLUTION
So after 2 days, the DHCP has remained available - so this is good thus far.

The DNS updating problem remains, but strangely, the errors seem to occur mainly on the outside subnets this server provides DHCP for. The local subnet appears to be updating DNS records OK. I can tell not only from the logs, but also because all the DHCP leases for the outside subnets show the waiting-to-write icon...
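
One way to check the "mainly the outside subnets" impression against the log itself is to tally the failed rows per client subnet. This is a hedged sketch, not a finished tool: it assumes the row layout of the sample log line posted earlier (event ID in the first column, client IP in the fifth), it assumes /24 subnets, which may not match the real scope masks, and `failures_by_subnet` is a hypothetical helper name.

```python
import ipaddress
from collections import Counter

FAILED_DNS_UPDATE = "31"  # event ID taken from the sample log line


def failures_by_subnet(lines, prefix_len=24):
    """Tally 'DNS Update Failed' rows by the client's subnet.

    prefix_len=24 is an assumption; use the real masks of your scopes.
    """
    tally = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 5 or fields[0] != FAILED_DNS_UPDATE:
            continue
        try:
            net = ipaddress.ip_network(f"{fields[4]}/{prefix_len}", strict=False)
        except ValueError:
            continue  # row without a usable IP address
        tally[str(net)] += 1
    return tally
```

If the local subnet barely appears in the result while the remote office subnets account for nearly all failures, that would support what the lease icons suggest.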

The DNS server is on the same subnet as the DHCP server, and I've verified there is no authentication needed for DNS updating, so now I'm getting more confused.

Could it just be a matter of time for this to sort itself out?
WaitingToWrite.jpg
What is the event ID in eventvwr for the DNS problem?
Hello again - same DHCP issues remain. Last week, DHCP failed again on Tuesday evening, and then again on Wednesday morning. Due to the holidays, I've not been out, but surprisingly, the DHCP was working fine today and last night when I checked...

Our networking crew has some ideas involving packet capturing, to see whether the server/guest is even receiving the DHCP request packets.
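
When looking through such a capture, requests forwarded by a router's DHCP relay/helper can be told apart from local broadcasts by the relay agent address field (giaddr) in the BOOTP header, which is non-zero only for relayed messages. Below is a small Python sketch of that check. It assumes the raw UDP payload of a packet sent to port 67 has already been extracted from the capture by some other tool, and `relay_address` is a made-up helper name.

```python
import socket

BOOTP_MIN_LEN = 236  # size of the fixed BOOTP/DHCP header, before options
GIADDR_OFFSET = 24   # follows op, htype, hlen, hops, xid, secs, flags,
                     # ciaddr, yiaddr and siaddr


def relay_address(payload: bytes):
    """Return the relay agent IP (giaddr) of a client request,
    or None if the request was not relayed."""
    if len(payload) < BOOTP_MIN_LEN:
        raise ValueError("payload too short for a BOOTP/DHCP message")
    if payload[0] != 1:  # op == 1 means BOOTREQUEST
        raise ValueError("not a client request")
    giaddr = socket.inet_ntoa(payload[GIADDR_OFFSET:GIADDR_OFFSET + 4])
    return None if giaddr == "0.0.0.0" else giaddr
```

If, during an outage, the capture on the guest shows no requests with a router address in giaddr, the relayed traffic is being lost before it reaches the VM; if they do arrive but go unanswered, the DHCP service itself is the more likely suspect.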

To answer your question ryder0707, I meant the DHCP logs located at: c:\windows\system32\dhcp. There is never any additional information in the actual eventvwr except when DHCP is stopped or started. :-(

Thanks again everyone! We're really close to just trying a separate physical server to make sure the ESX or guest config is not to blame somewhere. This is tricky too, however, as all the Cisco routers would have to be readjusted again...
Hello,
DHCP has been working without fail for over a month, with a few server restarts in between. The DNS updating also seems to be working, though the "waiting-to-write" icon is still everywhere. I'm not sure why this is, but since it's working, I can ignore it for the moment. Thanks to everyone for your ideas.
The cause of the issue was never fully identified; however, DHCP has been working for some time without failure.