VMware ESXi 4 Server; Win2k3 DHCP Failures

Hello Experts,
This problem is killing me. :-)
Recently, we deployed a new HP DL360 G6, installed ESXi v4, and set up 4 guest OSes running Windows Server 2003 R2 SP2. One of the guests runs the DHCP service.

The problem is usually first noticed in the mornings: DHCP stops working for LANs outside the server's subnet (3 outside offices use DHCP from this server). DHCP still works fine on the LAN the server is physically connected to.

The temporary fix is usually to restart the guest OS running DHCP, but sometimes this doesn't work either. Further tinkering (e.g., stopping and starting the DHCP service) will usually get it working properly again.

We have Cisco routers with DHCP relays/helpers enabled, and I'm told these are all configured properly. Since DHCP starts working again after tinkering with the server (e.g., a restart or a DHCP service restart), I think the Cisco routers are likely configured properly.

So I would like to ask for some troubleshooting steps which could help me find the cause of the issue.

I have looked at the Mon-Fri DHCP logs, and cannot tell whether they look normal. There are a ton of the following messages:
31,12/21/09,11:55:08,DNS Update Failed,192.168.64.143,USPLY1107.domain.int,-1
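
To judge whether the audit logs look normal, it can help to tally the entries per event ID first. A minimal Python sketch, assuming the comma-separated layout of the line above (event ID in the first field, description in the fourth); the function name count_events is made up for illustration:

```python
import csv
from collections import Counter
from io import StringIO


def count_events(log_text):
    """Count audit-log rows per (event ID, description) pair."""
    counts = Counter()
    for row in csv.reader(StringIO(log_text)):
        # Data rows start with a numeric event ID; header and legend
        # lines at the top of the file do not, so they are skipped.
        if len(row) < 6 or not row[0].strip().isdigit():
            continue
        counts[(int(row[0]), row[3].strip())] += 1
    return counts


# The sample row is the one quoted in the post.
sample = (
    "ID,Date,Time,Description,IP Address,Host Name,MAC Address\n"
    "31,12/21/09,11:55:08,DNS Update Failed,192.168.64.143,USPLY1107.domain.int,-1\n"
)
for (event_id, description), n in count_events(sample).most_common():
    print(event_id, description, n)
```

Running this over each daily file in the DHCP log folder would show whether event 31 dominates or whether other events appear around the time the relayed requests stop being answered.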

If you have some ideas - please help! Any ESX changes I can try? I can provide much more information if needed. Thanks!
ottobock Asked:
 
Paul Solovyovsky, Senior IT Advisor, Commented:
Is your DHCP server also a DC? Were these new servers, or did you P2V them?

I would make sure there are no other DHCP servers on the network, such as Linksys cable routers. I would then check the port settings on the switch and make sure the ip helper command is working.

I have also seen this when there is more than one route to the source, so make sure that STP or PVSTP is running.
 
ryder0707 Commented:
Try enabling PortFast on the switch ports connected to the ESXi host.
 
ottobock (Author) Commented:
Hello - thanks for the ideas!
I have forwarded the inquiries about PortFast, as well as the potential for different routes to the source, to our network team. We're not running STP on the switches, but there are some additional routers to our other buildings for redundancy. Could be something there.

It's just strange that it will work for some time, and then I will get called nice and early in the morning after DHCP crashes again a day or so later... I'm still trying to verify whether the "DNS Update Failed" message has anything to do with it, and/or whether it is associated with the ESX network settings.

We made a few changes yesterday - and so far this morning it's working OK, though the DNS Updates are still failing. Hmm.

Thanks again!
 
ottobock (Author) Commented:
Forgot to answer paulsolov's first couple of questions: the server is not a DC, but an application server. It was also not P2V'd, but it was converted to the newest VM version 7 during the change from 3.5 to 4. Under 3.5 there was no DHCP on this system; it was added only recently.
Thanks!
 
Paul Solovyovsky, Senior IT Advisor, Commented:
Make sure that the VMware Tools are up to date. Also make sure that you're running SP2 on the DHCP server; this was an issue with SP1 systems.
 
ottobock (Author) Commented:
So after 2 days, the DHCP has remained available - so this is good thus far.

The DNS updating problem remains, but strangely, the errors seem to be mainly for the outside subnets this server provides DHCP for. The local subnet appears to be updating DNS records OK. I can tell not only from the logs, but also because all the DHCP leases have the waiting-to-write icon...
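
One way to confirm that the failures cluster on the remote subnets is to group the "DNS Update Failed" rows (event ID 31) by network. This is only a sketch: it assumes the client IP is the fifth comma-separated field, as in the log line quoted earlier, and it assumes /24 subnets, which the thread does not state. The name failures_by_subnet is hypothetical.

```python
import ipaddress
from collections import Counter


def failures_by_subnet(lines, prefix_len=24):
    """Count event-31 rows per subnet (prefix_len is an assumed mask)."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 5 or fields[0] != "31":
            continue  # not a "DNS Update Failed" data row
        try:
            # strict=False lets a host address stand in for its network
            net = ipaddress.ip_network(f"{fields[4]}/{prefix_len}", strict=False)
        except ValueError:
            continue  # fifth field was not a valid IPv4 address
        counts[str(net)] += 1
    return counts


rows = [
    "31,12/21/09,11:55:08,DNS Update Failed,192.168.64.143,USPLY1107.domain.int,-1",
]
print(failures_by_subnet(rows))
```

If the local subnet never shows up in the result while the remote offices do, that points at how updates for those scopes are registered rather than at the DNS server being unreachable.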

The DNS server is on the same subnet as the DHCP server, and I've verified there is no authentication needed for DNS updating, so now I'm getting more confused.

Could it just be a matter of time for this to sort itself out?
[Attached image: WaitingToWrite.jpg]
 
ryder0707 Commented:
What is the event ID in eventvwr for the DNS problem?
 
ottobock (Author) Commented:
Hello again - same DHCP issues remain. Last week, DHCP failed again on Tuesday evening, and then again on Wednesday morning. Due to the holidays, I've not been out, but surprisingly, the DHCP was working fine today and last night when I checked...

Our networking crew has some ideas involving packet capturing to see whether the server/guest is even receiving the DHCP request packets.
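
When looking at such a capture, the relayed requests can be told apart from local ones by the relay agent address (giaddr) in the BOOTP header, which a relay fills in and a directly attached client leaves at 0.0.0.0. A small sketch that decodes that field from the UDP payload of a captured packet, using the fixed header layout from RFC 2131; the function name relay_info and the sample relay address 192.168.64.1 are made up for illustration:

```python
import ipaddress
import struct


def relay_info(payload: bytes):
    """Return (op, hops, giaddr) from a BOOTP/DHCP UDP payload."""
    if len(payload) < 28:
        raise ValueError("payload too short for a BOOTP header")
    # Bytes 0-3: op (1 = request, 2 = reply), htype, hlen, hops
    op, _htype, _hlen, hops = struct.unpack_from("!BBBB", payload, 0)
    # Bytes 24-27: giaddr, the relay agent address
    giaddr = ipaddress.IPv4Address(payload[24:28])
    return op, hops, giaddr


def is_relayed(payload: bytes) -> bool:
    """True when a relay agent has stamped its address into giaddr."""
    return int(relay_info(payload)[2]) != 0


# Hand-built 236-byte header: a request, one hop, hypothetical relay address.
sample = bytes([1, 1, 6, 1]) + bytes(20) + bytes([192, 168, 64, 1]) + bytes(208)
print(relay_info(sample), is_relayed(sample))
```

If relayed requests reach the guest's virtual NIC during an outage but are never answered, the problem is on the server side; if they never arrive, the relay path or the virtual switch is the more likely suspect.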

To answer your question ryder0707, I meant the DHCP logs located at: c:\windows\system32\dhcp. There is never any additional information in the actual eventvwr except when DHCP is stopped or started. :-(

Thanks again everyone! We're really close to just trying a separate physical server to ensure the ESX or guest config is not to blame somewhere. This is tricky too, however, as all the Cisco routers would have to be readjusted again...
 
ottobock (Author) Commented:
Hello,
DHCP has been working without fail for over a month - with a few server restarts in between. The DNS updating also seems to be working, though the "waiting-to-write" icon is still everywhere. I'm not sure why this is - but since it's working, I can ignore it for the moment. Thanks to everyone for your ideas.
 
ottobock (Author) Commented:
Cause of issue not fully found; however, DHCP has been working for some time without failure.
Question has a verified solution.
