NLB and VMs
Posted on 2010-01-05
We're currently using blades for our TS environment and have been slowly virtualising that environment over the last couple of months.
We've just received a new server specifically to start testing some TS virtual machines, and we have a rather pressing need to deploy some sooner than we'd like (as always happens!).
To cut a long story short, we're using Windows Server 2003 with the built-in NLB, and round robin DNS sits on top to point to 3 separate clusters...
e.g. the DNS name Company1 round robins to Cluster1, Cluster2 and Cluster3.
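To make the layout concrete, here is a minimal sketch of how the round robin layer hands out clusters. It only models the rotation; the client names are hypothetical, and no cluster IPs are shown because the post only mentions one of them.

```python
from itertools import cycle

# Cluster names as given in the post. The client names below are made up
# purely for illustration.
CLUSTERS = ["Cluster1", "Cluster2", "Cluster3"]


def assign(clients, targets):
    """Give each client the next target in rotation, as round robin DNS would."""
    rotation = cycle(targets)
    return {client: next(rotation) for client in clients}


assignments = assign(["user1", "user2", "user3", "user4"], CLUSTERS)
for client, cluster in assignments.items():
    print(f"{client} -> {cluster}")
```

Plain round robin DNS does no health checking, so a cluster that has gone offline keeps being handed out in its turn. That would fit with users being unable to connect while one cluster was down.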
Each physical server has four NICs: Management, NIC1, NIC2 and an NLB adaptor.
I initially used VMware Converter to convert an active terminal server. I then removed all of the adaptors, ran NewSID (although I hear this is no longer required from an MS perspective), removed the machine from the domain, renamed it and rejoined it to the domain, then re-added all of the required adaptors and their IP configuration. Multiple users can connect to the machine remotely without any problems.
Then the problem - I introduced this server to one of the clusters and that cluster went offline (I could not ping the cluster IP). One of the servers within the cluster (bearing in mind the server I'd cloned was in a different cluster) started having authentication errors, and the other servers were very slow to connect to. I could not connect to NLB Manager via the cluster IP, and users were booted from their sessions on all of that cluster's servers.
Skipping onwards... (I resolved the above problem by shutting down the VM terminal server and rebooting all of the cluster servers.) I thought this might be due to some residual config in the registry tying it to the server I'd cloned, so I built a new Windows 2003 server from scratch and re-installed all of the applications (absolute pig!). I spent time configuring everything as I'd done previously for the physical servers and went through the same process as above. I then introduced this server to a different cluster. Same problem! Random authentication errors, the cluster IP cannot be reached via RDP or NLB Manager (and cannot be pinged), users dropped from sessions, etc.
I checked all of the adaptor settings on the new server and they look fine (NLB had bound correctly - for info, I'm joining the cluster via NLB Manager).
Skipping through Event Viewer on the new server, the majority of the messages report success. However, one of the servers that had authentication errors logged the following:
NLB Cluster 172.30.120.100 : Initiating convergence on host 3. Reason: Host 10 is leaving the cluster.
NLB Cluster Initiating convergence on host 3. Reason: Host 10 is converging for an unknown reason.
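For anyone comparing logs across several cluster members, the two events above can be picked apart with a small script. This is only a sketch: the pattern is fitted to these two messages and may not cover other NLB event text, and the field names (`cluster`, `host`, `peer`) are my own labels.

```python
import re
from collections import Counter

# The two messages exactly as quoted above. The second one carries no
# cluster IP, so the pattern treats the IP as optional.
LINES = [
    "NLB Cluster 172.30.120.100 : Initiating convergence on host 3. "
    "Reason: Host 10 is leaving the cluster.",
    "NLB Cluster Initiating convergence on host 3. "
    "Reason: Host 10 is converging for an unknown reason.",
]

PATTERN = re.compile(
    r"NLB Cluster\s*(?P<cluster>\d{1,3}(?:\.\d{1,3}){3})?\s*:?\s*"
    r"Initiating convergence on host (?P<host>\d+)\.\s*"
    r"Reason: (?P<reason>.+)"
)


def parse(line):
    """Return the fields of one convergence message, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    peer = re.search(r"Host (\d+)", match["reason"])
    return {
        "cluster": match["cluster"],   # None when the message omits the IP
        "host": int(match["host"]),    # host that logged the convergence
        "peer": int(peer.group(1)) if peer else None,  # host named in the reason
        "reason": match["reason"].strip(),
    }


events = [parse(line) for line in LINES]
peers = Counter(event["peer"] for event in events if event)
print(peers)
```

Both messages name host 10 as the member that triggered convergence, so that is the host whose identity and adaptor settings seem worth checking first.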
I guess I'm not really giving enough information, but firstly, is there anything I should know when adding a VM TS host to a cluster... and does anything above stand out to anyone?
P.S. - all of the VM guest adaptors can be seen on our network, can be routed to, and can be connected to via hostname (DNS has updated fine as far as I can see).
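The hostname and connectivity check mentioned in the P.S. can be repeated with a short script. This is a rough sketch assuming RDP listens on the default TCP port 3389; the function name `check_rdp` is my own, and any hostname passed to it is up to you.

```python
import socket


def check_rdp(host, port=3389, timeout=3.0):
    """Resolve host, then try a TCP connect. Returns (ip_or_None, reachable)."""
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        # The name did not resolve, so there is nothing to connect to.
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return ip, False
```

Calling `check_rdp` with each guest's hostname, and then with the cluster IP, would show whether only the cluster address stops answering when the VM joins.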