VMware vSphere 4.0 and Virtual Center 4.1

Hello all,

I am having the following issue:

Virtual Center drops its connection to one of our two VMware hosts.

Virtual Center 4.1 is installed on Windows Server 2008, running as a VM at 192.168.0.6.

The first VMware host is vSphere ESX 4.0 at IP address 192.168.0.x. There are no issues with this host: it was added to Virtual Center without problems and has not disconnected. It also hosts the Virtual Center VM.

The second host is at IP address 192.168.100.190, in a DMZ. It is also running VMware vSphere ESX 4.0, on the 192.168.100.x network. The host is successfully added to vCenter and is accessible for about 10-15 minutes at a time; then it suddenly disconnects, with nothing apparent triggering it.

Both hosts have datastores on an EqualLogic iSCSI SAN and have no issues accessing them (as far as I have been able to observe).

What I know so far:

For the problem host:

1. I can access the service console via SSH and do not get disconnected, even after a couple of hours.
2. I can also connect to the host directly from the vSphere client and stay in the client for a couple of hours without getting disconnected. Only when I use the vSphere client to connect to the vCenter server do I get disconnected. I have used the vSphere client on two separate machines with the same result.
3. I specified the IP address of the management interface in the vSphere client for Virtual Center (top menu, Administration, Runtime Settings).
4. I have restarted the management agents, the network service, the vpxa service (the vCenter agent on the host), and the webAccess service, to no avail.
5. I have verified that I can ping the problem host from the Virtual Center server.
6. I have verified that I can telnet from the Virtual Center server to the problem host on TCP port 902.
7. I disabled VMotion and HA on the adapters for the problem host.
8. I verified that licensing was installed on both hosts and properly assigned.
9. Update Manager is installed in Virtual Center along with the base install.
10. I completely removed the problem host from Virtual Center and re-added it.
11. I have verified that the vpxa.cfg file contains the actual address of our Virtual Center server, not the loopback address.
12. There is a firewall between the internal network and the problem host, and port redirection for TCP port 902 has been configured. I haven't checked the firewall myself, but the security administrator assures me there is no NAT involved.
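
For anyone repeating steps 5, 6 and 12 from a machine without telnet, here is a minimal Python 3 sketch of the same TCP test (the function name `tcp_port_open` is hypothetical). Like telnet, it only shows that a TCP handshake to the port succeeds at that moment; it says nothing about traffic in the opposite direction or about UDP.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Rough equivalent of "telnet <host> <port>": it proves the TCP
    handshake gets through, nothing more.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, `tcp_port_open("192.168.100.190", 902)` run on the Virtual Center server mirrors the telnet test in step 6.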

All of these steps have yielded the same result: the problem vSphere host drops offline. Yet I can still reach the service console through SSH, and the host itself directly through the vSphere client. Only when going through the vSphere client to the vCenter server does the host disconnect. The message received is "host is disconnected", and there is a red circle over the problem host in vCenter.

So, at this point, I'm kind of lost. I'd appreciate any suggestions.

Thanks all
Asked by rslan
George Khairallah (CTO) commented:
It sounds like you've done quite a lot of troubleshooting and are on the right track. It does sound peculiar, and I don't have many ideas beyond what you have already tried.

One thing that comes to mind though, you mentioned that the second host is on the DMZ:
My first question is: why would you have it in the DMZ if it's in the same cluster? It doesn't seem like there would be any security benefit from that.

The second question is: if your second host is behind a firewall and/or traversing additional switches, do you use STP (Spanning Tree Protocol) on your network?

You mentioned that you tried a ping from vCenter to the host, and vice versa. Do you see any ping timeouts from the host to vCenter at the moments when vCenter reports the host as disconnected?
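
One way to answer that is to log reachability over time from the host side and compare the timestamps with the moment vCenter marks the host as disconnected. Below is a minimal Python 3 sketch (the name `probe_series` is hypothetical). It uses a TCP connect instead of an ICMP ping, because raw ICMP sockets generally need root privileges, so it assumes some TCP port on the vCenter server is reachable through the firewall.

```python
import socket
import time
from datetime import datetime


def probe_series(host, port, samples, interval=5.0, timeout=2.0):
    """Probe host:port `samples` times; return a list of (timestamp, reachable)."""
    results = []
    for i in range(samples):
        try:
            # A completed TCP handshake stands in for a ping reply.
            with socket.create_connection((host, port), timeout=timeout):
                reachable = True
        except OSError:
            reachable = False
        results.append((datetime.now().isoformat(timespec="seconds"), reachable))
        if i < samples - 1:
            time.sleep(interval)  # no sleep after the last sample
    return results
```

For example, `[t for t, ok in probe_series("192.168.0.6", 443, samples=360) if not ok]` probes roughly every 5 seconds for about 30 minutes, enough to cover the 10-15 minute window, and lists the times the connection failed. Port 443 is an assumption here; use whichever port your firewall actually passes.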

 
Paul Solovyovsky (Senior IT Advisor) commented:
If they're in the same cluster, they will need to be on the same subnet so that they share the same keep-alive address (the default gateway), and that address must be pingable.
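
To check quickly whether two addresses share a subnet, Python's standard `ipaddress` module is enough. This is a minimal sketch (the function name `same_subnet` is hypothetical); the post does not give the netmasks, so the /24 default is an assumption.

```python
import ipaddress


def same_subnet(a: str, b: str, prefix: int = 24) -> bool:
    """True if IPv4 addresses a and b fall in the same network of the given prefix length."""
    # Derive a's network (e.g. 192.168.0.0/24) and test whether b belongs to it.
    net = ipaddress.ip_interface(f"{a}/{prefix}").network
    return ipaddress.ip_address(b) in net
```

With a /24 mask, `same_subnet("192.168.0.6", "192.168.100.190")` is `False`, so the vCenter server and the DMZ host sit on different subnets; only with a mask as wide as /16 would they share one.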

Also make sure that the default gateway of the DMZ is pingable as well (if they are not in the same cluster).

Another solution would be to put a management port/service console on the production network and keep only the vSwitch on the DMZ.

bgoering commented:
As paulsolov indicated, it might be wise to place both ESX hosts on the same inside network as the vCenter server, then just create a vSwitch on whatever other networks (DMZ) the virtual machines need access to. Is there any particular reason you need to access your service console from the outside or the DMZ?

If that isn't possible in your shop, there are quite a few other ports to consider in addition to TCP 902. See the knowledge base article at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012382 for a complete rundown. Scroll down to the first column, which says ESX 4.x; you will likely need to allow through the firewall everything listed under ESX/ESXi host to ESX/ESXi host and ESX/ESXi host to vCenter. If you are using AD authentication, you will also need to open the ports required for Active Directory and LDAP.

It might not hurt to look at what is required for vCenter Server 4.x to ESX/ESXi host as well, though you are likely already allowing everything from the inside to your DMZ segment.

Good luck

vmwarun - Arun commented:
Are the ESX hosts in 192.168.100.x and 192.168.0.x running the same 4.0 build (U1, U2)?
Danny McDaniel (Clinical Systems Analyst) commented:
It sounds like an issue with port 902 through the firewall.

See http://kb.vmware.com/kb/1011647. There is also an issue with certain CD-ROM drives: http://kb.vmware.com/kb/1017297

The heartbeats go from the host to vCenter on port 902, so make sure that port is open in both directions, and/or test with telnet from the host.
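
Note that telnet only tests TCP, while the host's heartbeats to vCenter travel as UDP datagrams on port 902, so a successful telnet does not prove the heartbeats get through. The minimal Python 3 sketch below (all function names hypothetical) sends a test datagram from one side and waits for it on the other. Because vCenter itself already listens on UDP 902 while its service is running, the usage example assumes a spare port (9020, hypothetical) temporarily allowed through the firewall along the same path.

```python
import socket


def open_listener(port, bind_addr="0.0.0.0"):
    """Bind a UDP socket on the receiving side (e.g. the Virtual Center server)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def wait_for_datagram(sock, timeout=10.0):
    """Return (payload, sender) for the next datagram, or None if none arrives in time."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(1024)
    except socket.timeout:
        return None


def send_datagram(host, port, payload=b"heartbeat-test"):
    """Send one UDP datagram; UDP gives no delivery confirmation, so check the receiver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

On the Virtual Center server, run `print(wait_for_datagram(open_listener(9020), timeout=30))`; then, within 30 seconds, run `send_datagram("192.168.0.6", 9020)` from the DMZ side. A `None` result means the datagram never arrived.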
Lee Osborne (Senior Infrastructure Engineer) commented:
I would agree with some of the comments above about keeping the VM hosts on the same network and putting the VMs on a vSwitch to the DMZ. This is how I run my VMs: one vSwitch for the LAN, one for the DMZ, and a third for an isolated test LAN. This way the hosts and vCenter remain on the same subnet, and each VM is attached to the appropriate switch.

Lee
bgoering commented:
One other thing to check on the firewall: make sure it isn't timing out and closing your heartbeat port 902. I once had a similar issue with Oracle client connections through a firewall.
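
To test the idle-timeout theory, you can hold a TCP connection open through the firewall, leave it idle, and see whether it still carries data afterward. This minimal Python 3 sketch assumes you run the included echo server (a hypothetical test helper, not a VMware component) on the far side of the firewall, on a port the firewall passes. It only exercises the firewall's TCP idle timeout; a UDP timeout on the firewall may be a separate setting.

```python
import socket
import socketserver
import threading
import time


class EchoHandler(socketserver.BaseRequestHandler):
    """Echo back whatever arrives until the client closes the connection."""

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:
                break
            self.request.sendall(data)


def start_echo_server(bind_addr="0.0.0.0", port=0):
    """Start a threaded echo server in the background and return it."""
    server = socketserver.ThreadingTCPServer((bind_addr, port), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def survives_idle(host, port, idle_seconds, timeout=5.0):
    """Open a TCP connection, leave it idle, then check it still carries data."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(idle_seconds)
        try:
            sock.sendall(b"probe")
            # Any echoed bytes mean the connection is still alive.
            return bool(sock.recv(1024))
        except OSError:
            # Reset by the firewall, or no answer within the timeout.
            return False
```

Running `survives_idle(echo_host, echo_port, idle_seconds=s)` for `s` of 300, 600 and 900 seconds (hypothetical values bracketing the 10-15 minute window) shows roughly when the firewall starts dropping idle connections.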
VMwareGuy commented:
Putting your service console in the DMZ is a horrible idea any way you look at it. Why would anyone need to access an ESX service console from the outside other than through a VPN connection to the internal network? Wire the physical NICs your service console is mapped to into your internal network, and keep some NICs wired to the DMZ for the VMs that need to live there. Create the same configuration on your other host, and you will have a solid VMotion and HA cluster in which VMs in the DMZ can fail over between hosts.

I had a similar situation not long ago, and I resolved it by making sure that all the ports required to be open on the firewall were also opened on the Windows Server 2008 firewall of the server where vCenter Server is installed. Alternatively, disable the Windows firewall altogether. This could be where your issue is.

As for gateways: you can't have HA without an address to ping, but it doesn't have to be the gateway address. In the HA advanced options you can set the parameter das.isolationaddress, entering the pingable IP address you want HA to use in the value field; see the vSphere Availability Guide for these advanced HA parameters. In addition to das.isolationaddress, you will want to set das.usedefaultisolationaddress to false, so that HA doesn't use the default gateway at all.

 

 


 