m_white

asked on

VMWare running SLOW!

Hello,

     I am having a problem with my new VMware environment running ridiculously slow.  I have just built the environment on an IBM BladeCenter.  I am running three ESX 4.1 hosts on IBM blades with dual six-core Intel Xeon X5650s and 48 GB of RAM each.  They are all connected to an HP LeftHand iSCSI SAN.  This system is a replacement for an older HP blade system.  After getting ESX installed and the hosts connected to the iSCSI network, I attempted to migrate over an existing “test” VM from our old environment.  Upon powering it on, I instantly noticed a major problem.  The VM took almost 15 minutes just to boot to the login screen.  Once logged in, it took another 15 minutes to get to the desktop.  The simplest tasks take forever.  I assumed this might be an issue related to the migration from old to new hardware, so I attempted to build a new VM (Windows Server 2008) from scratch on the IBM system.  That was at 3pm west coast time, yesterday.  The installation process is at 68%.  So at this point I am thinking that something is wrong.  I have gone through most of what I know how to check, but I am at a complete loss as to why the environment would be running this slow.  Any help would be greatly appreciated!

Thanks!
wilmaflintstone

It sounds like there are some major timeouts.

The most obvious causes are hardware problems or misconfiguration.

Look on the IBM site to make sure you have the correct firmware for all parts of your VMware servers, including the BladeCenter chassis itself.
Then see if you can find a network tracer to see what packets go between your VMware server and the iSCSI disks. I suspect the problem might be there somewhere, but that is just a hunch.

Because of the complexity of your system you should troubleshoot step by step.

I suggest beginning with a Linux boot of the blades and trying to access the iSCSI disks directly, to see what kind of performance you get there (a rough example of such a test is sketched at the end of this post).
If that performs well, you may have an issue within VMware that you should dig into.
If you see performance problems there as well, you can go from there.

But first: check the firmware. We had performance issues on AIX blades that were resolved by new firmware.
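
For what it's worth, a minimal sketch of that Linux-boot test, assuming a live CD with open-iscsi on one of the blades; the portal address 10.0.0.50 and the device name below are placeholders, substitute your own:

# Discover and log in to the LeftHand target from the live environment
iscsiadm -m discovery -t sendtargets -p 10.0.0.50:3260
iscsiadm -m node --login

# Find the new block device (dmesg will show it, e.g. /dev/sdb), then time a raw sequential read
dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct

If the raw read rate is far below what a single gigabit link can deliver (roughly 100 MB/s), the problem is likely in the network or SAN path rather than in ESX.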
m_white

ASKER

I tried updating the firmware, to no avail...
The iSCSI disks are working fine on the HP system, so it is possibly a configuration issue on the VMware side that connects to the iSCSI disks.  I have checked all of the settings and they look good, but my expertise with iSCSI is limited.
Take a look at the performance tab in the vSphere client for your hosts. Check the storage adapter and storage path screens.

What is the read and write latency you are seeing for your SAN?
A mismatch in your drivers might also cause this.

Please report the stats that bgoering suggested you look at.
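
If the vSphere client charts are awkward to read, esxtop on the service console shows the same latency counters live - a rough sketch of where to look:

# On the ESX service console
esxtop
# Press 'd' for the disk adapter view (or 'u' for the disk device view)
# Watch DAVG/cmd (device/SAN latency), KAVG/cmd (kernel latency) and GAVG/cmd (total guest latency)

Sustained DAVG values in the tens of milliseconds, or a KAVG that isn't close to zero, would point back at the storage path.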
m_white

ASKER

The latency average is 1.733ms and hasn't spiked past 20ms.  Not sure on the drivers issue, where would I check that?  The problem exists prior to the VM even hitting Windows.

Thanks!
Those numbers seem reasonable. When you migrated your VM, did you upgrade the VMware Tools and then upgrade the virtual hardware?

The Tools upgrade will give your guest driver support for the new ESX version. Then the virtual hardware upgrade will give your VM optimized virtual devices. Please note, however, that once you upgrade the virtual hardware you will be unable to migrate your VM back to the old cluster if it is running an ESX level that doesn't support hardware version 7.
m_white

ASKER

The virtual hardware and VMware Tools have been upgraded.  This problem happens even when attempting to install an OS from scratch - Linux, Windows, or otherwise.
Also double-check the BIOS settings on your new blades... for your X5650 processors, make sure all virtualization extensions are enabled (a quick console-level check for this is sketched below). Also enable hyperthreading - that wasn't recommended on older processor types but works great with the X55xx and X56xx processors.
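
If you want to confirm from the console that ESX actually sees the virtualization extensions (rather than trusting the BIOS screens alone), one check that should work on ESX 4.x is sketched here; the exact value reported may vary:

# On the ESX service console
esxcfg-info | grep "HV Support"
# A value of 3 generally means VT-x is enabled and in use by the VMkernel;
# lower values usually mean the extensions are disabled or not exposed by the BIOS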
m_white

ASKER

Already checked the BIOS.  Everything looks great there as well.

Also, to add to the iSCSI theory, I set up a new VM last night on the local storage of one of my ESX hosts, and it runs great.  No slowdown at all.  Runs perfectly.
That last test really adds to the iSCSI theory indeed. Give me a few more details on the iSCSI setup. Is it the VMware software iSCSI initiator? Or do you have some dependent or independent HBA support in your blade chassis? Are there multiple paths? And last but not least, make certain you are getting a gigabit connection on the NIC that goes to iSCSI rather than 100 Mb.
m_white

ASKER

Yes, it is the VMware software iSCSI initiator.  Single gigabit path at this time, running through a Cisco 3012 chassis switch; a second switch is on the way to enable multipathing.  I have verified that we are running gigabit connections, and watching the traffic across the switch, it isn't spiking or overloading the port.
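
For completeness, the link speed and the software iSCSI setup can also be confirmed from the ESX 4.x service console - a sketch, assuming a standard software-initiator configuration:

# List physical NICs with their negotiated speed and duplex
esxcfg-nics -l

# List vmkernel interfaces (the software iSCSI initiator rides on one of these)
esxcfg-vmknic -l

# Confirm the software iSCSI initiator is enabled
esxcfg-swiscsi -q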
We had a problem with the path that packets traveled on our fiber network.
I am unfamiliar with what kind of possibilities iSCSI has, but it might be a similar routing problem.

What I mean by that is that a packet may not take the shortest path but a long way around, and that adds to latency.

Check whether you can ping the iSCSI devices from the blade (from within the VMware console), and what ping times you get (one way to do this at the vmkernel level is sketched after this post).
If it is more than a couple of milliseconds, we have to do a tracepath/traceroute to see where the holdup is.
If there is no holdup with the pinging, it might be something at the driver level.
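
One nuance on classic ESX 4.1: a plain ping from the service console uses the console NIC, not the vmkernel port that carries the iSCSI traffic. A sketch of testing the actual iSCSI path, where 10.0.0.50 is a placeholder for your SAN portal address:

# Ping the iSCSI target over the VMkernel network stack
vmkping 10.0.0.50

# If jumbo frames are configured end to end, also try a near-MTU packet with don't-fragment set
vmkping -d -s 8972 10.0.0.50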
@wilmaflintstone - VMware software iSCSI requires the initiator to be on the same IP subnet as the storage, so unless extended VLANs or bridging is involved there aren't any short- or long-path considerations. What is confusing to me is that the 1.733 ms average latency that was posted is reasonable for iSCSI storage over gigabit - yet the performance is poor.
m_white

ASKER

The iSCSI SAN is plugged into an HP Gigabit switch (not sure on the model at this time), which is plugged into a Cisco 3750.  The blades are plugged into a Cisco 3012 (Inside the blade chassis) which is plugged directly into the same Cisco 3750.  The ping response time is an average of 0.186ms.  
I know that in past setups of VMware we always configured VLANs on the switches for the iSCSI traffic. I wonder if there is something there that's not passing traffic, or limiting it somehow?
On the LeftHand side of the fence, can you create a test volume for iSCSI and then mount it from an existing server or workstation outside of the VMware environment to test performance (see the sketch below for one way to do this)? If the performance is still bad at that point, I would say there is probably a problem with the infrastructure setup or configuration. If it's fast, then I would look back to the blades or switches and double-check the configs.
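
A minimal sketch of that outside-of-VMware test from a spare Linux box with open-iscsi; the portal address and device name are placeholders:

# Log in to the dedicated test volume from a machine outside the VMware environment
iscsiadm -m discovery -t sendtargets -p 10.0.0.50:3260
iscsiadm -m node --login

# Quick buffered-read timing on the newly attached device (check dmesg for its name first)
hdparm -t /dev/sdX

If that number is also poor, the switches, cabling, or the LeftHand nodes are suspect; if it is healthy, the focus shifts back to the blades and the ESX configuration.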
m_white

ASKER

I have another blade environment also attached to the SAN, accessing different LUNs.  So that verifies that the configuration of the SAN itself is good.  As far as the configuration of the switches goes... my network admin says they are good.

Thanks,
ASKER CERTIFIED SOLUTION
readydave
Please follow up when you can and let us know what VMware support found so we can all benefit from the information. Thank you for the points!