Tomasz Czyz

High disk latency on one ESXi 5.1 host when running backup

I have a strange problem. A few days ago I replaced one ESXi host in our cluster with a new one.
I'm using the same HBA card and the same cables, connected to the same ports on the Brocade switch.
I am using Veeam 8 for backup. Virtual machines on this host run fine, but when a backup starts, disk latency jumps to as much as 700 DAVG/cmd, while KAVG and QAVG stay under 20.
We are using an EMC VNX. The cluster has 2 more hosts with exactly the same configuration and the same HBA model, but the problem occurs only on this newest one, so it's definitely not a storage problem.
The fiber cables have been tested. Zoning was configured correctly when the HBA card was in the old host.
I am thinking about getting a brand-new HBA and configuring zoning again, but that costs a lot, and I don't want to spend much money if the problem is in the configuration.
Any advice on what I should check?
Or how can I find out whether this latency is a zoning problem?
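As a rough way to read those counters: in esxtop, DAVG/cmd is the latency measured below the driver (HBA, fabric, array), KAVG/cmd is the time spent in the VMkernel, QAVG/cmd is the queuing part of KAVG, and the guest-visible GAVG/cmd is approximately DAVG plus KAVG. The Python sketch below encodes that reading. The function name and the 20 ms threshold are assumptions for illustration (the threshold only mirrors the "under 20" figure in the question) and are not VMware-recommended limits.

```python
def classify_latency(davg_ms, kavg_ms, qavg_ms, threshold_ms=20.0):
    """Guess where latency accumulates from esxtop averages (all in ms).

    threshold_ms is an illustrative assumption, not an official limit.
    Returns (approximate GAVG in ms, short label).
    """
    gavg_ms = davg_ms + kavg_ms  # guest latency ~ device + kernel
    device_high = davg_ms > threshold_ms
    kernel_high = kavg_ms > threshold_ms

    if device_high and not kernel_high:
        # Time is spent below the driver: HBA, firmware, fabric, or array.
        where = "device side"
    elif kernel_high and not device_high:
        # Time is spent in the VMkernel; high QAVG suggests queuing.
        where = "kernel queuing" if qavg_ms > threshold_ms else "kernel"
    elif device_high and kernel_high:
        where = "both"
    else:
        where = "none"
    return gavg_ms, where


# Figures from the question: DAVG around 700, KAVG and QAVG under 20.
print(classify_latency(700, 15, 15))
```

A "device side" result narrows the search to everything below the driver, but it cannot by itself tell a zoning problem apart from an HBA, driver, or firmware problem.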
Mohammed Khawaja

Swap the HBA with one from another server and see whether the problem follows the server, the HBA, or the cable (assuming you have other cables, or can switch cables with another host).
Check that there is no dust on the FC cables or in the FC ports on the HBA!

Is the ESX host identical in configuration: memory, processors, firmware, BIOS version, and BIOS settings?

Is the FC HBA in the same slot? It could be in the wrong slot, a slow slot, or a slot the server does not support for it.
Tomasz Czyz

ASKER

@Mohammed I would like to do that, but I can't without downtime: 1 of 6 hosts is already down and I can't run all VMs on 4 servers.

@Andrew Thanks for the advice. I've checked the FC cables with a cable tester, but I didn't check for dust on the FC ports.
The new ESX host is an HP ProLiant DL360p Gen8; it replaced a DL360p G6. The 2 other Gen8 servers have the same configuration. The only differences are about 20 GB of memory, more or less, and the ESXi version: the problem host runs 5.1 build 3070626, while the others run build 1743533.
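When comparing a problem host against a known-good one, it can help to diff the inventory attribute by attribute so that nothing is overlooked. The Python sketch below is one minimal way to do that. The function name and the dictionary keys are hypothetical; only the model and build numbers come from this thread, and in practice the values would be collected by hand or from whatever inventory tooling is available.

```python
def config_differences(reference, candidate):
    """Return {attribute: (reference_value, candidate_value)} for every
    attribute whose value differs or is missing on one side."""
    keys = sorted(set(reference) | set(candidate))
    return {
        key: (reference.get(key), candidate.get(key))
        for key in keys
        if reference.get(key) != candidate.get(key)
    }


# Values from the thread; the key names are illustrative assumptions.
good_host = {"model": "DL360p Gen8", "esxi_build": "1743533"}
new_host = {"model": "DL360p Gen8", "esxi_build": "3070626"}

for attribute, (good, new) in config_differences(good_host, new_host).items():
    print(f"{attribute}: good host has {good}, new host has {new}")
```

Adding more attributes (HBA model, HBA driver and firmware versions, PCIe slot, BIOS settings) makes the comparison more useful, since any of them may differ between otherwise "identical" hosts.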

Strangely, even when VMs are using the disk quite actively (high reads and writes per second), latency peaks at about 20, but when a backup starts it jumps as high as a few thousand. Even after the backup is done (it takes 2 hours instead of 10-15 minutes), the host hangs. Any command in vSphere results in a timeout.
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
This solution is only available to members.
I have one host working with the same build without problems, but I now see that it has a different type of FC HBA.
Thanks for the advice. I'll try an older build and let you know the results.
Over the last few days I tried the following:
- replaced the HBA with another of the same model
- installed brand-new fiber cables
- reconfigured zoning and storage access
- installed a lower ESXi version

No change at that point; I got the same problem.

- forced a manual reinstallation of the HBA drivers and firmware
- upgraded the flash firmware
- updated all firmware from the latest SPP
- installed all ESX updates via Update Manager

Now it's better. I used to reproduce the problem by starting a VM restore to this host, and this time it ran without problems... but last night I got alerts again that latency was high for around 10 minutes. The good thing is that after those 10 minutes everything went back to normal.
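To tell whether a spike like that lines up with a backup or restore job, one option is to log latency samples over time and extract the windows where they exceed a limit. The Python sketch below shows the idea on a plain list of samples. The function name, the sample format (timestamp in seconds, DAVG in ms), the 20 ms threshold, and the sample values are all assumptions for illustration, not the output format of any particular tool.

```python
def spike_windows(samples, threshold_ms=20.0):
    """Find contiguous runs of samples above threshold_ms.

    samples: iterable of (timestamp_seconds, davg_ms) in time order.
    Returns a list of (first_timestamp, last_timestamp, peak_davg_ms).
    """
    windows = []
    start = last = peak = None
    for timestamp, davg_ms in samples:
        if davg_ms > threshold_ms:
            if start is None:
                start, peak = timestamp, davg_ms  # a new run begins
            else:
                peak = max(peak, davg_ms)
            last = timestamp
        elif start is not None:
            windows.append((start, last, peak))  # the run just ended
            start = None
    if start is not None:
        windows.append((start, last, peak))  # run still open at the end
    return windows


# Made-up samples taken once per minute.
example = [(0, 5), (60, 700), (120, 900), (180, 10), (240, 30)]
for first, last, peak in spike_windows(example):
    print(f"high latency from t={first}s to t={last}s, peak {peak} ms")
```

Comparing the resulting windows with the backup job start and end times shows whether the remaining spikes are still tied to backup traffic or occur independently of it.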

I'm going to test it with more VMs.

The HBA I'm using is an HPAE312A, which according to QLogic is already End-of-Life, so I am going to buy a new 8Gb dual-port HBA. Any advice on an HBA compatible with the ProLiant DL360p Gen8?
We've always preferred Emulex HBAs for FC, and QLogic for iSCSI!

Whatever you use, make sure it is on the HCL.
I've requested that this question be closed as follows:

Accepted answer: 0 points for cafejava's comment #a41225786

for the following reason:

Upgrading the drivers and firmware on the HBA and all firmware on the HP components solved the problem.

Sorry, I marked the wrong comment as the solution before.