Write latency spikes on ESXi backed by illumos ZFS NFS

I am running ESXi 5.5u2 with the VMs stored on an illumos NFS server (OmniOS) using ZFS.  Average write latencies are low for the most part (< 1 ms), but several times per hour they spike into the 50-100 ms range.  More than 75% of the time these spikes come from domain controllers, but other VMs will randomly show similar write latency spikes.

Examining write throughput and write operations per second shows no spikes in either metric during these events.  Likewise, read rate and read ops are low at those times.

ZFS is configured with a ZeusRAM slog device.  The IOPS running through the ZeusRAM are nowhere near its limit (the most I have seen is 500 write IOPS, and the limit is in the tens of thousands).  To get around the vSphere single-TCP-connection limit for NFS shares, the illumos server is multihomed with several IP addresses, and each datastore is tied to one of those addresses.  This avoids a single TCP connection for all VMs; each VM gets its own private TCP connection.  The OmniOS server has 32 GB RAM and 10 TB of storage, so there is no RAM deficit.
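A minimal sketch of that multihoming setup (the interface name, addresses, pool, and datastore names below are placeholders, not the actual config):

```shell
# On the OmniOS server: add extra static IPs to the NFS-facing NIC
# (ixgbe0 and the 10.0.0.x addresses are hypothetical).
ipadm create-addr -T static -a 10.0.0.11/24 ixgbe0/ds1
ipadm create-addr -T static -a 10.0.0.12/24 ixgbe0/ds2

# On each ESXi host: mount each datastore against a different server IP,
# so each mount gets its own TCP connection.
esxcli storage nfs add --host 10.0.0.11 --share /tank/ds1 --volume-name ds1
esxcli storage nfs add --host 10.0.0.12 --share /tank/ds2 --volume-name ds2
```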

I've tried tweaking multiple illumos kernel parameters, nfsd parameters, .vmx config file parameters, and vSphere parameters, all to no effect.  I am unable to determine why these intermittent spikes are occurring.

Admittedly, these are not very high latencies, but there has to be a reason for them.  It does not make sense that throughput and IOPS load are low when these spikes occur.

Does anyone have any suggestions on where to look for the root cause?
Aaron Tomosky (Director of Solutions Consulting) commented:
Can you post arcstats output (I'm used to FreeBSD, but I believe illumos has the same)?
Duncan Meyers commented:
How many disk drives and what type of drives are in the storage system?
How many VMs in the environment?
How many users do you have?
michaelkim1 (Author) commented:
Here is the output from "arcstat 1":

[screenshot: output of "arcstat 1"]
michaelkim1 (Author) commented:
I have 25 VMs running.  There are only 10 users.  The hard drives are Western Digital RE4 2TB.  The cache is an Intel SSD.  The slog device is a ZeusRAM 8GB.  Here is the zpool config:

[screenshot: zpool configuration]
Paul Solovyovsky (Senior IT Advisor) commented:
Are your VMs on local storage?  If so, do you have a battery-backed write cache controller?  We've seen weird issues in many environments where one wasn't installed or configured properly.
Aaron Tomosky (Director of Solutions Consulting) commented:
How big is the L2ARC SSD? That's different arcstat output than I'm used to. I was hoping to see arc_meta_limit, arc_meta_used, eviction stats... that sort of thing.
michaelkim1 (Author) commented:
I was able to figure this out.  Using a dtrace script to examine disk latencies showed the following:

The highest latency the system observed from the slog device was 262 ms, and it occurred during a period of high write throughput.  The vast majority of blocks written had latencies < 1 ms, which is why users never noticed any problems.
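For reference, a per-device I/O latency histogram of this sort can be collected with DTrace's io provider (a generic sketch, not the exact script used here):

```shell
# Histogram of block I/O completion latency, broken out per device.
# Run for a few minutes, Ctrl-C, then look at the slog's histogram
# for outliers in the 100 ms+ buckets.
dtrace -n '
io:::start { start[arg0] = timestamp; }
io:::done /start[arg0]/ {
    @lat[args[1]->dev_statname] = quantize(timestamp - start[arg0]);
    start[arg0] = 0;
}'
```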

The way to fix this, I believe, would be to get a PCIe-based slog or to put the slog devices behind a caching write controller with battery backup.

Thanks for all your help.

Aaron Tomosky (Director of Solutions Consulting) commented:
Remember that all of the L2ARC has to be referenced in RAM, which takes away from primary ARC space. 32 GB isn't that much for a ZFS NFS server when you consider the ARC, L2ARC headers, and metadata. Hopefully you aren't using dedupe, as that uses a ton of RAM.
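A rough back-of-envelope for that overhead, assuming ~180 bytes of ARC header per cached L2ARC record (a commonly cited figure for illumos of that era; the device size and record size below are hypothetical):

```shell
# Estimate ARC RAM consumed just to index an L2ARC device.
l2arc_bytes=$((200 * 1024 * 1024 * 1024))   # hypothetical 200 GiB L2ARC SSD
recordsize=$((8 * 1024))                    # 8 KiB records (VM I/O skews small)
header_bytes=180                            # assumed per-record header size

overhead=$(( l2arc_bytes / recordsize * header_bytes ))
echo "$(( overhead / 1024 / 1024 )) MiB of ARC consumed by L2ARC headers"
```

With small records, a fully populated L2ARC of that size would eat several GiB of a 32 GB ARC before caching any data itself.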
michaelkim1 (Author) commented:
Thank you for the suggestion regarding RAM and the L2ARC.  I had already tried increasing the RAM to 128 GB without any change in the intermittent spikes.
michaelkim1 (Author) commented:
Other suggestions were useful but did not help arrive at the solution.