EMC VNX5300 with 3 trays experiencing forced flushing

Hello EE,

I need your help! We recently added a second tray to our EMC VNX 5300. This one has a few SAS drives and less overall SATA space, but it lets us use FAST Cache, though I'm unsure how to confirm we're actually using it.

Anyhow, I ran a NAR collection because my third-party software is reporting high I/O. I heard back from EMC support that there are trespassed pool LUNs and forced flushing. Forced flushing is caused by the write cache being 100% full. This can happen at different times on different SPs, but both SPs can have this problem. When the write cache is 100% full, the SP will do a "forced flush" of the data in write cache to the drives. While this is occurring, response times will be high for all hosts performing writes on the CLARiiON storage system.
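The mechanism support describes comes down to a rate mismatch: if hosts write into cache faster than the drives can destage it, for long enough, the cache fills. The sketch below is a deliberately simplified model (one-second steps, a made-up 1000 MB cache and 200 MB/s destage rate), and it ignores the watermark-based flushing the array performs before the cache is completely full, so treat it as an illustration rather than a description of how FLARE actually behaves.

```python
# Toy model of an SP write cache: host writes fill it, background destaging drains it.
# All sizes and rates below are illustrative assumptions, not VNX defaults.

def forced_flush_seconds(writes_mb, destage_mb_per_s, cache_mb):
    """Return the seconds (indices) at which the cache reached 100% full.

    writes_mb: MB of host writes arriving in each one-second interval.
    destage_mb_per_s: MB the back-end drives can absorb per second.
    cache_mb: write cache capacity in MB.
    """
    level = 0.0
    events = []
    for t, incoming in enumerate(writes_mb):
        level += incoming
        if level >= cache_mb:
            # Cache is full: in this model this is a forced-flush second,
            # and writes beyond capacity have to wait on the drives.
            events.append(t)
            level = cache_mb
        # Background destage drains whatever the drives can take this second.
        level = max(0.0, level - destage_mb_per_s)
    return events


# A 10-second burst of 400 MB/s against drives that absorb 200 MB/s.
burst = [100] * 5 + [400] * 10 + [100] * 5
print(forced_flush_seconds(burst, destage_mb_per_s=200, cache_mb=1000))
# [8, 9, 10, 11, 12, 13, 14]
```

In this model, steady writes below the destage rate never fill the cache; it is a sustained burst above that rate that does, which is why faster destaging (more drives) or something that absorbs the bursts is what helps.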

They offered me several articles covering 50+ possible causes, and rather than trying them one after another, I wanted to see if anyone has experienced this and can offer any insight:

Software Block Version:
Free raw disk: 2683.88 GB
Free storage pool: 4169.13 GB
Used: 14802.08 GB
First enclosure: 15 disks, disk 14 is a hot spare. 536.808 GB each, RAID 5, SAS.
Second enclosure: 15 disks, disk 0 is a hot spare. Disks 1-4: 183.44 GB, RAID 1, SATA flash. Disks 5-9: 536.808 GB, unbound. Disks 10-14: RAID 5, SAS.
Third enclosure: all 536.808 GB, SAS, RAID 5.

I have one LUN for nearly all my data, plus a small one for ISO images, and one pool serving 7 Cisco UCS hosts.
Duncan Meyers Commented:
Forced flushing is caused by a high workload and not enough drives to absorb that workload. The easiest (but probably most expensive) fix is to add ten 100GB SSD drives and FAST Cache software; the SSDs will absorb the peaks in workload.
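To put rough numbers on "not enough drives", a common sizing rule of thumb converts host IOPS into back-end disk IOPS using a RAID write penalty (each RAID 5 write costs about four disk I/Os, RAID 1 and RAID 1/0 about two). The sketch below uses hypothetical workload figures and an assumed per-drive figure of 150 IOPS; substitute the peak IOPS you actually measure. It ignores cache hits and full-stripe write optimizations, so it overstates the load for cache-friendly workloads.

```python
import math

# Common rule-of-thumb write penalties: back-end disk I/Os per host write.
RAID_WRITE_PENALTY = {"raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(read_iops, write_iops, raid):
    """Back-end disk IOPS for a host workload, ignoring cache hits."""
    return read_iops + write_iops * RAID_WRITE_PENALTY[raid]


def drives_needed(read_iops, write_iops, raid, iops_per_drive):
    """Minimum drive count to sustain the back-end load (rounded up)."""
    return math.ceil(backend_iops(read_iops, write_iops, raid) / iops_per_drive)


# Hypothetical peak: 2000 read IOPS + 1000 write IOPS.
# 150 IOPS per drive is an assumed rule of thumb for 10K SAS, not a measured value.
for raid in ("raid5", "raid10"):
    print(raid, backend_iops(2000, 1000, raid), drives_needed(2000, 1000, raid, 150))
# raid5 6000 40
# raid10 4000 27
```

The same arithmetic shows why changing RAID type can help a write-heavy workload: RAID 1/0 needs fewer drives for the same host I/O, at the cost of usable capacity.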

Consider getting your array's FLARE updated to v32 (you're on v31 at the moment), as there are lots of enhancements in the newer FLARE.

But first, you should identify which host is causing the peaks. You can use Unisphere Analyzer to do this, or you can use the Properties of SPs and LUNs in Unisphere to keep an eye on workload. Analyzer gives you an incredibly granular view of what's going on in the array, but you really have to understand what you're looking at. Monitor Throughput (IOPS) and Response Time (ms) on the SPs and LUNs. Response time will increase during workload peaks; start worrying once it exceeds 20 ms. You'll probably see response times in excess of 100 ms during forced cache flushing.
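The response-time rule of thumb above can be written as a simple classifier. This is a minimal sketch: the sample values and object names are hypothetical, and it assumes you have already read response times (in ms) from Analyzer or the Unisphere Properties pages; it does not parse any Analyzer export format.

```python
# Thresholds taken from the advice above: worry past 20 ms,
# expect more than 100 ms while a forced flush is in progress.
WARN_MS = 20
FORCED_FLUSH_MS = 100


def classify_response_time(ms):
    """Label a single SP or LUN response-time sample (milliseconds)."""
    if ms > FORCED_FLUSH_MS:
        return "likely forced flushing"
    if ms > WARN_MS:
        return "worry"
    return "ok"


# Hypothetical samples; the names are made up for illustration.
samples = [("SP A", 8.5), ("SP B", 35.0), ("LUN_Data", 140.0)]
for name, ms in samples:
    print(f"{name}: {ms} ms -> {classify_response_time(ms)}")
```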
You can also watch disk workload in VMware vCenter Server from the performance screens, which will also show you which VM is busiest - monitor Disk numberRead and numberWrite.
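In vCenter, numberRead and numberWrite are summation counters: each value is the number of reads or writes completed during one sample interval. Dividing by the interval length gives IOPS, and ranking the VMs makes the busiest one stand out. This sketch assumes the 20-second real-time sample interval and uses made-up VM names and counts; it does not call the vSphere API, so you would supply the values from the performance charts yourself.

```python
def rank_vms_by_iops(samples, interval_s=20):
    """Rank VMs by total disk IOPS, busiest first.

    samples: {vm_name: (numberRead, numberWrite)} for one sample interval.
    The counters are summed over the interval, so dividing by its length
    (20 s for real-time charts, assumed here) gives IOPS.
    """
    iops = {vm: (reads + writes) / interval_s for vm, (reads, writes) in samples.items()}
    return sorted(iops.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counter values for one 20-second real-time sample.
samples = {"sql01": (4000, 16000), "web01": (1000, 200), "file01": (2000, 2000)}
for vm, rate in rank_vms_by_iops(samples):
    print(f"{vm}: {rate:.0f} IOPS")
# sql01: 1000 IOPS
# file01: 200 IOPS
# web01: 60 IOPS
```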


Once you've identified the host that's causing the peaks in workload, and you know how big those peaks are, you can decide your best course of action - whether that's to add more disks, change RAID type or add FAST Cache or FAST-VP.

My top tip, though: if you find that a SQL Server is generating the I/O peaks, try increasing RAM on the VM, as this allows SQL to keep more of the database in memory rather than having to write out to disk constantly. So if you have 4GB allocated, increase RAM to 8GB or even 16GB and monitor performance.
bergquistcompany (Author) Commented:
Good information, thanks.
Duncan MeyersCommented:
Thanks. Glad I could help.

If you found my answer useful, perhaps you'd consider grading it as A rather than B? Could I suggest you review the grading guidelines?