Windows VM Disk errors nightly on ESXi 5.5 host

We have server monitoring in place and are receiving some serious disk errors nightly from the event logs. Running 3 Windows Server VMs (2008 R2 and 2012 R2) on an ESXi host. Errors occur on multiple VMs and seem to be caused by the backup job.

Errors are as follows:

Alert Name : New Technology File System (Ntfs) - The file system structure on disk is corrupt and unusable.
Alert Name : Disk - Device Not Ready for Access.

We have an ESXi 5.5 host running on a Dell PowerEdge R510 with a Xeon E5530 and 32GB RAM. I believe the storage is a RAID1 array with 2x 1TB 7200rpm near-line SAS drives on a SAS 6/iR controller.

Backups are running nightly and start at 11pm using Veeam 8.0. I'm thinking this might be snapshot related.

Any advice on how to stop these errors?

Many thx,
-Steve
ScreenShot260.jpg
ScreenShot259.jpg
servicad Asked:

Robin CM, Senior Security and Infrastructure Engineer, Commented:
Probably get more performant storage! 7200rpm disks will give you very poor performance; not much else out there is slower.
The snapshot and backup operations are very IO-intensive, and are getting the host to do a lot of IO of its own. This will easily saturate the drives and prevent the VMs themselves from getting a look in.

How you fix this is down to how much money you have and how good you want the performance to be. One option you might want to consider is buying something like a QNAP NAS and running iSCSI (with a dedicated NIC in the host, or two) with several SSDs configured in RAID5, e.g. the TS453S Pro (https://www.qnap.com/i/uk/product/model.php?II=152), which supports VAAI etc. and is "VMware Ready" (https://www.qnap.com/i/uk/business_solutions/con_show.php?op=showone&cid=6).

"Other NAS providers are available" ;-)

Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Your backups are causing a "broadcast storm of the datastore" because of low IOPS.

Reduce the number of concurrent VMs per job, or add more disks to your RAID set.

More Spindles = More Disks = More Performance = More IOPS.

Seriously, two disks in RAID 1 for a hypervisor?
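
To put rough numbers on the "more spindles = more IOPS" point, here is a minimal back-of-the-envelope sketch. The per-disk IOPS figures and RAID write penalties are commonly quoted rules of thumb, not measurements from this host, and the 50/50 read/write mix is an assumption chosen only for illustration.

```python
# Rule-of-thumb IOPS per spindle (assumed values, not measured on this host).
DISK_IOPS = {"7200rpm": 75, "10k": 125, "15k": 175}

# Commonly cited RAID write penalties (back-end I/Os per front-end write).
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disks, per_disk_iops, raid_level, read_fraction):
    """Estimate usable front-end IOPS for an array.

    Reads are served at raw speed; writes are divided by the RAID
    write penalty. This ignores controller cache and queueing.
    """
    raw = disks * per_disk_iops
    write_fraction = 1.0 - read_fraction
    return raw * read_fraction + raw * write_fraction / WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    # The setup described in the question: 2x 7200rpm disks in RAID 1.
    current = effective_iops(2, DISK_IOPS["7200rpm"], "raid1", 0.5)
    # A hypothetical alternative: 6x 7200rpm disks in RAID 10.
    wider = effective_iops(6, DISK_IOPS["7200rpm"], "raid10", 0.5)
    print(f"2-disk RAID 1:  ~{current:.1f} IOPS")
    print(f"6-disk RAID 10: ~{wider:.1f} IOPS")
```

Under these assumptions the two-disk mirror has to share roughly a hundred IOPS between three VMs, the snapshot and the backup read stream, which is why adding spindles (or moving to faster disks) helps.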
Roy Bene, VP/Director - IT | ISO, Commented:
Hi, Steve:

Wow... a RAID 1 for ESX? Did this just start happening, with backups running fine before a certain point in time?

A couple of things to try and rule out before automatically attributing this to the network:

1. I've had something similar to this occur using Veeam before and it was, indeed, due to snapshots. Veeam should automatically delete snapshots if you have it set up correctly but, on your next round of backups, try manually deleting the snapshots before backup. I usually make sure (in Snapshot manager) that all my snapshots are gone.
2. Not sure how VM-heavy you are, but I would try taking a group of (5-10) VMs that are giving you trouble and backing them up off-schedule, separately, to see if the issue persists.

Let me know.

-R
servicad (Author) Commented:
Hi Andrew and roycbene,

@andrew:
1. thanks for the explanation, this is very helpful
2. parallel processing is already disabled in Veeam so, if my understanding is correct, the VMs should be handled sequentially
3. thank you for this, will consider adding HDDs for more IOPS

@roycbene:
1. the errors have been occurring for quite a long time (months) but backups are running successfully
2. all snapshots are gone, none are present in snapshot manager prior to the Veeam job
3. pretty low VM load, only 3 VMs with a small user base, think this is likely an IOPS issue as mentioned by Andrew

I'm wondering, are there settings in Veeam 8 to help lower the load on the host? Thank you both.
servicad (Author) Commented:
@robincm

Sorry I missed your initial response, thanks for this. We are familiar with and enjoy QNAP products.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Veeam Support will also tell you that slow datastores cause issues with snapshots.

Make sure none of your VMs are running on snapshot delta children!
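
One quick way to check for this, sketched below, is to look at the file names in each VM's folder on the datastore. ESXi names snapshot disks like `vm-000001.vmdk` with a matching `vm-000001-delta.vmdk` (or `-sesparse.vmdk`) extent, so a name filter catches them. The file names in the example are hypothetical, and a VM whose own name ends in six digits would be a false positive, so treat this as a first pass rather than a definitive check.

```python
import re

# Snapshot disks follow "<name>-NNNNNN.vmdk", optionally with a
# "-delta" or "-sesparse" suffix on the data extent.
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta|-sesparse)?\.vmdk$", re.IGNORECASE)


def find_snapshot_disks(filenames):
    """Return the file names that look like snapshot (delta) disks."""
    return [name for name in filenames if SNAPSHOT_DISK.search(name)]


if __name__ == "__main__":
    # Hypothetical listing of one VM folder on the datastore.
    listing = [
        "dc01.vmx",
        "dc01.vmdk",
        "dc01-flat.vmdk",
        "dc01-000001.vmdk",
        "dc01-000001-delta.vmdk",
    ]
    for name in find_snapshot_disks(listing):
        print("running on snapshot disk:", name)
```

If anything shows up while Snapshot Manager lists no snapshots, the VM may be running on an orphaned delta and the disks likely need consolidating.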
piedthepiper Commented:
We had backup datastores running on 7200rpm disks and saw similar errors, along with very long backups. We set the jobs to use the 15k drives initially and then tiered the data down over a period of time. This cured the problem but, as everyone else has stated, the root cause is the slow drives!