Windows VM Disk errors nightly on ESXi 5.5 host

We have server monitoring in place and are receiving some serious disk errors nightly from the event logs. Running 3 Windows Server VMs (2008 R2 and 2012 R2) on an ESXi host. Errors occur on multiple VMs and seem to be caused by the backup job.

Errors are as follows:

Alert Name : New Technology File System (Ntfs) - The file system structure on disk is corrupt and unusable.
Alert Name : Disk - Device Not Ready for Access.

We have an ESXi 5.5 host running on a Dell PowerEdge R510 with a Xeon E5530 and 32GB RAM. I believe it is a RAID1 array with 2x 1TB 7200rpm near-line SAS drives and a SAS 6/iR controller.

Backups are running nightly and start at 11pm using Veeam 8.0. I'm thinking this might be snapshot related.

Any advice on how to stop these errors?

Many thx,
-Steve

ASKER CERTIFIED SOLUTION

Robin CM

This solution is only available to members.

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Your backups are causing a "broadcast storm of the datastore" because of low IOPS.

Reduce the number of concurrent VMs per job, or add more disks to your RAID set.

More Spindles = More Disks = More Performance = More IOPS.

Seriously, two disks in RAID 1 for a hypervisor?
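
To put rough numbers on the spindle argument, here is a minimal Python sketch. The per-drive IOPS figures and RAID write penalties are common rules of thumb, not measurements from this host, and the function name `estimate_array_iops` is made up for illustration.

```python
# Rule-of-thumb random IOPS per spindle (assumed values, not measured on this host).
PER_DRIVE_IOPS = {7200: 75, 10000: 125, 15000: 175}

# Commonly cited back-end I/O cost of one front-end write, per RAID level.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def estimate_array_iops(drives, rpm, raid_level, read_fraction):
    """Estimate usable random IOPS for an array under a given read/write mix."""
    raw = drives * PER_DRIVE_IOPS[rpm]
    penalty = WRITE_PENALTY[raid_level]
    write_fraction = 1.0 - read_fraction
    # Each front-end read costs one back-end I/O; each write costs `penalty`.
    return raw / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    # The setup described in the question: 2x 7200rpm drives in RAID 1.
    print(round(estimate_array_iops(2, 7200, "raid1", read_fraction=0.5)))
    # A hypothetical larger set for comparison: 6x 7200rpm drives in RAID 10.
    print(round(estimate_array_iops(6, 7200, "raid10", read_fraction=0.5)))
```

With these assumed figures, the two-disk RAID 1 set comes out at roughly 100 IOPS for a 50/50 read/write mix, shared by every VM plus the backup job. Treat the output as an order-of-magnitude estimate only.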

Roy Bene

Hi, Steve:

Wow... a RAID 1 for ESX? Did this start happening recently, with backups running fine before that?

A couple of things to try and rule out before automatically attributing this to the network:

1. I've had something similar to this occur using Veeam before and it was, indeed, due to snapshots. Veeam should automatically delete snapshots if you have it set up correctly but, on your next round of backups, try manually deleting the snapshots before backup. I usually make sure (in Snapshot manager) that all my snapshots are gone.
2. Not sure how VM-heavy you are, but I would try taking a group (5-10) of the VMs that are giving you trouble and backing them up separately, off-schedule, to see if the issue persists.

Let me know.

-R

servicad

ASKER

Hi Andrew and roycbene,

@andrew:
1. thanks for the explanation, this is very helpful
2. parallel processing is already disabled in Veeam so they should be handled sequentially if my understanding is correct on this
3. thank you for this, will consider adding HDDs for more IOPS

@roycbene:
1. the errors have been occurring for quite a long time (months) but backups are running successfully
2. all snapshots are gone, none are present in snapshot manager prior to the Veeam job
3. pretty low VM load, only 3 VMs with a small user base, think this is likely an IOPS issue as mentioned by Andrew

I'm wondering, are there settings in Veeam 8 to help lower the load on the host? Thank you both.

servicad

ASKER

@robincm

Sorry I missed your initial response, thanks for this. We are familiar with and enjoy QNAP products.

Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Veeam Support will also tell you that slow datastores cause issues with snapshots.

Make sure none of your VMs are running on snapshot delta children!
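
One way to spot leftover delta disks is to look at the file names in each VM's datastore folder. The sketch below is a heuristic only: it assumes the usual VMFS naming, where snapshot child disks look like `<vm>-000001.vmdk` with a matching `<vm>-000001-delta.vmdk`, and it takes a plain list of file names (for example, pasted from a directory listing). The function name and the sample VM name `DC01` are hypothetical.

```python
import re

# Snapshot child disks on VMFS are typically named <vm>-000001.vmdk with a
# matching <vm>-000001-delta.vmdk extent (assumed naming; verify on your host).
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$", re.IGNORECASE)


def find_snapshot_disks(filenames):
    """Return the file names that look like snapshot child or delta disks."""
    return [name for name in filenames if SNAPSHOT_DISK.search(name)]


if __name__ == "__main__":
    # Hypothetical listing of one VM folder on the datastore.
    listing = [
        "DC01.vmdk",
        "DC01-flat.vmdk",
        "DC01-000001.vmdk",
        "DC01-000001-delta.vmdk",
        "DC01.vmx",
    ]
    for name in find_snapshot_disks(listing):
        print(name)
```

Any match while Snapshot Manager shows nothing is worth a closer look, since it may mean the VM is still running on a delta disk. A VM whose own name ends in a dash and six digits would produce a false positive.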

piedthepiper

We had a similar issue, plus very long backups, when our backup datastores were running on 7200rpm disks. We set the backups to use the 15k drives initially and then tiered the data down over a period of time. This cured the problem, but as everyone else has stated, the underlying problem is the slow drives!