VMware: degraded I/O with a degraded virtual disk
Posted on 2013-10-31
We have ESXi 5.0 running on a Dell PowerEdge R510 host with a PERC 6/i RAID controller. Eight physical disks are configured as two virtual disks - one RAID 10 and one RAID 5 - and all of them are 500 GB 7.2K SAS drives. ESXi itself is installed on a flash drive.
We had a drive go into predicted failure a couple of weeks ago. Initially there was not much impact at all, but the drive seems to have deteriorated further, and although it still has not failed outright, disk I/O for the entire server has slowed to a crawl. One of the VMs is a web server hosting a small website with an instance of SQL Express, and the website now times out on most database connection attempts. That VM lives on the healthy RAID 10 virtual disk.
The question is: why would VMs on the healthy RAID 10 virtual disk be impacted by the degraded state of the other virtual disk?
In doing a little research, I read that if a drive in predicted failure has a significant number of bad blocks, I/O performance can degrade while those blocks are being marked bad. So we offlined the disk in question (reluctantly, knowing the risks), figuring the drive was pretty much failed anyway.
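For reference, we forced the disk offline with the OMSA storage CLI, roughly along these lines - the controller number and pdisk ID below are placeholders rather than our actual IDs, and with the ESXi VIB the command has to be issued from an OMSA management station rather than on the host itself:

    # mark the suspect physical disk offline on controller 0 (IDs are examples only)
    omconfig storage pdisk action=offline controller=0 pdisk=0:0:4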
That was more than 12 hours ago, and I/O on the entire server is still abysmal.
We have a replacement drive scheduled for delivery today, but I'm wondering if we need to be prepared for further corrective action. Is this expected behavior or does it indicate further issues?
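In the meantime we have been trying to quantify the latency from the ESXi host itself with esxtop; the commands below are just a sketch (the interval and sample count are arbitrary):

    # interactive: press 'u' for the disk-device view and watch DAVG/cmd and KAVG/cmd
    esxtop
    # batch mode: 5-second samples for 10 minutes, redirected for later review
    esxtop -b -d 5 -n 120 > /tmp/esxtop-disk.csv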
We have Dell OMSA installed within ESXi and no other trouble is reported by the system.
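For completeness, the status checks amount to something like the following (controller 0 is assumed; with the ESXi VIB we actually view this through the OMSA web interface rather than a local CLI):

    # physical disk state (Online / Predictive Failure / Failed / Offline)
    omreport storage pdisk controller=0
    # virtual disk state (Ready / Degraded)
    omreport storage vdisk controller=0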