Solved

VMware Degraded I/O with degraded virtual disk

Posted on 2013-10-31
Last Modified: 2016-11-23
We have ESXi 5.0 running on a Dell PowerEdge R510 host with a PERC 6/i RAID controller.  Eight physical disks are configured as two virtual disks: one RAID 10 and one RAID 5.  Incidentally, all are 500 GB SAS (7.2K) drives.  VMware is installed on a flash drive.

We had a drive go into predicted fail a couple of weeks ago.  Initially there was not much impact at all, but it seems the drive has deteriorated further and, although it has still not failed, disk I/O for the entire server has slowed to a crawl.  We had a web server VM hosting a small website with an instance of SQL Express, and the website would time out on most database connection attempts.  This VM was on the healthy RAID 10 VD.

The question is: why would VMs on the healthy RAID 10 virtual disk be impacted by the degraded state of the other virtual disk?

In doing a little research, I read that if a "predicted failure" drive has a significant number of bad blocks, I/O performance can degrade while those blocks are marked bad.  So, we "offlined" the disk in question (reluctantly, knowing the risks) thinking that the drive was pretty much failed anyway.

That was more than 12 hours ago and still abysmal I/O on the entire server.

We have a replacement drive scheduled for delivery today, but I'm wondering if we need to be prepared for further corrective action.  Is this expected behavior or does it indicate further issues?  

We have Dell OMSA installed within ESXi and no other trouble is reported by the system.
Question by:gatorIT
3 Comments
 
LVL 120

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE^2) earned 500 total points
ID: 39613659
We've seen this when RAID controllers constantly try to read and write bad blocks.

This seems to take away overall performance from all the good storage.

Get the disk replaced ASAP. When we see a disk go into predicted fail, we get it escalated to Dell/HP the very same day, for swap-out within 4 hours.
 
LVL 22

Expert Comment

by:Nick Rhode
ID: 39614099
Although they are different RAID configs, both are operated by the same controller.  Regardless of the defective disk (get it replaced right away anyway), you will probably still see the I/O problem, because I believe the PERC 6/i controller does not have a write cache, so it reads and writes at the same time.  This will cause a performance issue which can be seen inside the VMs as what seems like a short delay.  I recommend the H710 or higher for the PERC, which has a write cache to resolve that issue.  After replacing the drive, give Dell a call; they will most likely recommend a better RAID controller to resolve those I/O issues.
 

Author Comment

by:gatorIT
ID: 39620228
The PERC 6/i has write-back cache and a battery.  It's the SAS 6/iR that does not.

The drive has been replaced. I think this will be the last RAID 5 array we ever use.  72 hours in, we're still only at 75% in the rebuilding process.  The extra bit of storage from RAID 5 just isn't worth the performance hit and the time to rebuild a (relatively) small 1.5 TB array.
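
For anyone weighing the same trade-off, here is a rough sketch of the arithmetic behind those numbers. It assumes each virtual disk uses four of the eight 500 GB drives (inferred from the 1.5 TB RAID 5 size, not stated outright) and that the rebuild progresses at a constant rate, which real controllers do not guarantee. The function names are made up for illustration.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity for a simple RAID 5 or RAID 10 set of equal-size disks."""
    if level == "raid5":
        # One disk's worth of capacity goes to parity.
        return (disks - 1) * disk_gb
    if level == "raid10":
        # Every disk is mirrored, so half the raw capacity is usable.
        return (disks // 2) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


def remaining_rebuild_hours(elapsed_hours, fraction_done):
    """Estimate the hours left, ASSUMING the rebuild rate stays constant."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


if __name__ == "__main__":
    raid5 = usable_gb("raid5", 4, 500)    # assumed 4-disk RAID 5
    raid10 = usable_gb("raid10", 4, 500)  # assumed 4-disk RAID 10
    print("RAID 5 usable GB:", raid5)
    print("RAID 10 usable GB:", raid10)
    print("Extra GB gained from RAID 5:", raid5 - raid10)
    print("Estimated rebuild hours left:", remaining_rebuild_hours(72, 0.75))
```

Under those assumptions, RAID 5 buys one extra drive's worth of space over RAID 10 on the same four disks, and a rebuild that is 75% done after 72 hours would need roughly another day to finish.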

Question has a verified solution.
