Performance degradation in new Hyper-V cluster
Posted on 2011-03-23
Setup: Dell T710 (purchased Jan 2010) and Dell R710 (purchased Jan 2011). Using a Dell MD3200i (iSCSI) as our SAN. 12 NICs on the T710, 8 on the R710 (got another quad-port card sitting on my desk, waiting for a maintenance window to install it). The SAN direct-attaches to the servers via CAT6 cables and has dual RAID controller cards.
Original VM environment ran purely on the Dell T710. It ran a RAID 5 array, about 1.5 TB of local storage, in addition to the RAID 1 system drive. Never had any performance issues.
Have 5 networks dedicated to the cluster (listed in binding order):
1) Management NICs on each port
2) Direct connection to iSCSI port on SAN
3) Direct connection to iSCSI port on SAN (each server has dual connections, crossed, for MPIO purposes, on separate subnets; both have jumbo frames enabled)
4) Cluster Heartbeat network (private network, 10.1.1.x)
5) Live Migration network (private network, 10.1.2.x, with jumbo frames enabled)
Rest of the NICs on the nodes are used for the virtual networks.
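Since three of the networks above depend on jumbo frames, one thing worth ruling out is an MTU mismatch somewhere along the path. Below is a minimal sketch that builds a "don't fragment" ping for a given MTU. It assumes a 9000-byte MTU and IPv4; the target addresses are placeholders, not the real addresses of this cluster. If the ping fails at the jumbo size but works at the 1500-byte size, some hop is probably not passing jumbo frames.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(target: str, mtu: int = 9000, windows: bool = True) -> list:
    """Build a ping command that refuses to fragment at the given MTU."""
    size = str(max_icmp_payload(mtu))
    if windows:
        # Windows ping: -f sets Don't Fragment, -l payload size, -n count
        return ["ping", "-f", "-l", size, "-n", "2", target]
    # Linux ping: -M do forbids fragmentation, -s payload size, -c count
    return ["ping", "-M", "do", "-s", size, "-c", "2", target]


if __name__ == "__main__":
    # Placeholder targets: substitute the SAN iSCSI ports and the
    # other node's Live Migration address.
    for ip in ["10.1.2.2", "<iscsi-port-a>", "<iscsi-port-b>"]:
        print(" ".join(ping_command(ip)))
```

The printed commands can be run by hand from each node, once per jumbo-frame network.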
Have about a 2 GB LUN for the witness disk, and about 4 TB of storage for the VMs. Running RAID 6 on the SAN, with a segment size of 128.
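The move from local RAID 5 to RAID 6 on the SAN may matter for a write-heavy SQL workload, because RAID 6 typically costs more back-end I/Os per write. Here is a rough back-of-the-envelope sketch using the commonly quoted write penalties (4 for RAID 5, 6 for RAID 6). The disk count, per-disk IOPS, and write fraction are made-up example values, not measurements from this environment.

```python
# Commonly quoted back-end I/Os per front-end write, per RAID level.
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}


def effective_iops(disks, iops_per_disk, write_fraction, level):
    """Estimate front-end IOPS an array can sustain for a read/write mix."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # Each read costs 1 back-end I/O, each write costs `penalty` I/Os.
    return raw / ((1 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical array: 6 disks at 150 IOPS each, 50% writes.
    for level in ("RAID5", "RAID6"):
        print(level, round(effective_iops(6, 150, 0.5, level)))
```

With those example numbers the same disks give roughly 360 IOPS under RAID 5 and about 257 under RAID 6, so the estimate is only useful once real disk counts and a measured read/write mix are plugged in.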
We started experiencing massive slowdowns on one of our SQL servers and on a server that runs some CRM software, which uses one of the DBs on that SQL server. The CRM server is used by the sales force. The SQL server hosts the CRM database as well as an enterprise database.
These slowdowns seem to have started after we moved the VMs, via SCVMM, into the CSV storage.
I can't seem to find where the degradation is coming from. We've rebooted the SQL server several times; performance improves each time, only to degrade again within a few hours.
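To pin down when the slowdown sets in after a reboot, it may help to log disk latency at a fixed interval and look for the first sustained breach. The sketch below only does the analysis step. It assumes the samples are (seconds since boot, latency in ms) pairs, for example exported from the Windows "Avg. Disk sec/Transfer" performance counter, and the 20 ms threshold is just an assumed rule of thumb.

```python
def first_sustained_breach(samples, threshold_ms=20.0, run_length=3):
    """Return the time of the first run of `run_length` consecutive
    samples above `threshold_ms`, or None if there is no such run."""
    run_start = None
    run = 0
    for t, latency in samples:
        if latency > threshold_ms:
            if run == 0:
                run_start = t  # remember where this run began
            run += 1
            if run >= run_length:
                return run_start
        else:
            run = 0  # a good sample resets the run
    return None


if __name__ == "__main__":
    # Made-up samples: one isolated spike, then a sustained slowdown.
    samples = [(0, 5), (60, 8), (120, 25), (180, 7),
               (240, 30), (300, 35), (360, 40)]
    print(first_sustained_breach(samples))
```

Comparing that timestamp against backup jobs, SQL maintenance tasks, or events in the SAN log might show what kicks in at that point.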
I tried turning the VM off and copying the VHD files back to local storage on the original host server, but the copy hangs after a few seconds on either server. (I'm attributing this to it being a CSV?)
I've contacted Dell enterprise storage to check the iSCSI initiator settings, and they all seem good now. I've also upgraded the NIC drivers on the older server.
Any ideas on things I can check? I'm kind of at a loss here, on why we're seeing issues.