saphico
asked on
Windows 2003 low disk performance on machines with VSS problem
Hello,
We are having disk performance problems on several of our Windows 2003 virtual machines.
When performing a file copy on such a machine (400 MB), the average disk queue length goes to 100%, and even when the file is fully copied it stays at 100% for about 10 more seconds.
During this file copy, the entire virtual machine slows down.
As said, we have this issue on several machines that are running Windows 2003 R2, both x86 and x64.
File copying is also noticeably slower on these machines than on other machines in the same environment (ESX 5.1).
Coincidence or not, but on all of these machines, we have VSS problems and are unable to perform backups using VSS (although the service is running).
Does anyone have a clue on how to solve this issue? Can anyone tell us if both problems can be related?
Please advise.
ASKER
Hi Steve,
These settings were already disabled on all of our servers....
Kind regards,
Bert
Not that then. :-)
What is the underlying storage? Are all the badly performing machines on the same physical device, the same physical spindles and/or the same LUN?
Steve
ASKER
The underlying storage is a Nexenta, with HA activated.
The machines are spread out over several different LUNs and physical spindles.
We have other machines running on these same disks (connected over NFS) that are not having problems...
This can be caused by slow datastores.
So what technology is it, and how is your datastore configured?
RAID type, SATA or SAS, speed of disks, etc.
Do any of the VMs suffering this issue have snapshots?
ASKER
The datastore is configured in RAID 10 with both SAS and SSD disks.
On none of these VMs am I able to perform a VSS snapshot. The VSS software simply hangs when performing the snapshot.
Do you mean that when you select Take Snapshot, it does not complete?
And are you also using the quiesce option, i.e. is it ticked?
This could be a VMware Tools or VSS writer issue.
ASKER
Take Snapshot is not completing, that is correct. The window just freezes...
This is not VMware related, because we also had this problem before (when this machine was running in a XenServer environment).
There is something like a 20-second window between VSS freezing the server and VMware taking a snapshot that can then be backed up. If the VMware snapshot doesn't happen within those 20 seconds, the VSS backup fails. This suggests that the VSS issue is related to the slow disk performance.
What monitoring tools do you have for your disk subsystem? esxtop will measure disk latency from the host's perspective.
Steve
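If esxtop is run in batch mode (for example `esxtop -b -d 5 -n 12 > esxtop.csv`), the resulting CSV can be summarised with a short script. Below is a minimal Python sketch, not a definitive tool. The file name, the function name `summarize_latency` and the assumption that the latency columns contain the text `MilliSec/Command` in their header are all illustrative, so check them against your own capture.

```python
import csv
import io


def summarize_latency(csv_text, header_match="MilliSec/Command"):
    """Return {column_header: (average, maximum)} for matching columns.

    header_match is an assumed substring of the latency column headers;
    adjust it to whatever your esxtop capture actually contains.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    # Indexes of the columns whose header contains the requested text.
    wanted = [i for i, h in enumerate(headers) if header_match in h]
    samples = {i: [] for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                samples[i].append(float(row[i]))
            except (ValueError, IndexError):
                # Skip blank or non-numeric cells rather than failing.
                continue
    return {
        headers[i]: (sum(v) / len(v), max(v))
        for i, v in samples.items()
        if v
    }
```

To use it, read the capture into a string (`open("esxtop.csv", newline="").read()`) and pass it to `summarize_latency`. Comparing the averages for a problem VM's datastore against a healthy one should show whether the host itself sees high latency.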
Any errors in the VMware event viewer, or in the VM's event log?
ASKER
As I said before, we don't think this is storage or ESX related, as other machines work perfectly and are running at full performance.
We really think this is a Windows 2003 issue.
Re-install VMware Tools.
ASKER
Can you tell me why you think this is VMware Tools related?
As I said before, we also had this problem BEFORE we migrated the VMs from XenServer to ESX...
So these Windows 2003 VMs had the same issue under Xen?
If so, VSS is broken!
The snapshot quiesce option instructs the sync driver in VMware Tools to quiesce the VM using VSS.
ASKER
Yes, I had the same problem in Xen. How can I repair VSS?
VSS is somewhat difficult to repair once it is broken.
Do you have any errors in the event log that relate to VSS?
ASKER
We've already tried to solve the VSS problem, but haven't found a solution.
Can a broken VSS cause system latency?
ASKER
Hi. To exclude storage as the cause, we've just moved this VM to ESX local storage.
We still have the same issue...
Anyone?
Any events in the local event log of the Windows 2003 VM at the time of the VMware snapshot?
ASKER
When I try to create a shadow copy I get:
failed to create a shadow copy of volume c:\
Error 0x80042306: the shadow copy provider has an error. Please see the system and application event logs for more information
In the application log I've found:
Event ID: 12310 on source VSS
With description:
Volume Shadow Copy Service error: The shadow copy could not be committed - operation timed out. Error context: DeviceIoControl(\\?\Volume{e66f9797-8d77-11e0-aab0-806e6f6e6963} - 0000018C,0x0053c010,00037D00,0,00038D08,4096,[0]).
AND
Event ID: 12298 on source VSS
Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume \\?\Volume{e66f9797-8d77-11e0-aab0-806e6f6e6963}\. The volume index in the shadow copy set is 0. Error details: Open[0x00000000], Flush[0x00000000], Release[0x80042314], OnRun[0x00000000].
In the system log I've found:
Event ID: 8 on source Volsnap
The flush and hold writes operation on volume C: timed out while waiting for a release writes command.
I hope this is helpful...
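For reference, the two result codes in these messages can be looked up by their symbolic names. The snippet below is a small Python sketch rather than an official tool: the table covers only the two codes quoted above (names as defined in the Windows SDK header `vsserror.h`), and the function name `decode_vss_codes` is made up for illustration.

```python
import re

# Subset of VSS result codes from the Windows SDK header vsserror.h.
# Only the two codes that appear in this thread are listed.
VSS_CODES = {
    0x80042306: "VSS_E_PROVIDER_VETO",
    0x80042314: "VSS_E_HOLD_WRITES_TIMEOUT",
}


def decode_vss_codes(log_text):
    """Return (hex_code, symbolic_name) pairs for known codes in log_text."""
    found = []
    for match in re.finditer(r"0x[0-9A-Fa-f]{8}", log_text):
        code = int(match.group(0), 16)
        if code in VSS_CODES:
            found.append((match.group(0).lower(), VSS_CODES[code]))
    return found
```

Run against the text above, it reports a provider veto for the overall error and a hold-writes timeout for the Release step, which matches the Volsnap event 8 message about the flush and hold writes operation timing out.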
Okay, this is what I was referring to as VSS being broken.
In the past we've copied the VSS registry keys over from another working system.
Sometimes this has been successful, and sometimes not.
ASKER CERTIFIED SOLUTION
ASKER
We've simply switched to a Windows 2012 environment, which solved this issue.
Turn the SNP (Scalable Networking Pack) features off on one of the servers exhibiting the problem and repeat the problematic transfer.
http://support.microsoft.com/kb/948496
Steve
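As a rough illustration of what turning SNP off involves: as far as I recall from the KB article, it comes down to setting three DWORD values to 0 under the Tcpip parameters key and rebooting. The Python sketch below only prints the corresponding `reg add` commands and does not change anything itself. The value names are my reading of the KB and the function name `snp_disable_commands` is made up, so verify against the article before running the output on a server.

```python
# Registry key and value names as I understand them from KB 948496;
# treat them as assumptions and verify against the article.
TCPIP_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
SNP_VALUES = ("EnableTCPChimney", "EnableRSS", "EnableTCPA")


def snp_disable_commands():
    """Build one `reg add` command per SNP value, setting each to 0."""
    return [
        f'reg add "{TCPIP_PARAMS}" /v {name} /t REG_DWORD /d 0 /f'
        for name in SNP_VALUES
    ]


# Print the commands so they can be reviewed before being run by hand.
for command in snp_disable_commands():
    print(command)
```

Printing the commands instead of writing to the registry directly keeps the change reviewable, which seems sensible on production servers.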