arghosrho
asked on
Live migration failing on Windows Server 2019 Hyper-V cluster
We have a 2-node Hyper-V cluster with shared storage (Storage Spaces Direct). Things normally function very well and everything is up and green.
When we do a quick migration, or a migration with the machines turned off, everything runs smoothly.
However, when we try a live migration it always fails with these two error messages:
ID 1205
The Cluster service failed to bring clustered role 'VM01' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
ID 1069
Cluster resource 'Virtual Machine Configuration VM01' of type 'Virtual Machine Configuration' in clustered role 'VM01' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
When I run Get-ClusterResource, everything shows as online with no errors.
Please help.
You need to start by looking at the cluster logs to see which resource isn't being brought online.
See this MS link on getting cluster logs via PowerShell:
https://docs.microsoft.com/en-us/powershell/module/failoverclusters/get-clusterlog?view=windowsserver2019-ps
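To keep the files manageable, the log can be limited to the minutes right after a failed attempt. A minimal sketch, assuming it is run on a cluster node shortly after reproducing the failed live migration; the folder path and the 15-minute window are placeholders, not values from this thread:

```powershell
# Hypothetical destination folder for the generated logs; adjust as needed.
$dest = 'C:\Temp\ClusterLogs'
New-Item -ItemType Directory -Path $dest -Force | Out-Null

# Generate a log for each node, covering only the last 15 minutes,
# with timestamps in local time instead of UTC so they line up with
# the times shown in Failover Cluster Manager.
Get-ClusterLog -Destination $dest -TimeSpan 15 -UseLocalTime
```

Running this immediately after the failure gives one small log per node instead of the full history.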
ASKER
I managed to create the logs but honestly, these are enormous files! How can I put my finger on the pain point?
Find the block of time around the failure and copy that part of the log.
Or zip it up and upload for others to review.
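One way to narrow the log down before copying or uploading it is to pull out only the error and warning lines that mention the affected VM. This is a rough sketch: the folder path and the VM name are placeholders, and it assumes the usual cluster log layout where each line carries a level token such as ERR or WARN after the timestamp.

```powershell
# Placeholders: folder holding the generated cluster logs and the name of the failing VM.
$dest   = 'C:\Temp\ClusterLogs'
$vmName = 'VM01'

# Search every log file for lines that mention the VM (case-insensitive literal match),
# then keep only those flagged as errors or warnings.
Get-ChildItem -Path $dest -Filter '*.log' |
    Select-String -Pattern $vmName -SimpleMatch |
    Where-Object { $_.Line -match '\s(ERR|WARN)\s' } |
    Select-Object Filename, LineNumber, Line |
    Format-List
```

The reported line numbers show where to look in the full log; the lines just before the first ERR entry are usually the most useful to copy.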
ASKER
Hyper-v02Cluster.log
hyperv-01Cluster.log
I have uploaded the logs with the relevant parts.
I hope someone can help me with this.
I tried to move other machines in the cluster and live migration just works; only the machine named app01 in the logs refuses.
ASKER
When I check the resources via PowerShell I get this: