avitman
asked on
How do I safely get Hyper-V VMs out of a failover cluster and onto a standalone Hyper-V server?
Hello,
I am having major problems with a newish Windows cluster. It is used only for Hyper-V, to make our VMs highly available.
I have 3 nodes, identical in hardware (with the exception of one, to which I have added an additional network card; the plan is to do this to all 3).
The 3 nodes are connected by iSCSI to a Dell MD3000i SAN. Each node uses 2 NIC ports for iSCSI traffic, and I have 2 redundant switches in place.
My problems seem to have started after I added the 3rd node to the cluster and changed the quorum to Node Majority; prior to this it was Node and Disk Majority.
Earlier today Windows Update must have rebooted one or more of the nodes (these obviously should not have been set to update automatically). Now we seem to be stuck in a cycle where one or more of the nodes cannot reconnect to the iSCSI disks.
The symptoms are fairly difficult to explain.
So...
What I want to do is get rid of the failover clustering role, BUT I have to keep the VMs safe. My plan is to take clustering back to a lab environment and do more testing before putting production servers on it.
I can see 2 options:
1. Move the VHD files off to another Hyper-V server outside the cluster.
2. Somehow strip the cluster layer away from the nodes, leaving them as standalone Hyper-V servers.
I'd prefer option 2, as the hardware is better; to be honest, though, I'm happy to try any suggestions.
Thanks in advance,
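A rough sketch of what option 2 can look like on Server 2008 R2 with the built-in FailoverClusters PowerShell module. The group and node names below are placeholders for your environment, and this assumes the VMs are shut down first; if the VHDs live on a CSV, copy or export them off C:\ClusterStorage before destroying the cluster, because that namespace goes away with it.

```powershell
# Sketch only - run from elevated PowerShell on a cluster node.
# "VM Group 1", "node2", "node3" are placeholder names.
Import-Module FailoverClusters

# 1. List the groups to find each VM's cluster group name.
Get-ClusterGroup

# 2. Remove the VM's cluster group; this removes the cluster resources
#    but does not delete the VM's configuration or VHDs.
Remove-ClusterGroup "VM Group 1" -RemoveResources -Force

# 3. Once every VM is de-clustered, evict the extra nodes, then destroy
#    the cluster on the last node (or use "Destroy Cluster" in
#    Failover Cluster Manager), leaving standalone Hyper-V hosts.
Remove-ClusterNode -Name "node2" -Force
Remove-ClusterNode -Name "node3" -Force
Remove-Cluster -Force
```

After the cluster is gone, the former shared disks will need to be brought online and assigned to a single host in Disk Management before the VMs can be started again.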
Chris
I would troubleshoot the MD3000i and the iSCSI settings on the host servers first, to make sure everything is configured properly:
- Use the Modular Disk Configuration Utility to configure the servers, as opposed to setting up the iSCSI Initiator manually yourself.
- Make sure that you see multiple entries under iSCSI Initiator -> 'Favorite Targets'; otherwise iSCSI is not set up properly on the servers. Remove and reconfigure if this is the case.
- Update the MD3000i to the latest firmware if you have not done so already.
- In MD Storage Manager, make sure that all 3 servers are in one 'Host Group' and that the HBAs are entered properly for the corresponding servers. Also make sure the host type says 'Windows Server 2003/Server 2008 Clustered' under MD Storage Manager -> Host Group -> Host -> Host Type.
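To verify the initiator side of the checklist above, the built-in iscsicli.exe can be run on each node (Server 2008 R2 has no iSCSI PowerShell cmdlets). A sketch of the useful read-only commands:

```powershell
# Sketch: inspect iSCSI initiator state on each host (read-only commands).
iscsicli ListTargets            # targets discovered on the MD3000i portals
iscsicli SessionList            # active sessions - expect one per iSCSI NIC/path
iscsicli ListPersistentTargets  # targets that reconnect automatically at boot
```

If ListPersistentTargets comes back empty, the disks will not return after a reboot; reconnect each target with the "Add this connection to the list of Favorite Targets" option checked (or rerun the Dell utility).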
VH
ASKER CERTIFIED SOLUTION
ASKER
I will start the cluster again from scratch just as soon as I get the last of the VMs out. This is proving to be a challenge, as the disks keep dropping offline!
There are a couple of config changes I will make on 'attempt 2', including manually setting the speed and duplex on all NICs. Also, MS support advised that I should have a redundant, dedicated heartbeat network between the nodes, despite other sources (including MS KB articles) claiming that heartbeat networks are a thing of the past in 2008 R2. I'll add one in any case.
My feeling is that this is a disk presentation issue; I'm seeing the iSCSI disks disappear quite regularly.
I have also been advised that Hyper-V Server 2008 R2 might be a better option than full-blown Server 2008 R2 Enterprise on the physical cluster nodes. I'm not sure of the best way in that case to connect the nodes to the SAN; I guess I would have to create the connections manually in iSCSI Initiator instead of using the Dell utility.
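For what it's worth, a dedicated heartbeat network can be enforced from PowerShell by setting each cluster network's Role. A sketch, assuming the FailoverClusters module; "Heartbeat" and "iSCSI1" are placeholder names for whatever appears under Networks in Failover Cluster Manager:

```powershell
# Sketch: control which networks the cluster uses for internal traffic.
Import-Module FailoverClusters
Get-ClusterNetwork | Format-Table Name, Role, Address

# Role values: 0 = not used by the cluster, 1 = cluster (heartbeat)
# traffic only, 3 = cluster and client traffic.
(Get-ClusterNetwork "Heartbeat").Role = 1

# Keep the iSCSI networks out of cluster use entirely:
(Get-ClusterNetwork "iSCSI1").Role = 0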
If your cluster was working OK at some point in time, then there must be a configuration issue with it.
I am assuming you are using CSVs for VM storage. Are they all online from each node? Are any in 'redirected access' mode?
I would advise looking at the following links if there is a problem with CSV:
Troubleshooting ‘Redirected Access’ on a Cluster Shared Volume (CSV)
How to fix CSV stuck in redirected mode
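A quick way to check the CSV state and gather evidence, sketched with the 2008 R2 FailoverClusters cmdlets (the 'redirected access' flag itself is shown in Failover Cluster Manager on that version; the cluster log usually records why a volume went redirected):

```powershell
# Sketch: check CSV health and pull the cluster log for analysis.
Import-Module FailoverClusters
Get-ClusterSharedVolume | Format-List Name, State   # look for Online vs Failed

# Generate cluster.log files (one per node) for the support case:
Get-ClusterLog -Destination C:\Temp
```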
ASKER
Thanks, this is what I did in the end.
ASKER
I will continue to troubleshoot the cluster, but in a test environment. The theory behind the technology is great and worth getting right.
My current thinking is that the problem lies in the iSCSI setup. I noticed that every time the nodes went down I could RDP into them, but the iSCSI disks were missing from Disk Management.
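That RDP access makes it possible to confirm the theory on the spot: if the disks are gone from Disk Management, check whether the iSCSI sessions themselves dropped. A sketch using built-in tools:

```powershell
# Sketch: run on the affected node while the disks are missing.
iscsicli SessionList            # no sessions = the initiator lost the SAN
iscsicli ReportTargetMappings   # which LUNs are mapped through which session

# The iSCSI initiator driver logs connection errors to the System log:
Get-EventLog -LogName System -Source iScsiPrt -Newest 20
```

If SessionList is empty but the network paths to the SAN still ping, that points at the initiator/target configuration (persistent targets, CHAP, portal settings) rather than the cluster itself.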