avitman

asked on

How do I safely get Hyper-V VMs out of a failover cluster and onto a standalone Hyper-V server?

Hello,
I am having major problems with a newish Windows cluster. It is only used for Hyper-V, to make our VMs highly available.

I have 3 nodes, each identical in hardware (with the exception of 1, to which I have added an additional network card; the plan is to do this to all 3).

The 3 nodes are connected by iSCSI to a Dell MD3000i SAN. Each node uses 2 NIC ports for iSCSI traffic, and I have 2 redundant switches in place.

My problems seem to have started after I added the 3rd node to the cluster and changed the quorum to node majority. Prior to this it was node and disk majority.

Earlier today Windows Update must have rebooted 1 or more of the nodes (these obviously should not have been set to update automatically). Now we seem to be stuck in a cycle where 1 or more of the nodes cannot reconnect to the iSCSI disks.

It's fairly difficult to explain the symptoms.

So...
What I want to do is get rid of the failover clustering role, BUT I have to keep the VMs safe. My plan is to take clustering back to a lab environment and do more testing before putting production servers in there.

I can see 2 options:

1. Move the VHD files off to another Hyper-V server outside the cluster.
2. Somehow strip the cluster layer away from the nodes, leaving them as normal Hyper-V servers.

I think I'd prefer option 2 as the hardware is better; to be honest, though, I'm happy to try any suggestions.
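
For illustration, here is roughly what I imagine option 2 would look like with the FailoverClusters PowerShell module that ships with 2008 R2. Group and cluster names are placeholders, and I haven't tested this:

```powershell
# Untested sketch; run from an elevated PowerShell prompt on a node.
Import-Module FailoverClusters

# See which cluster groups contain Virtual Machine resources
Get-ClusterResource |
    Where-Object { $_.ResourceType -like "Virtual Machine*" } |
    Format-Table Name, OwnerGroup, OwnerNode

# Take one VM's group out of cluster control; the VM itself should
# stay registered in Hyper-V on the node that currently owns it
Remove-ClusterGroup -Name "<VM group name>" -RemoveResources -Force

# Once every VM (and its storage) is off the cluster, tear it down.
# Note: destroying the cluster removes the C:\ClusterStorage mount
# points, so any VHDs still on CSV need to be moved first.
Remove-Cluster -Name "<cluster name>" -Force
```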

Thanks in advance,

Chris
avitman

ASKER

So I decided, shortly after asking this question, to start moving machines out of the cluster to a different, non-clustered Hyper-V server: shut down each VM, then copy the VHD file to the new server and attach it to a newly created VM. I did have some problems along the way because the cluster disks kept moving between nodes as they failed over between each other. I also had to do a small amount of reconfiguring on each newly created/moved VM, such as re-assigning an IP address, as they all defaulted to DHCP.
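
For anyone following along, the copy step can be scripted. The sketch below assumes robocopy (built into 2008 R2) and uses placeholder paths and host names; the VM creation itself was done in Hyper-V Manager, since 2008 R2 has no built-in Hyper-V cmdlets:

```powershell
# Rough outline of the manual move; the paths and the host name
# STANDALONE-HV are placeholders, not the actual servers.

# 1. Shut the VM down cleanly (from the guest or Hyper-V Manager).

# 2. Copy the VHD to the standalone host. Short retry/wait values
#    avoid long hangs if the CSV drops mid-copy.
robocopy "C:\ClusterStorage\Volume1\MyVM" "\\STANDALONE-HV\D$\VMs\MyVM" *.vhd /R:2 /W:5

# 3. On the new host, create a new VM in Hyper-V Manager and attach
#    the copied VHD. The new VM's NIC gets a new MAC address, so the
#    guest sees a fresh adapter and falls back to DHCP; re-enter the
#    static IP inside the guest afterwards.
```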

I will continue to troubleshoot the cluster, but in a test environment. The theory behind the technology is great and worth getting right.

My current thinking is that the problem lies in the iSCSI setup. I noticed that every time the nodes went down I could RDP into them, but the iSCSI disks were missing from Disk Management.
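
A quick way to tell whether the iSCSI sessions themselves dropped or just the disks; both commands are built into 2008 R2, though the output layout varies a little by version:

```powershell
# Active iSCSI sessions as the initiator sees them
iscsicli SessionList

# Disks the OS can currently see; the MD3000i LUNs should appear here
Get-WmiObject Win32_DiskDrive | Format-Table Model, Status, Size
```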
I would troubleshoot the MD3000i and the iSCSI settings on the host servers first to make sure everything is configured properly on them:

- Use the Modular Disk Configuration Utility to configure the servers, as opposed to manually setting up the iSCSI Initiator yourself.

- Make sure that you see multiple entries under iSCSI Initiator -> 'Favorite Targets', or your iSCSI is not set up properly on the servers. Remove and reconfigure if this is the case (see the sketch after this list).

- Update the MD3000i with the latest firmware if not done already.

- In MD Storage Manager, make sure that you have all 3 servers in one 'Host Group' and that the HBAs are entered properly for the represented servers. Also make sure the host type says 'Windows Server 2003/Server 2008 Clustered' (MD Storage Manager -> Host Group -> Host -> Host Type).
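
A minimal sketch for checking the first two points from the command line, assuming the stock Microsoft iSCSI initiator on each node (the output format varies by version):

```powershell
# Persistent ("favorite") targets; expect one entry per MD3000i
# controller port that the node logs into
iscsicli ListPersistentTargets

# Targets currently discovered, plus the live session state
iscsicli ListTargets
iscsicli SessionList
```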

VH
ASKER CERTIFIED SOLUTION
msmamji
avitman

ASKER

I will start the cluster again from scratch just as soon as I get the last of the VMs out. This is proving to be a challenge as the disks keep dropping offline!

There are a couple of config changes I will make on 'attempt 2', including manually setting the speed and duplex on all NICs. Also, MS support advised I should have a redundant and dedicated heartbeat network between the nodes, despite other sources (including MS KB articles) claiming that heartbeat networks are a thing of the past in 2008 R2. I'll add one in any case.
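
For the heartbeat change, the cluster network roles can be checked and set from PowerShell on 2008 R2. A sketch with placeholder network names (Role 0 = not used by the cluster, 1 = cluster traffic only, 3 = cluster and client):

```powershell
Import-Module FailoverClusters

# Show how the cluster has classified each network
Get-ClusterNetwork | Format-Table Name, Role, Address

# Dedicate one network to internal cluster (heartbeat) traffic only
(Get-ClusterNetwork "Heartbeat").Role = 1

# Keep the iSCSI networks out of cluster communication entirely
(Get-ClusterNetwork "iSCSI-1").Role = 0
(Get-ClusterNetwork "iSCSI-2").Role = 0
```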

My feeling is that this is a disk presentation issue; I'm seeing the iSCSI disks disappear quite regularly.

I have also been advised that Hyper-V Server 2008 R2 might be a better option than full-blown Server 2008 R2 Enterprise on the physical cluster nodes. I'm not sure exactly of the best way in that case to connect the nodes to the SAN; I guess I would have to manually create the connections in the iSCSI Initiator instead of using the Dell utility.
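
If it comes to that, the built-in iscsicli tool can stand in for the Dell utility. A rough sketch with placeholder portal addresses (the iSCSI Initiator applet, iscsicpl.exe, should also still be available on Hyper-V Server 2008 R2):

```powershell
# Add each MD3000i controller port as a target portal
# (addresses are placeholders)
iscsicli QAddTargetPortal 192.168.130.101
iscsicli QAddTargetPortal 192.168.131.101

# List the discovered targets, then log in. QLoginTarget sessions do
# not survive a reboot; use PersistentLoginTarget (or tick "Add this
# connection to the list of Favorite Targets" in iscsicpl) for that.
iscsicli ListTargets
iscsicli QLoginTarget <target IQN>
```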
If your cluster was working OK at some point in time, then there must be some configuration issue with it.
I am assuming you are using CSVs for VM storage. Are they all online from each node? Are any in 'redirected access' mode?
I would advise looking at the following links if there is a problem with CSV:
Troubleshooting ‘Redirected Access’ on a Cluster Shared Volume (CSV)
How to fix CSV stuck in redirected mode
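
To check the CSVs from PowerShell, something like the following should work on 2008 R2 (the exact property names on SharedVolumeInfo may differ slightly by version):

```powershell
Import-Module FailoverClusters

# Flag any CSV currently running in redirected access
Get-ClusterSharedVolume | ForEach-Object {
    foreach ($info in $_.SharedVolumeInfo) {
        "{0} : RedirectedAccess={1}" -f $info.FriendlyVolumeName, $info.RedirectedAccess
    }
}
```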
avitman

ASKER

Thanks, this is what I did in the end.