Issues with VSAN when upgrading from ESXi 5.5 to ESXi 6


We have a VMware estate of 6 servers running a VSAN, each host having 2 disk groups made up of 1 x SSD and 3 x HDD.

After I upgraded the first host to version 6, I got an error that the host could not communicate with all other hosts in the cluster. On further investigation I can see under the VSAN config that 5 of the hosts are in "network partition group 1", while the upgraded host is in group 2.

I don't know why it happened or how to change it back to group 1. Any ideas?
Aaron Street (Infrastructure Manager) asked:

Build numbers before and after please?
Aaron Street (Infrastructure Manager, Author) commented:
(Updated) ESXi-5.5.0-20150104001-standard

(Updated) ESXi-6.0.0-20150704001-standard

This has got a bit more serious.

So I have 6 hosts, 5 still at 5.5 and 1 at 6, but I now see them in network partition groups 1, 2, 3, 4, 5 and 6, so none of the VSAN is working. Yet when I carry out tests, all the hosts can see multicast from each other.

So the whole VSAN is down, and I can't see why.
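A minimal sketch of how the partition state described above could be summarised once the group numbers have been read off the VSAN config view by hand. The host names (`esx1` to `esx6`) and the function names are hypothetical and are not part of any VMware tool.

```python
from collections import defaultdict


def group_by_partition(host_partitions):
    """Return {partition group number: sorted list of hosts in that group}."""
    groups = defaultdict(list)
    for host, group in host_partitions.items():
        groups[group].append(host)
    return {group: sorted(hosts) for group, hosts in groups.items()}


def is_partitioned(host_partitions):
    """A healthy cluster has every host in one and the same partition group."""
    return len(set(host_partitions.values())) > 1


# Hypothetical data matching the state above: six hosts, six separate groups.
observed = {f"esx{i}": i for i in range(1, 7)}

if is_partitioned(observed):
    for group, hosts in sorted(group_by_partition(observed).items()):
        print(f"partition group {group}: {', '.join(hosts)}")
```

With every host alone in its own group, no host has a peer to form a cluster with, which fits the whole datastore being unavailable.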
Please post the build number that you see in the vSphere client. There are known bugs, and it may be that your upgrade killed the machine and you need to do a clean reinstall.

Aaron Street (Infrastructure Manager, Author) commented:
Aren't those the host build numbers as shown in vSphere?

I worked out the issue: VSAN does not like ESXi 5.5 and ESXi 6 hosts in the same VSAN cluster. I removed the ESXi 6 host from the cluster and it all came back to life.
Note: your build was recalled:

ESXi 5.5 Patch Release ESXi550-201504002 (build number: 2702864) was recalled to resolve an issue described in After upgrading to VMware ESXi 5.5 Patch Release ESXi550-201504002, virtual machines using VMware NSX for vSphere 6.x or Cisco Nexus 1000v are unable to communicate across hosts (KB 2116370).

The best option is to contact VMware support so they can give you a procedure to resolve the problems.
Aaron Street (Infrastructure Manager, Author) commented:
No, because I was not on that 5.5 build (see my previous post).

It's a known issue when mixing 5.5 and 6 hosts in a single VSAN cluster. ESXi 6 has a new method for inter-host communication, and if you add a version 6 host to a 5.5 cluster it causes the exact issue I was facing (hosts in different network partitions). VMware's solution is either to upgrade all hosts to 6, or to remove the version 6 hosts from the cluster.

The issue was that I removed the host from the cluster, updated it, added a new disk group and then added it back. Supposedly, if you leave it in the cluster when you update and don't add a new disk group, it's fine and you can upgrade the VSAN to version 2 later. But if you add a new disk group to an ESXi 6 host, it is automatically created as version 2, and then you have issues in a mixed-host environment.
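The condition described above can be written down as a small check. This is only a sketch under stated assumptions: the `Host` class, its fields and the host names are hypothetical, and the version numbers have to be filled in by hand from what the vSphere client shows.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    esxi_major: int  # 5 for ESXi 5.5, 6 for ESXi 6
    # One entry per disk group: 1 for VSAN version 1, 2 for VSAN version 2.
    disk_group_formats: list = field(default_factory=list)


def mixed_cluster_problems(hosts):
    """List every version 2 disk group that exists while 5.5 hosts remain."""
    old_hosts_remain = any(h.esxi_major < 6 for h in hosts)
    problems = []
    for h in hosts:
        for idx, fmt in enumerate(h.disk_group_formats, start=1):
            if fmt >= 2 and old_hosts_remain:
                problems.append(
                    f"{h.name}: disk group {idx} is VSAN v{fmt} "
                    "while ESXi 5.5 hosts remain in the cluster"
                )
    return problems


# Hypothetical cluster: five hosts on 5.5, one upgraded host with a new third disk group.
cluster = [Host(f"esx{i}", 5, [1, 1]) for i in range(1, 6)]
cluster.append(Host("esx6", 6, [1, 1, 2]))

for problem in mixed_cluster_problems(cluster):
    print(problem)
```

An upgraded host that keeps only its version 1 disk groups produces no warning here, which matches the "leave it in the cluster and don't add a disk group" case.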

Yes, but the transition should not be killing VMs.
Aaron Street (Infrastructure Manager, Author) commented:
The issue takes the VSAN offline,
the VMs use that datastore for their VMDKs,
they can't access the VMDKs,
so the VMs won't run.

What seems to be the nail in the coffin is that I had 6 hosts, each with 2 disk groups in the VSAN. When I upgraded it was OK, but I had also added another SSD and 3 x HDD to each server. When I created a 3rd disk group on the host I had upgraded to version 6, it was added not as a VSAN version 1 group but as VSAN v2. This created a mixed VSAN cluster and, as VMware reports, if you add an ESXi 6 / VSAN 2 host to an existing ESXi 5.5 / VSAN 1 cluster it will break the VSAN and take it offline.

My plan was:

upgrade each host in the cluster,
add the new disk group,
once all hosts are upgraded, upgrade the VSAN from version 1 to 2.

What I should have done is:

upgrade each host in the cluster,
upgrade the VSAN to version 2 once all hosts are upgraded,
add the new disk groups.

Or I could have done:

add the new disk groups,
upgrade the hosts,
upgrade the VSAN.

What I effectively did was have multiple hosts in one VSAN, some running version 1 and others running version 2.
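The three orderings above can be compared with a small check. It is a rough sketch only: the step names are hypothetical labels for the cluster-wide phases, not VMware commands, and the rule encoded is just the one described in this thread (new disk groups go in either before any host is upgraded or after the VSAN itself is at version 2).

```python
def plan_is_safe(plan):
    """Check a cluster-wide plan made of the three phases used in this thread."""
    hosts = plan.index("upgrade_hosts")
    vsan = plan.index("upgrade_vsan")
    disks = plan.index("add_disk_groups")
    if vsan < hosts:
        # The VSAN upgrade needs every host to be on ESXi 6 first.
        return False
    # New disk groups must not be created between the two upgrades.
    return disks < hosts or disks > vsan


my_plan = ["upgrade_hosts", "add_disk_groups", "upgrade_vsan"]
should_have_done = ["upgrade_hosts", "upgrade_vsan", "add_disk_groups"]
could_have_done = ["add_disk_groups", "upgrade_hosts", "upgrade_vsan"]

for name, plan in [("my plan", my_plan),
                   ("should have done", should_have_done),
                   ("could have done", could_have_done)]:
    print(name, "->", "safe" if plan_is_safe(plan) else "creates a mixed VSAN")
```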
Probably not a big problem if you have storage vmotion available....
Aaron Street (Infrastructure Manager, Author) commented:
How would Storage vMotion help? For that to work, the storage you are moving from has to be up. If the storage goes offline, Storage vMotion can't do anything. If you have Site Recovery Manager running and have the data replicated to a second storage system, you are OK.

But I am not sure how Storage vMotion would help in this case.
As a one-way pass to VSAN 2 and ESXi 6?
Aaron Street (Infrastructure Manager, Author) commented:
Have you ever had experience of using VMware 5.5 and 6, and VSAN 1 and 2, in a live environment? I know you're trying to help, but nothing you have posted so far is related in any way to the original question.
Aaron Street (Infrastructure Manager, Author) commented:
The issue was tracked down to a VMware knowledge base article listing this as a known issue when adding an ESXi 6 host to an ESXi 5.5 cluster.