Looking to migrate your existing VMware environment to VMware Cloud Foundation? Below is a brief summary of what is involved in making this transition based on my personal experience.
When new technologies are released, many organizations look at how to implement them in their environments to improve performance, management, and agility. Software-based developments can be much easier to implement in an existing environment than a full hardware and software stack, which is beneficial if an organization is between hardware refreshes or lacks the funding to purchase both new software and new hardware. However, converting an existing environment has its own unique challenges and complications.
For anyone not familiar, VMware Cloud Foundation (VCF) with NSX is a way to deploy a full Software-Defined Datacenter from a single product. This includes your host deployments, clusters, and networks. It can be deployed in an on-premises datacenter, a cloud-based environment, or a hybrid of the two. When VCF was first released, it was only supported on vSAN, which limited the environments where it could be deployed. Further development, however, opened it up to environments with other storage options such as Fibre Channel, NFS, and vVols. This means that many existing environments can now be converted, opening the solution to many more organizations. The following is based on my experience converting an existing vSphere 6.7 environment to VMware Cloud Foundation 7.0 with NSX and some of the challenges that were encountered.
The first step in planning a conversion to VCF is to acquire the correct licensing. This can be challenging, so it is recommended to work with your account team and a trusted partner. It is not as simple as upgrading your existing licenses from 6.7 to 7.0 as in previous releases. VCF is based on cloud technologies and has many features, so you need to either purchase new licenses or trade in your existing ones for VCF licenses. Your account team and partners need to know which functions and features of VCF will be deployed in order to provide the correct licensing. Otherwise, you could run into issues during your deployment when you need to download various appliance OVAs or enable specific features. Even once the new licenses have been issued, review them to ensure that all of the features and functionality have been provided ahead of initiating your deployment. On-premises licenses for the same features will not work if you are looking to enable certain cloud-based features, so log into your VMware Cloud portal and verify which subscriptions are licensed.
To ensure a smooth deployment, validate that all of the existing hardware is on the hardware compatibility list for VCF. In addition, while the product itself has been adapted to various forms of storage, the management plane is an exception: it must consist of four vSAN-compatible hosts (aka vSAN Ready Nodes). As of this writing, this is a hard requirement. An existing environment may not have this hardware in place, or if it does, may not be able to remove those hosts from production for this deployment. In either case, there may be a need to purchase hardware to deploy the management plane. You will also need to free up three existing hosts to create the first workload domain. Whether this is possible can be determined with a variety of tools, including VMware's vRealize Operations, which can run "what-if" scenarios to ensure you can safely remove the needed hosts without causing a major impact to your environment. If there is not enough headroom to remove three hosts, a temporary option that avoids purchasing new hardware is to "borrow" one or more hosts from a disaster recovery site. These would provide the needed capacity and could later be decommissioned out of VCF and re-installed at the remote site once the conversion is complete.
With the current supply chain issues, hardware needs could cause a major delay in the project. Some hardware vendors take three to six months to ship this type of hardware. One solution is a trusted used-equipment vendor: supply the vendor with the hardware configuration for a vSAN Ready Node and request pricing. I would recommend selecting a vendor who offers 24/7 support on these products. In most cases, manufacturers will not accept maintenance contracts on used hardware, and you want to be able to get replacement parts in the event a host has issues. The pricing will also be significantly less than the cost of new hardware. Whether this route is an option may depend on the organization; however, a used-equipment vendor was the route I took for the management plane hosts, and I was able to acquire the needed hardware in a fraction of the time and cost of purchasing brand new hardware from a preferred manufacturer. That said, I have noticed that supply chain issues are pushing pricing up and availability down from these providers as well, so verify costs and availability before ordering. Once all the hardware has been acquired and is in place, it is recommended to perform a full firmware upgrade on all hardware, including the switches that will be used to integrate VCF with the rest of the network. This brings the hardware up to date and addresses any issues that could arise from unpatched firmware bugs. One of the issues we ran into during the deployment process was BGP peering failing due to a bug in the core switches; a code upgrade on all switches ahead of time would have avoided it.
Planning the deployment of VCF requires coordination of multiple teams. This is because VCF is a full datacenter platform of compute, network, and security systems. Therefore, members of the server, network, and security teams should be involved in deployment planning. VMware provides a multipage document located here:
Download and Complete the Deployment Parameter Workbook (VMware.com)
This provides samples of the server names and IP addresses that will be needed, with blank cells beside each to enter the names and IP addresses for your environment. Once the document has been completed, each of the server names and IP addresses needs to be added to DNS ahead of deploying the environment. The deployment tool performs reverse DNS checks to verify these server names exist before deploying any of the appliances; if it cannot resolve a name, the deployment will fail. The planning guide also has a list of required user accounts and passwords, which makes this an excellent time to review and update the passwords for the VMware environment. While I recommend against placing usernames and passwords in a spreadsheet, the guide can be used to create them in a password management system. Using the same password for accounts like root, admin, and audit goes against good security practice, so I recommend using a different password for each. There are matching accounts for the ESXi hosts, Global NSX Manager, NSX Edge appliances, SDDC Manager, vSphere, and others. With such a diverse list of accounts and passwords, I recommend a good enterprise password manager with MFA to keep them secure. The credentials can all be added there, then copied and pasted during the deployment process, which helps keep the passwords and the environment secure.
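Because a single unresolvable name can fail the deployment hours in, it is worth pre-checking DNS for every entry in the workbook before starting. Below is a minimal sketch of such a check; the hostnames and IPs used in testing it are placeholders, and by default it performs live lookups with Python's standard `socket` module, so substitute the values from your own completed workbook.

```python
import socket

def check_dns(entries, forward=None, reverse=None):
    """Verify forward and reverse DNS agree for each (fqdn, ip) pair.

    `forward` and `reverse` default to live socket lookups but can be
    injected (e.g. as dict-backed functions) for offline testing.
    Returns a list of (fqdn, ip, problem) tuples; empty means all good.
    """
    forward = forward or (lambda fqdn: socket.gethostbyname(fqdn))
    reverse = reverse or (lambda ip: socket.gethostbyaddr(ip)[0])
    problems = []
    for fqdn, ip in entries:
        try:
            if forward(fqdn) != ip:
                problems.append((fqdn, ip, "forward lookup mismatch"))
        except (OSError, KeyError):
            problems.append((fqdn, ip, "forward lookup failed"))
        try:
            if reverse(ip).lower() != fqdn.lower():
                problems.append((fqdn, ip, "reverse lookup mismatch"))
        except (OSError, KeyError):
            problems.append((fqdn, ip, "reverse lookup failed"))
    return problems
```

Running this against the full workbook list and fixing every reported problem before launching Cloud Builder saves a failed validation cycle later.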
Once all of the hardware is in place and upgraded and the planning document is completed, it is time to begin the deployment. This is where it can get a little tricky. The entire process is too involved to cover in detail, but here is a summary.
First, ESXi must be loaded on each of the hosts being used for the management domain, with the network and NTP settings configured. Next, Cloud Builder is the first appliance deployed. It can be deployed in the existing environment, as it can be decommissioned once the SDDC management plane has been deployed. Once it is running, you log into the web interface and begin the deployment of the management plane. Here you select the platform, confirm the prerequisites, complete the questionnaire (hostnames, IPs, etc.), review the configuration, then deploy the Software-Defined Data Center (SDDC) management system. The actual deployment takes several hours. During this process, the hosts are reconfigured, vSAN is deployed, and vCenter and NSX Manager are provisioned. It is good to monitor for any failures. Fortunately, the process is designed so that if there is a failure, once corrective action is taken, you can simply have the deployment retry from the point it failed; it will validate that the issue has been resolved and pick up where it stopped. There is no need to start over from scratch.
Once the management plane has been deployed, you can log into SDDC Manager; the rest of the deployment takes place from this interface. Before VMs can be migrated to VCF, a workload domain must be created, which, as previously mentioned, requires three hosts. If they are being removed from the existing environment, you will place each one in maintenance mode, disconnect it from vCenter, then remove it from inventory. Depending on the configured storage, you will want to remove all storage LUNs except the boot disk. This prevents any accidental access to LUNs where production VMs may be running. The host must then be reloaded with the ESXi version for VCF. Once the ISO has booted, ensure the boot disk is the only storage the installer sees; this is a secondary precaution to protect existing systems. Once loaded, as with the management hosts, the network and NTP settings need to be configured before provisioning the hosts in VCF.
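The "boot disk only" safety check above is simple enough to script as a gate in any host-reload automation. Here is a minimal sketch, assuming you can obtain the list of storage device identifiers visible to the host (for example, from `esxcli storage core device list`); the device names are placeholders.

```python
def extra_luns(visible_devices, boot_device):
    """Return any storage devices visible to the host other than the
    known boot device. The reload should only proceed when this list
    is empty, protecting production LUNs from accidental reuse.
    """
    return sorted(d for d in visible_devices if d != boot_device)

def safe_to_reload(visible_devices, boot_device):
    """True only when the boot device is present and is the sole
    device the installer would see."""
    return boot_device in visible_devices and not extra_luns(
        visible_devices, boot_device)
```

For example, a host still presenting a production LUN (`extra_luns(["naa.boot", "naa.prod01"], "naa.boot")` returns `["naa.prod01"]`) would fail the gate and should not be reloaded until zoning or LUN masking removes it.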
Within SDDC Manager, you will create the workload domain. As with the deployment of the management domain, there will be a similar questionnaire that needs to be completed with host names, IP addresses, usernames and passwords, along with other configuration questions. After this is completed, the deployment will take between one and two hours to fully deploy the workload domain. I recommend monitoring for any issues so that they can be addressed as quickly as possible. After the deployment shows it has completed, log into vCenter and verify the environment is fully deployed.
While the management and workload domains are deployed, the environment is still not ready to migrate or create virtual machines: NSX for the workload domain still needs to be deployed. As with the compute infrastructure, this is done through SDDC Manager, but via the API explorer, and it may require some experience with API commands. This creates the NSX management cluster for the workload domain, which is then used to create the NSX Edge network that the virtual machines will use. Much of this may require input or assistance from the networking team, as there is close integration between the virtual and physical networks. Once the NSX networks are in place, I recommend creating a few virtual machines to test network connectivity ahead of migrating production virtual machines.
From my perspective, there are two ways of migrating the virtual machines from the old vCenter to VCF. The first is HCX. This is a VMware tool and does require the right licensing to provision it. It can be used to migrate between on-premises environments as well as VMware Cloud instances, and it provisions a series of management and working virtual machines in both environments. The advantage of HCX is that it can live vMotion virtual machines from one environment to the other with a loss of only one to two pings, similar to vMotioning virtual machines between hosts. It is also a great tool if you want to make changes during the migration; for example, if your current virtual machines are thick provisioned, you can convert them to thin to save storage cost. You can do individual or bulk migrations and schedule the cutovers in a maintenance window in case you want to avoid even the possibility of an issue for end users. The downside to HCX when converting an existing environment is that it does not recognize that the storage is the same, so a full copy of the VMDK has to be made as part of the migration. That is fine if you want to go from thick to thin, but it can be a long migration for a 1-2 TB virtual machine.
The second option is much quicker, but it requires the virtual machine to be shut down and shared storage between the environments. I find it much quicker when large virtual machines and short maintenance windows are involved. To do this, you power off the virtual machine and remove it from inventory. In the VCF environment, you go to the specified LUN, browse to the folder for the virtual machine, find the VMX file, select it, then click "Register VM". Next, you go through the registration process and place it in the correct folder if applicable, edit the network to point to the correct switch, and finally power on the virtual machine. It is a good idea to run a constant ping and verify the virtual machine has connectivity. It is also a good idea to log into the virtual machine and validate that the operating system functions as expected, at least on the first two or three that are migrated.
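If you script this cold-migration flow (for example with pyVmomi's `Folder.RegisterVM_Task`), the fiddly part is building the datastore path to the VMX file. A minimal sketch, assuming the VM's folder matches its name, which is the common default but not guaranteed; the datastore and VM names shown are placeholders:

```python
def vmx_datastore_path(datastore, vm_name, vmx_file=None):
    """Build the '[datastore] folder/file.vmx' path used when
    registering an existing VM from shared storage.

    Assumes the VM's folder on the datastore matches vm_name; pass
    vmx_file explicitly if the VMX filename differs from the VM name.
    """
    vmx_file = vmx_file or f"{vm_name}.vmx"
    return f"[{datastore}] {vm_name}/{vmx_file}"
```

For example, `vmx_datastore_path("SharedLUN01", "app01")` yields `[SharedLUN01] app01/app01.vmx`, the form that register-VM operations expect. Always browse the datastore first to confirm the folder and file names, since renamed VMs frequently do not match their folder.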
Virtual machines with RDMs add another layer of complexity. While less common today, these are typically only used in Windows cluster environments, and there may still be legacy systems in an environment that use this type of disk. RDMs cannot be migrated to new storage; however, with shared storage, the virtual machines that use them can be migrated. To do this, the virtual machine must be shut down and the RDM disconnected. Make sure to take a screenshot or note the UUID of the disk before disconnecting it. Next, migrate the virtual machine using either of the two methods mentioned above, then reattach the RDM and power on. If the disks are part of a Windows cluster, I recommend migrating the passive node first, performing a failover, and then moving the second node. This minimizes the outage to just the failover period and validates that the cluster is working as expected.
Once migrated to VCF, it is a good idea to update both VMware Tools and the virtual hardware compatibility. Within VCF, Lifecycle Manager can be used for both tasks, either immediately or scheduled during another maintenance window. These upgrades ensure the best performance for the virtual machines and compatibility with the rest of the environment. As virtual machines are migrated, additional hosts can be freed up, and the pattern continues: migrate some virtual machines to VCF, free up an additional host, remove, reload, and add that host to VCF, until all needed virtual machines have been migrated. When you are down to one host, the final steps are to migrate the network extension into VCF (which requires about a one-minute outage) and decommission the HCX environment on the old vCenter along with the old vCenter Server itself. This leaves one last host to be migrated before finalizing the conversion. If any hosts were borrowed from another site to provide enough resources for the conversion, those hosts need to be decommissioned out of VCF through SDDC Manager and returned to their proper environment.
Given the choice between converting an existing environment and deploying a new greenfield environment, the simplest option is greenfield; converting an existing environment adds another level of complexity to the deployment. However, since greenfield may not be possible for several reasons, converting an existing environment to VCF is a doable task. It requires extensive planning, preparation, and possibly additional compute and storage resources. Each environment and organization is different, so a full evaluation is required. It is also best to keep your account team and a good, experienced VCF partner involved to ensure a smooth transition with minimal to no downtime. Complete the planning document, purchase the correct licensing, have four vSAN Ready Nodes and three ESXi hosts available, and you are ready to go. That should give you the tools for success.