Time for a VxRail Tech Refresh? The What and How

Luke Jones
Jul 19, 2022

Recently VxRail has added more than a few new node types to the portfolio, and many of our early 13G VxRail nodes are due for a tech refresh. With these two things in mind, I often find myself talking about the best strategy for replacing ageing nodes in a VxRail cluster. In this article I lay out a few options that cover the vast majority of cases. These options aim to reduce complexity and, as a result, risk and cost as much as possible.

Broadly there are two groups of options, each with a few different implementations depending on circumstances:

Expanding (to add new nodes) and contracting (to remove old nodes).
Building a fresh cluster and performing a side-by-side migration.

I have ordered the options below from most recommended (blue) to least recommended (orange), but all of them are good choices and recommended approaches. I created the decision tree below to help decide which option is the right one to go with. The options on the left represent the least complexity, risk and cost. I will lay out each method with its requirements to help make this clearer.

Option 1 (Add All New Nodes, Then Remove All Old Nodes)

Option one is simple and presents the least risk and effort. Add all the new nodes to the cluster, then remove the old ones. This is achieved by leveraging the built-in VxRail automation for node addition and node removal. There is no disruption to running workloads and no added risk. VMs are seamlessly migrated from the old nodes to the new ones. When the old nodes are removed from the cluster, the VxRail automation evacuates all data, ensuring that storage policies remain in compliance.

Requirements:

Sufficient rack space, power, cooling, network cables and network ports for all new nodes and old nodes at the same time.
New IP addresses and hostnames for all new nodes.
New A and PTR records in DNS
Additional switch ports configured
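
Before adding nodes it can be worth confirming that the new A and PTR records agree with each other. The following is a minimal Python sketch of such a check and is not part of the VxRail tooling. The function name check_dns and the example hostname and address are hypothetical, and the resolver functions are passed in as parameters so the logic can be exercised without touching real DNS.

```python
import socket


def check_dns(hostname, expected_ip,
              forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Return a list of problems with one node's A and PTR records.

    An empty list means the forward (A) and reverse (PTR) lookups both
    match what was planned for the node.
    """
    problems = []

    # Forward lookup: hostname -> IP address (A record).
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return [f"A record lookup failed for {hostname}: {exc}"]
    if ip != expected_ip:
        problems.append(f"A record for {hostname} is {ip}, expected {expected_ip}")

    # Reverse lookup: IP address -> hostname (PTR record).
    try:
        ptr_name = reverse(expected_ip)[0]
    except OSError as exc:
        problems.append(f"PTR lookup failed for {expected_ip}: {exc}")
        return problems
    if ptr_name.rstrip(".").lower() != hostname.rstrip(".").lower():
        problems.append(f"PTR record for {expected_ip} is {ptr_name}, expected {hostname}")

    return problems
```

Calling check_dns("vxrail-node05.example.local", "192.0.2.15") for each planned node (both values are placeholders) would return an empty list when the two records line up, and a list of human-readable problems otherwise.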

Option 2 (Add Some Nodes, Then Remove Some Nodes)

Option two is very similar to option one above except that one node is added then one removed. This process is repeated until all nodes have been replaced.

There is slightly more risk involved in this process around the racking/de-racking and cabling of nodes since it will need to happen more often. This risk can be mitigated by following a careful process and maintaining neat cabling.

This option could be used when there is insufficient:

Rack space
Power/cooling
Network ports
Cables/transceivers
IP addresses/hostnames

Whilst one in and one out is a valid process, it is best to swap as many nodes at a time as the spare rack space, power, ports and addresses allow, e.g. two in and two out.
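
The batching idea can be sketched in a few lines of Python. This is only an illustration of the add-then-remove ordering, assuming the number of spare slots (rack space, ports, addresses and so on) is the limiting factor. The function plan_batches and the node names are hypothetical.

```python
def plan_batches(old_nodes, new_nodes, spare_slots):
    """Return an ordered list of ("add" | "remove", [nodes]) steps.

    Each round adds as many new nodes as there are free slots, then
    removes the same number of old nodes to free those slots again.
    """
    if spare_slots < 1:
        raise ValueError("need at least one spare slot to add a node before removing one")

    old, new = list(old_nodes), list(new_nodes)
    free = spare_slots
    steps = []

    while old or new:
        # Add as many new nodes as currently fit.
        batch_in = new[:free]
        new = new[len(batch_in):]
        free -= len(batch_in)
        if batch_in:
            steps.append(("add", batch_in))

        # Remove a matching number of old nodes, or every remaining old
        # node once there are no more new nodes left to add.
        n_out = len(batch_in) if new else len(old)
        batch_out = old[:n_out]
        old = old[len(batch_out):]
        free += len(batch_out)
        if batch_out:
            steps.append(("remove", batch_out))

        if not batch_in and not batch_out:
            raise ValueError("not enough free slots to place the remaining new nodes")

    return steps
```

With four old nodes, four new nodes and two spare slots, the plan is two rounds of "add two, remove two", which halves the number of racking and cabling visits compared with one in and one out.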

Requirements:

Sufficient rack space, power, cooling, network cables and network ports for at least one more node than the current cluster size.
New IP addresses and hostnames for at least one more node.
New A and PTR records in DNS
Additional switch ports configured

Option 3 (Remove One Node, Then Add One Node...)

Option three is again similar to options one and two above. The difference here is that one node is removed before a new one is added.

This option introduces a new risk to the process. Removing a node before adding one reduces the number of hosts available as failure domains and may necessitate changing storage policies before beginning. In the example image here, dropping a four-node cluster down to three only allows a storage policy of one failure to tolerate with mirroring. Larger clusters are less affected by this increased risk, since removing a single node from a larger cluster represents a smaller percentage loss of resources.
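
To make the storage policy point concrete, here is a small Python sketch that lists which policies remain possible after a node is removed. It assumes the commonly documented minimum host counts for the vSAN Original Storage Architecture (mirroring needs 2n+1 hosts for n failures to tolerate, RAID-5 needs four hosts, RAID-6 needs six). These figures are an assumption to verify against the documentation for your vSAN version, and the names MIN_HOSTS, policies_supported and policies_lost are hypothetical.

```python
# Assumed minimum host counts per (layout, failures to tolerate).
MIN_HOSTS = {
    ("RAID-1", 1): 3,
    ("RAID-1", 2): 5,
    ("RAID-1", 3): 7,
    ("RAID-5", 1): 4,
    ("RAID-6", 2): 6,
}


def policies_supported(host_count):
    """Return the (layout, FTT) pairs a cluster of this size can satisfy."""
    return sorted(p for p, needed in MIN_HOSTS.items() if needed <= host_count)


def policies_lost(hosts_before, hosts_after):
    """Return the policies that stop being possible when the cluster shrinks."""
    after = set(policies_supported(hosts_after))
    return [p for p in policies_supported(hosts_before) if p not in after]
```

Under these assumptions, policies_supported(3) leaves only mirroring with one failure to tolerate, and policies_lost(4, 3) shows that RAID-5 is no longer available. Any VM using a lost policy would need its policy changed before the node is removed.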

This option makes sense when there is no spare:

Rack space
Power/cooling
Network ports
Cables/transceivers
IP addresses/hostnames

This process will be slower than options one and two and may have a negative impact on cluster performance.

Requirements:

Sufficient cluster capacity to operate with one node removed.
Sufficient cluster performance to operate with one node removed

Option 4 (Side by Side Migration using vMotion)

Option four presents a very different way of replacing ageing hardware. This is the first option that could also be used when migrating from a non-VxRail environment.

This option is a side-by-side migration. The new VxRail cluster is built alongside the existing cluster and managed by the existing vCenter Server. The workload is then vMotioned from the old cluster directly to the new VxRail cluster. This can be done with the VMs online and with no downtime. vMotion network connectivity is required between all nodes in the old and new clusters. The clusters can be on different L3 networks as long as routing is configured on both clusters and on any switches in the traffic path.

Once all the workload has been moved to the new cluster the old environment is decommissioned.

Requirements:

Sufficient rack space, power, cooling, network cables and network ports for all new nodes and old nodes at the same time.
New IP addresses and hostnames for all new nodes.
New A and PTR records in DNS
Additional switch ports configured
Both clusters share the same vCenter Server
Both clusters' vMotion interfaces can communicate (L2 or L3)
Both clusters use the same CPU vendor (Intel or AMD)
- Powered-off (cold) migration can be used if changing CPU vendor
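
The vMotion-specific requirements above boil down to a short pre-flight check. The Python sketch below is one possible way to express it. The function vmotion_plan and its parameters are hypothetical, and it only encodes the requirements listed in this article, not every compatibility rule vCenter applies.

```python
def vmotion_plan(shared_vcenter, vmotion_reachable, source_cpu_vendor, dest_cpu_vendor):
    """Return (mode, blockers) for an Option 4 side-by-side migration.

    mode is "live", "powered-off" or "blocked"; blockers lists the unmet
    requirements when mode is "blocked".
    """
    blockers = []
    if not shared_vcenter:
        blockers.append("clusters are not managed by the same vCenter Server")
    if not vmotion_reachable:
        blockers.append("vMotion interfaces cannot communicate over L2 or L3")
    if blockers:
        return ("blocked", blockers)

    # Same CPU vendor: VMs can move while running.
    if source_cpu_vendor.strip().lower() == dest_cpu_vendor.strip().lower():
        return ("live", [])

    # Different CPU vendor: the VM has to be powered off for the move.
    return ("powered-off", [])
```

For example, vmotion_plan(True, True, "Intel", "AMD") returns ("powered-off", []), which signals that an outage window must be planned for each VM.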

Option 4.1 (Side by Side using vMotion with new vCenter)

This is essentially the same process as option four except that the new cluster also has a new vCenter. The vMotions are set up using the import VM function in vCenter. The new vCenter will communicate with the old vCenter and set up the vMotion between two nodes in the respective clusters.

Enter the old vCenter's information and follow the wizard to import the VMs:

Option 4.2 (Side by Side using vMotion, Triggered by old vCenter)

This migration process can also be triggered from the old vCenter, although the process is slightly different.

Right click on the VM to move and select Migrate:

Select Cross vCenter Server export. Follow the wizard to select the destination settings:

Requirements:

Sufficient rack space, power, cooling, network cables and network ports for all new nodes and old nodes at the same time.
New IP addresses and hostnames for all new nodes.
New A and PTR records in DNS
Additional switch ports configured
Both clusters' vMotion interfaces can communicate (L2 or L3)
Both clusters use the same CPU vendor (Intel or AMD)
- Powered-off (cold) migration can be used if changing CPU vendor

Option 5 (Side by Side Migration leveraging Replication)

Option five is a similar idea to option four in that it is a side-by-side migration, but it doesn't require direct connectivity between hosts. It does require a brief outage while the source VM is powered off and the destination VM is powered on.

The virtual machine is replicated from the source cluster to the new VxRail cluster using a replication tool. In the diagram above I have depicted vSphere Replication, but this could also be RecoverPoint for VMs (RP4VM, included with VxRail) or any other replication software. Once the data has been staged to the VxRail cluster, the source VM is powered off, a final replication is completed to update the destination, and then the destination VM is powered on.
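
The ordering of the cutover matters because it determines how long the VM is unavailable. The following Python sketch shows the sequence only. RecordingReplicator is a hypothetical stand-in that records calls rather than driving vSphere Replication, RP4VM or any real product, and cut_over and its method names are likewise assumptions made for illustration.

```python
import time


class RecordingReplicator:
    """Hypothetical stand-in for a replication tool; it only records calls."""

    def __init__(self):
        self.calls = []

    def initial_sync(self, vm):
        self.calls.append(("initial_sync", vm))

    def power_off_source(self, vm):
        self.calls.append(("power_off_source", vm))

    def final_sync(self, vm):
        self.calls.append(("final_sync", vm))

    def power_on_destination(self, vm):
        self.calls.append(("power_on_destination", vm))


def cut_over(vm, replicator, clock=time.monotonic):
    """Run the cutover steps in order and return the outage length in seconds."""
    # Bulk data is staged while the source VM is still running.
    replicator.initial_sync(vm)

    # The outage starts when the source is powered off...
    outage_start = clock()
    replicator.power_off_source(vm)
    # ...covers the final replication that catches up the last changes...
    replicator.final_sync(vm)
    # ...and ends when the destination VM is powered on.
    replicator.power_on_destination(vm)
    return clock() - outage_start
```

The point of the sketch is that only the last three steps sit inside the outage window, so the more data that is staged by the initial replication, the shorter the final replication and the outage should be.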

Requirements:

Sufficient rack space, power, cooling, network cables and network ports for all new nodes and old nodes at the same time.
New IP addresses and hostnames for all new nodes.
New A and PTR records in DNS
Additional switch ports configured
Replication appliance installed, configured and operational
Network connectivity for the replication appliance to both clusters
VM outage for final migration steps