Best practice for replacing a whole server of a cluster
Just wanted to ask what the best practice is for replacing a whole server (the entire RAID) of one of the cluster's servers.
Let's assume the following scenario:
2x Open-E servers, iSCSI, 2x 10GbE replication links in a bond, plus 2x 10GbE to two different 10GbE switches for multipath iSCSI to VMware.
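For reference, the multipath side of a setup like this can be verified from the ESXi hosts with standard esxcli commands (a sketch; the adapter name vmhba33 is a placeholder for whatever your software iSCSI adapter is called):

```shell
# List active iSCSI sessions - with two portals you should see one session per path
esxcli iscsi session list --adapter vmhba33

# Show the multipathing state of the attached devices (paths, PSP, active/standby)
esxcli storage nmp device list
```

These are host-side commands to run on each ESXi server, not something the Open-E GUI exposes.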
One of the Open-E servers will get a brand new set of hard drives. Capacity doesn't matter, just the technique for doing this. We are planning to do this with 2x 10 TB servers.
I already tested a quadrillion things with two test virtual machines, each with a 5 GB datastore.
1) Shut down Server1 gracefully, delete the 5 GB datastore, put a new one in, boot the VM.
Since everything regarding the host binding and so on is stored on the RAID, nothing functions correctly until you remove the binding of the hosts and bind them again.
That means: shut down the virtual IPs, shut down VMware, shut down all VMs, wait for the replication to fully complete, then reconfigure the cluster and move on. Downtime of about "I will get fired".
2) Do everything with cluster maintenance mode. While this is nice for small changes to the cluster, something like reconfiguring tasks or the iSCSI-target-to-node mapping does not work at all. The solution of first removing the host binding, shutting down, replacing the hard drives, booting up and redoing everything gives nearly the same downtime as 1).
3) Gracefully stop the cluster, remove the host binding, shut down Server1, replace the hard drives. Meanwhile reconfigure the iSCSI targets to listen on the IP subnet of the physical NIC instead of the virtual IP subnet (the only allowed subnets for iSCSI targets are the iSCSI multipathing subnets). Reconfigure VMware to use the physical IP of Server2 instead of the virtual IP. Boot VMware with only a small downtime for the steps above (maybe 1-3 hours?) and start replicating! The resync will also take ages, but the VMs will be online.
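On the VMware side, re-pointing the software iSCSI initiator from the virtual IP to Server2's physical portal can be scripted with esxcli (a hedged sketch; the adapter name vmhba33 and both IP addresses are placeholders for your own setup):

```shell
# Remove the dynamic discovery entry that points at the (now stopped) virtual IP
esxcli iscsi adapter discovery sendtarget remove -A vmhba33 -a 10.0.10.100:3260

# Add the physical iSCSI portal IP of Server2 instead
esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 10.0.10.2:3260

# Rescan so the datastores come back over the new portal
esxcli storage core adapter rescan --adapter vmhba33
```

Run this on every ESXi host in the cluster; the same pair of remove/add commands reverses the change once the cluster is back on the virtual IPs.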
After this is finished, hours or days later, plan another downtime: stop the cluster, reconfigure the iSCSI targets and tasks, start the cluster and boot the VMs again.
Question regarding this: is there a way, in your admin console, to change the allowed subnets for an iSCSI target while the cluster is still running? I could then change the iSCSI target IPs in my VMware cluster beforehand... this could mean I would have NO downtime! (I will test this for sure.)
Is there something I have missed? What is the best practice for this? I initially thought this would be a much easier task, but it isn't, mainly because absolutely nothing works after replacing the datastore of Open-E Server1.
I also opened a ticket for this, but cannot find it in your OTRS yet. I'm sure I opened it today (16:30-ish?).
Will open another one now.
Last edited by danielweeber; 02-23-2017 at 10:01 PM.
Use the Maintenance mode; see this video, it will for sure answer what you need to do. We demonstrate it with VMs (it's faster than real machines for live webinars), but we show the feature by blowing away a V7 VM and introducing a new V7 VM with ease.
Also, I saw you had a ticket sent in yesterday about your ConnectX-2 cards, so I'm not sure if that is the one, but it's the only one from you as of yesterday.
All the best,
That helped, thanks. Our tests in a few VMs were working, too. Remove the host binding while maintenance mode is on, then shut down the cluster with VIPs enabled. Got it.
Just one additional question: after this procedure the tasks are named "incorrectly". Is there any way to fix this without interrupting anything?
(You have to create the new tasks on one server, because one server is the source for both LUNs.)
Redo everything but leave the source for one task on each server? Will the targets recognize the existing data via the metadata on the LUN and NOT resync everything?
If you are adding a new server with a new volume group and logical volumes, then you will need to delete the tasks and re-create new tasks for the new volumes on that newer server, and mark the volumes for Destination mode with the same size. When you start the new tasks, they will need to re-sync the data to the new volumes.
All the best,
Yeah. What I meant is that after adding the new tasks (which you need for sure), the tasks are named e.g. "mirror0" and "mirror1" on the server which hasn't been replaced, and "mirror0_reverse" and "mirror1_reverse" on the new one which has been replaced.
Typically, in an active/active scenario, you create one task with one volume as source on one server, and one task on the other server with the other volume as source.
So on one server the tasks are mirror0 and mirror1_reverse, and on the other server they are mirror0_reverse and mirror1.