
Thread: Best Practice with replacing a whole Server of a Cluster


    Hey guys,

    I just wanted to ask what the best practice is for replacing an entire server (the whole RAID) of one of the cluster's nodes.

    Let's assume the following scenario:

    2x Open-E servers, 2x 10 GbE in a bond for iSCSI replication, and 2x 10 GbE to two different 10 GbE switches for multipath iSCSI to VMware.

    One of the Open-E servers will get a brand new set of hard drives. The capacity doesn't matter here, only the technique for doing this. We are planning it with two 10 TB servers.

    I have already tested countless approaches with two test virtual machines, each with a 5 GB datastore.

    First test:
    Shut down server 1 gracefully, delete the 5 GB datastore, put a new one in, boot the VM.
    Since everything regarding the host bindings and so on is stored on the RAID, nothing works correctly until you remove the host bindings and bind them again.
    That means: shut down the virtual IPs, shut down VMware, shut down all VMs, wait for the replication to fully complete, then reconfigure the cluster and move on. Downtime of about "I will get fired".

    Second test:

    Everything with cluster maintenance mode. While this is nice for small changes to the cluster, something like reconfiguring replication tasks or the iSCSI target-to-node mapping does not work at all. The workaround of first removing the host binding, shutting down, replacing the hard drives, booting up, and redoing everything gives nearly the same downtime as 1).

    Third test:

    Gracefully stop the cluster, remove the host binding, shut down server 1, replace the hard drives. Meanwhile, reconfigure the iSCSI targets to listen on the IP subnet of the physical NIC instead of the virtual IP subnet (the only allowed subnets for iSCSI targets are the iSCSI multipathing subnets). Reconfigure VMware to use the physical IP of server 2 instead of the virtual IP. Boot the VMs with only a small downtime for the steps above (maybe 1-3 hours?) and start replicating. The replication will also take ages, but the VMs will be online.
    After it has finished, hours or days later, plan another downtime: stop the cluster, reconfigure the iSCSI targets and tasks, start the cluster, and boot the VMs again.
    Question regarding this: is there a way, via the admin connection, to change the allowed subnets for an iSCSI target while the cluster is still running? I could then change the iSCSI target IPs in my VMware cluster beforehand... this could mean I would have NO downtime! (I will test this for sure.)
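    If changing the allowed subnets on the running cluster does work, the VMware-side cutover could be scripted per ESXi host roughly like this. This is only a sketch: the adapter name (vmhba64), the target IQN, and the IP addresses are placeholders for illustration, not values from my setup.

    ```shell
    # Add the physical IP of server 2 as a new static discovery target
    # (placeholder adapter, IQN, and address - adjust to your environment)
    esxcli iscsi adapter discovery statictarget add \
        --adapter=vmhba64 \
        --address=192.168.20.12:3260 \
        --name=iqn.example:server2.target0

    # Rescan so the paths via the physical IP come up alongside the old ones
    esxcli storage core adapter rescan --adapter=vmhba64

    # Once the new paths are active, drop the old virtual-IP target
    esxcli iscsi adapter discovery statictarget remove \
        --adapter=vmhba64 \
        --address=192.168.10.10:3260 \
        --name=iqn.example:server2.target0

    esxcli storage core adapter rescan --adapter=vmhba64
    ```

    Since multipathing keeps the datastore reachable while at least one path stays up, adding the new target before removing the old one should avoid an outage, assuming both subnets are allowed on the target at the same time.
    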


    Is there something I have missed? What is the best practice for this? I initially thought it would be a much easier task, but it isn't, mainly because absolutely nothing works after replacing the datastore of Open-E server 1.

    Greets

    Daniel

    I also opened a ticket for this, but I cannot find it in your OTRS yet. I'm sure I opened it today (16:30-ish?).
    I will open another one now.
    Last edited by danielweeber; 02-23-2017 at 11:01 PM.
