
Thread: volume replication

  1. #1

    Default volume replication

    hi

    I have some questions regarding iSCSI volume replication.

    • I have 2 SANs running Open-E v6.0up40.8101.4550 64-bit.
    • 1 SAN is in the office data centre and the other is at the DR site.
    • I have a 10 Mbit WAN link between the 2 sites.
    • I have set up various 10 GB volumes, and they each contain 1 VM (XenServer 5.6).


    I have set up replication jobs for each of the volumes, configured the source and destination, etc.

    My first question is the following:

    1) When setting up replication of the 10 GB volumes, the initial replication replicates the full 50 GB (according to the task status), even though the volumes contain no data. Is this expected behaviour?

    My 2nd question has to do with additional replication after the initial replication.

    • When I add the volume to my XenServer and replicate, it replicates about 100 MB of additional data.
    • When I put an 8 GB virtual machine on the volume (with 3.5 GB of data) and replicate, it only replicates the 3.5 GB (which is good).
    • I then stop volume replication, make the destination volume a source, mount it in XenServer, and I can access the virtual machine. After making some slight tweaks I run the reverse replication job (on the primary I toggle source to destination), but to my surprise it replicates the full 10 GB volume and not just the changes. I did select "clear metadata", and perhaps I should not have done that?
    • Once the volume has replicated back to the office data centre, I change destination/source again on both SANs, and again the volume replication replicates the full 10 GB volume.


    The above is a test setup, but once I go live I will have 5x 100 GB volumes. If replication is this inefficient over a 10 Mbit link (even 100 Mbit would be slow), the initial replication is going to take weeks, and if I ever have to fail over to the DR site and fail back afterwards, it will take weeks again to replicate the data back.

    The other thing is that during the initial replication the VMs become unusable. I found another thread on this forum about I/O write speed dropping and making vmalloc changes, but as I have a 64-bit system this will not work.

    Sorry for the many questions, but I would like to figure out how I can make volume replication more efficient (only replicating changed blocks rather than all of them) and how I can keep the VMs up and running during the initial replication (and the failback replication as well).

    Many thanks for your help!

  2. #2
    Join Date
    May 2008
    Location
    Hamburg, Germany
    Posts
    108

    Default

    Hi butre,

    First of all: I have no practical experience with Open-E DSS iSCSI replication.
    Second: As far as I know, DSS uses DRBD for replication.

    Quote Originally Posted by butre
    [...]
    1) When setting up replication of the 10 GB volumes, the initial replication replicates the full 50 GB (according to the task status), even though the volumes contain no data. Is this expected behaviour?
    Yes, that's the behaviour I'd expect: the *device* is replicated/synchronized, not the data.

    It's like with RAID resyncs: when a device fails, all blocks need to be resynced, not only those containing data. DRBD doesn't know which blocks are empty and which are full; only the file system knows.
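    To make the block-vs-file distinction concrete, here is a toy Python sketch of device-level replication with a dirty-block bitmap. The class and its names are invented for illustration and are not DSS's or DRBD's actual code: with no prior metadata, every block starts out dirty, so the first sync ships the whole device even if it holds no data.

```python
class ReplicatedDevice:
    """Toy model of block-level replication (hypothetical, for illustration)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        # No replication metadata exists yet, so every block is assumed
        # out of sync: the *device* gets replicated, not the data.
        self.dirty = set(range(n_blocks))

    def write(self, block):
        # After the initial sync, only written blocks are tracked as dirty.
        self.dirty.add(block)

    def sync(self):
        # Ship all dirty blocks to the peer and report how many were sent.
        sent = len(self.dirty)
        self.dirty.clear()
        return sent

dev = ReplicatedDevice(n_blocks=10_000)
print(dev.sync())   # initial sync: all 10000 blocks, even "empty" ones
dev.write(42)
dev.write(43)
print(dev.sync())   # subsequent sync: only the 2 changed blocks
```

    This is why the task status shows the full volume size on the first run regardless of how much data the file system actually holds.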

    Quote Originally Posted by butre
    2nd question has to do with additional replication after initial replication.

    • When I add the volume to my XenServer and replicate, it replicates about 100 MB of additional data.
    • When I put an 8 GB virtual machine on the volume (with 3.5 GB of data) and replicate, it only replicates the 3.5 GB (which is good).
    As these are the blocks that changed on the (replicated) device, only those blocks are sent.

    Quote Originally Posted by butre
    • I then stop volume replication, make the destination volume a source, mount it in XenServer, and I can access the virtual machine. After making some slight tweaks I run the reverse replication job (on the primary I toggle source to destination), but to my surprise it replicates the full 10 GB volume and not just the changes.
    I'd expect to see this if DRBD has no log of the changes that occurred during the downtime.

    Quote Originally Posted by butre
    I did select "clear metadata", and perhaps I should not have done that?
    • Once the volume has replicated back to the office data centre, I change destination/source again on both SANs, and again the volume replication replicates the full 10 GB volume.
Quote Originally Posted by butre

The above is a test setup but once i go live i will have 5x 100gb volumes and if replication is so inefficient over a 10mb link (even 100mb would be slow) the initial replication is going to take weeks and if i ever have to failover to the DR site and failback afterwards it is going to take weeks again to replicate the data back.
An initial replication with the two systems connected via a local link might be an option, at least for that first sync.

Quote Originally Posted by butre
the other thing is that during the initial replication the vm's become unusable, i found another thread on this forum about i/o write speed dropping and making vmalloc changes but as i have a 64 bit system this will not work
Maybe the write requests are delayed by synchronous requests across the WAN? Or maybe the write queue is simply long, and individual writes take their time because of the slow WAN link?

If you run the initial sync locally, you might test that scenario to see whether the VMs are more responsive once the WAN-added delays are avoided.

And if replaying only the changes instead of a full resync is after all possible, your time to resync would be much shorter. But of course VM performance will still be much slower while it runs.
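As a rough back-of-the-envelope check of how much this matters on your link (the 5 GB of changes and the 80% link-efficiency factor are assumptions, not measurements), compare a full resync with an incremental one at 10 Mbit/s:

```python
def sync_time_seconds(data_gb, link_mbit_s, efficiency=0.8):
    """Rough transfer-time estimate: data_gb gigabytes over a
    link_mbit_s megabits-per-second link at the given efficiency."""
    megabits = data_gb * 8 * 1000          # GB -> megabits (decimal units)
    return megabits / (link_mbit_s * efficiency)

full = sync_time_seconds(500, 10)   # full resync of 5 x 100 GB volumes
delta = sync_time_seconds(5, 10)    # incremental resync of ~5 GB of changes

print(f"full resync : {full / 86400:.1f} days")    # about 5.8 days
print(f"delta resync: {delta / 3600:.1f} hours")   # about 1.4 hours
```

So even in the best case a full resync of the live setup ties up the link for the better part of a week, while an incremental resync of a few gigabytes finishes in hours.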

Quote Originally Posted by butre
sorry many questions but i would like figure out how i can make volume replication more efficient (only replicating changed bits rather than all of them) and how i can keep vm's up and running and working during the intial replication (and failback replication as well)

many thanks for your help!
DRBD replication *is* incremental. It's just that for the initial sync, all blocks are handled as "changed" since nothing is known about them.
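A hypothetical sketch of why clearing the metadata defeats that incremental behaviour: the metadata includes the record of which blocks changed, and once it is cleared, every block must again be treated as changed (the class and mechanics below are invented for illustration; see drbd.org for how DRBD actually tracks this).

```python
class ChangeTracker:
    """Toy dirty-block bitmap (illustrative only, not DRBD's real format)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.dirty = set(range(n_blocks))   # fresh device: nothing known yet

    def finish_sync(self):
        self.dirty.clear()                  # peer is now up to date

    def write(self, block):
        self.dirty.add(block)               # remember what changed

    def clear_metadata(self):
        # The bitmap is gone, so nothing is known about the peer's state:
        # every block must again be assumed changed.
        self.dirty = set(range(self.n_blocks))

    def blocks_to_sync(self):
        return len(self.dirty)

t = ChangeTracker(10_000)
t.finish_sync()                 # initial sync done
t.write(7)
print(t.blocks_to_sync())       # 1 block: incremental resync
t.clear_metadata()
print(t.blocks_to_sync())       # 10000 blocks: full resync again
```

That would be consistent with what you saw: selecting "clear metadata" before reversing the replication direction forced the full 10 GB to be sent again.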

You may want to visit www.drbd.org for details of the replication mechanism.

BTW: Are your Xen servers going across the WAN, too, when the primary DSS fails? That will definitely add to the I/O penalty when resyncing, as I expect the link to be saturated by the resync alone...

With regards,
jmo

  3. #3

    Default

    hi

    So yesterday I did another test, with very different results on the initial sync.

    • I set up 1 source and 1 destination volume of 10 GB without initialising.
    • I set up replication, and this time it only synced about 10 MB of data.
    • I then added it to XenServer as a storage repository, which resulted in another 80 MB of data replication.
    • When I put the 8 GB virtual machine image on it, it replicated 5 GB of data.
    • After it finished I reverted back, and this time it only replicated about 150 MB of data back (I had added a file of 150 MB once it was mounted at the DR site).


    The only difference is that I did not clear the metadata before each sync.

    I will do another test today with a 2nd new volume, this time clearing the metadata, and see if it makes any difference. When I did a similar test 2 weeks ago I saw the same results as yesterday. That is why the results in my original post baffled me: a full 10 GB volume replication took place initially, while it did not do this in the earlier test and again did not do it in yesterday's test.

    The XenServers do not migrate over the same WAN link in case of a failover, as most critical servers use application-level replication (Exchange, BES, MS DFS for file servers) and all the data will have replicated beforehand.

    You are correct that the link is saturated during the initial replication, but after that it should be fine, and if the resync works as I found yesterday (only changes), then a resync should be very quick and not take up much bandwidth/time.

    I forgot to add that I use the asynchronous method, so it should not be waiting for the data to be written to the destination disk.

    Yesterday I copied a 600 MB file to the volume while it was being replicated and noticed the write did not finish on the source volume until all the changes had been written to the destination volume. So I will definitely not be doing volume syncs on volumes that have VMs up and running. I will probably shut them down overnight, let the sync finish, and then start them up again in the morning (they are not critical servers, so that should be fine).
