Volume replication


I'd expect to see this if DRBD has no log of the changes that occurred during the down time.
Quote:

Originally Posted by butre
I did select clear metadata and perhaps I should not have done that?

Once the volume has replicated back to the office datacentre I change destination/source again on both SANs, and again the volume replication replicates the full 10 GB volume.

    Quote:

    Originally Posted by butre

    The above is a test setup, but once I go live I will have 5x 100 GB volumes, and if replication is so inefficient over a 10 Mb link (even 100 Mb would be slow), the initial replication is going to take weeks; and if I ever have to fail over to the DR site and fail back afterwards, it is going to take weeks again to replicate the data back.

    Connecting the systems via a local link may be an option, at least for the initial sync.
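
    For a rough feel for the numbers, here's a back-of-the-envelope sketch in Python. It assumes the link is fully dedicated to the sync and ignores protocol overhead, so real-world times will only be longer:

    # Rough estimate of full-sync transfer time over a WAN link.
    # Assumption: the link is 100% available to the sync and protocol
    # overhead is ignored, so real times will only be longer.

    def sync_time_days(volume_gb, link_mbit):
        bits = volume_gb * 8 * 10**9          # volume size in bits
        seconds = bits / (link_mbit * 10**6)  # raw transfer time
        return seconds / 86400                # convert to days

    print(sync_time_days(10, 10))        # the 10 GB test volume: ~0.09 days
    print(sync_time_days(5 * 100, 10))   # 5 x 100 GB over 10 Mb: ~4.6 days
    print(sync_time_days(5 * 100, 100))  # the same over 100 Mb:  ~0.46 days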

    Quote:

    Originally Posted by butre
    The other thing is that during the initial replication the VMs become unusable. I found another thread on this forum about I/O write speed dropping and making vmalloc changes, but as I have a 64-bit system this will not work.

    Maybe the write requests are delayed by synchronous requests across the WAN? Or maybe the write queue is simply long, and individual writes take their time because of the slow WAN link?

    If you run the initial sync locally, you could test that scenario and see whether the VMs are more responsive once the WAN-added delays are avoided.
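
    To make the WAN-delay point concrete, here's a toy per-write latency model in Python. The millisecond values are made-up examples, not measurements; protocols A and C are DRBD's asynchronous and synchronous modes:

    # Toy per-write latency model; all numbers are illustrative only.

    local_disk_ms = 0.5  # assumed local disk write latency
    wan_rtt_ms = 40.0    # assumed WAN round-trip time

    # Protocol C (synchronous): the write completes only after the peer
    # has written it too, so every write pays the WAN round trip.
    sync_write_ms = local_disk_ms + wan_rtt_ms + local_disk_ms

    # Protocol A (asynchronous): the write completes once it is on the
    # local disk and queued for sending, so the WAN RTT is hidden.
    async_write_ms = local_disk_ms

    print("protocol C: %.1f ms per write" % sync_write_ms)   # 41.0
    print("protocol A: %.1f ms per write" % async_write_ms)  # 0.5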

    And if replaying the changes instead of doing a full resync is possible after all, your time to resync would be much shorter. But of course VM performance will be much slower while it runs.

    Quote:

    Originally Posted by butre
    Sorry, many questions, but I would like to figure out how I can make volume replication more efficient (only replicating changed bits rather than all of them) and how I can keep the VMs up, running and working during the initial replication (and the failback replication as well).

    Many thanks for your help!

    DRBD replication *is* incremental. It's just that for the initial sync, all blocks are handled as "changed" since nothing is known about them.
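
    If it helps to picture the mechanism: while the peers are apart, DRBD tracks out-of-sync blocks in a bitmap, and a resync only transfers blocks whose bit is set; clearing the metadata throws that bitmap away, which forces a full sync. A minimal sketch of the idea in Python (not DRBD's actual code; the block size and names are made up):

    # Minimal sketch of bitmap-based incremental resync.
    # Not DRBD's implementation; sizes and names are invented.

    BLOCK_SIZE = 4096

    class Volume:
        def __init__(self, num_blocks):
            self.blocks = [b"\0" * BLOCK_SIZE] * num_blocks
            self.dirty = [True] * num_blocks  # initial sync: all "changed"

        def write(self, block_no, data):
            self.blocks[block_no] = data
            self.dirty[block_no] = True       # mark block out-of-sync

        def resync_to(self, peer):
            """Send only dirty blocks to the peer; return bytes sent."""
            sent = 0
            for i, is_dirty in enumerate(self.dirty):
                if is_dirty:
                    peer.blocks[i] = self.blocks[i]
                    self.dirty[i] = False
                    sent += BLOCK_SIZE
            return sent

    source = Volume(1024)
    target = Volume(1024)
    print(source.resync_to(target))  # first sync: all 4 MB transferred
    source.write(7, b"x" * BLOCK_SIZE)
    print(source.resync_to(target))  # next sync: only 4096 bytes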

    You may want to visit www.drbd.org for details of the replication mechanism.

    BTW: Are your Xen servers going across the WAN, too, when the primary DSS fails? That'll definitely add to the I/O penalty when resyncing, as I expect the link to be saturated by the resync alone...

    With regards,
    jmo
    09-09-2010, 10:21 AM
    butre
    Hi,

    So yesterday I did another test, with very different results on the initial sync.

    • I set up one source and one destination volume of 10 GB without initialising
    • I set up replication, and this time it only synced about 10 MB of data
    • I then added it to XenServer as a storage repository, which resulted in another 80 MB of data replication
    • When I put the 8 GB virtual machine image on it, it replicated 5 GB of data
    • After it finished I reverted back, and this time it only replicated about 150 MB of data back (I had added a 150 MB file once it was mounted in the DR site)


    The only difference is that I did not clear the metadata before each sync.

    I will do another test today with a second new volume, this time clearing the metadata, to see if this makes any difference. When I did a similar test two weeks ago I saw the same results; that is why the results in my original post baffled me, as there a full 10 GB volume replication took place initially, while it did not happen in that earlier test and again did not happen in yesterday's test.

    The XenServers do not migrate over the same WAN link in case of a failover, as most critical servers use application-level replication (Exchange, BES, MS DFS for the file servers) and all the data will have replicated beforehand.

    You are correct that the link is saturated during the initial replication, but after that it should be fine, and if the resync works as I found yesterday (only changes) then a resync should be very quick and not take up so much bandwidth/time.

    I forgot to add that I use the asynchronous method, so it should not be waiting for the data to be written to the remote disk.

    Yesterday I copied a 600 MB file to the volume while it was being replicated and noticed the write did not finish on the source volume until all the changes had been written to the destination volume. So I will definitely not be doing volume syncs on volumes that have VMs up and running; I will probably shut them down overnight, let the sync finish, and then start them up again in the morning (they are not critical servers, so that should be fine).
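
    That behaviour would make sense if the asynchronous mode uses a bounded in-flight buffer: once it fills, new writes can only proceed as fast as the WAN drains it. A toy Python model of that idea (the buffer size and speeds are made-up values):

    # Toy model: async replication behind a bounded send buffer.
    # All sizes and speeds are made-up illustrative values.

    buffer_mb = 64     # assumed in-flight buffer capacity
    disk_mb_s = 100.0  # assumed local disk write speed
    wan_mb_s = 1.25    # roughly a 10 Mbit/s WAN link

    def local_copy_time_s(file_mb):
        """Seconds until the *local* write completes."""
        if file_mb <= buffer_mb:
            # Fits in the buffer: the write runs at disk speed.
            return file_mb / disk_mb_s
        # The buffer fills, then the rest is throttled to WAN speed.
        return buffer_mb / disk_mb_s + (file_mb - buffer_mb) / wan_mb_s

    print(local_copy_time_s(30))   # small write: ~0.3 s, WAN invisible
    print(local_copy_time_s(600))  # the 600 MB file: ~430 s, WAN-bound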