Volume replication


I'd expect to see this if DRBD has no log of the changes that occurred during the down time.
Quote:

Originally Posted by butre
I did select clear metadata and perhaps I should not have done that?

Once the volume has replicated back to the office datacentre I change destination/source again on both SANs, and again the volume replication replicates the full 10 GB volume.

    Quote:

    Originally Posted by butre

    The above is a test setup, but once I go live I will have 5x 100 GB volumes, and if replication is so inefficient over a 10 Mb link (even 100 Mb would be slow), the initial replication is going to take weeks; and if I ever have to fail over to the DR site and fail back afterwards, it is going to take weeks again to replicate the data back.

    Connecting the systems via a local link may be an option, at least for the initial sync.
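
    For a rough feel for the numbers, here's a back-of-the-envelope sketch in Python. It assumes the link is fully dedicated to the sync and ignores protocol overhead, so real-world times will only be longer:

    # Rough estimate of full-sync transfer time over a WAN link.
    # Assumption: the link is 100% available to the sync and protocol
    # overhead is ignored, so real times will only be longer.

    def sync_time_days(volume_gb, link_mbit):
        bits = volume_gb * 8 * 10**9          # volume size in bits
        seconds = bits / (link_mbit * 10**6)  # raw transfer time
        return seconds / 86400                # convert to days

    print(sync_time_days(10, 10))        # the 10 GB test volume: ~0.09 days
    print(sync_time_days(5 * 100, 10))   # 5 x 100 GB over 10 Mb: ~4.6 days
    print(sync_time_days(5 * 100, 100))  # the same over 100 Mb:  ~0.46 days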

    Quote:

    Originally Posted by butre
    The other thing is that during the initial replication the VMs become unusable. I found another thread on this forum about I/O write speed dropping and making vmalloc changes, but as I have a 64-bit system this will not work.

    Maybe the write requests are delayed by synchronous requests across the WAN? Or maybe the write queue is simply long, and individual writes take their time because of the slow WAN link?

    If you run the initial sync locally, you could test that scenario and see whether the VMs are more responsive once the WAN-added delays are avoided.
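
    To make the WAN-delay point concrete, here's a toy per-write latency model in Python. The millisecond values are made-up examples, not measurements; protocols A and C are DRBD's asynchronous and synchronous modes:

    # Toy per-write latency model; all numbers are illustrative only.

    local_disk_ms = 0.5  # assumed local disk write latency
    wan_rtt_ms = 40.0    # assumed WAN round-trip time

    # Protocol C (synchronous): the write completes only after the peer
    # has written it too, so every write pays the WAN round trip.
    sync_write_ms = local_disk_ms + wan_rtt_ms + local_disk_ms

    # Protocol A (asynchronous): the write completes once it is on the
    # local disk and queued for sending, so the WAN RTT is hidden.
    async_write_ms = local_disk_ms

    print("protocol C: %.1f ms per write" % sync_write_ms)   # 41.0
    print("protocol A: %.1f ms per write" % async_write_ms)  # 0.5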

    And if replaying the changes instead of doing a full resync is possible after all, your time to resync would be much shorter. But of course VM performance will be much slower while it runs.

    Quote:

    Originally Posted by butre
    Sorry, many questions, but I would like to figure out how I can make volume replication more efficient (only replicating changed bits rather than all of them) and how I can keep the VMs up, running and working during the initial replication (and the failback replication as well).

    Many thanks for your help!

    DRBD replication *is* incremental. It's just that for the initial sync, all blocks are handled as "changed" since nothing is known about them.
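
    If it helps to picture the mechanism: while the peers are apart, DRBD tracks out-of-sync blocks in a bitmap, and a resync only transfers blocks whose bit is set; clearing the metadata throws that bitmap away, which forces a full sync. A minimal sketch of the idea in Python (not DRBD's actual code; the block size and names are made up):

    # Minimal sketch of bitmap-based incremental resync.
    # Not DRBD's implementation; sizes and names are invented.

    BLOCK_SIZE = 4096

    class Volume:
        def __init__(self, num_blocks):
            self.blocks = [b"\0" * BLOCK_SIZE] * num_blocks
            self.dirty = [True] * num_blocks  # initial sync: all "changed"

        def write(self, block_no, data):
            self.blocks[block_no] = data
            self.dirty[block_no] = True       # mark block out-of-sync

        def resync_to(self, peer):
            """Send only dirty blocks to the peer; return bytes sent."""
            sent = 0
            for i, is_dirty in enumerate(self.dirty):
                if is_dirty:
                    peer.blocks[i] = self.blocks[i]
                    self.dirty[i] = False
                    sent += BLOCK_SIZE
            return sent

    source = Volume(1024)
    target = Volume(1024)
    print(source.resync_to(target))  # first sync: all 4 MB transferred
    source.write(7, b"x" * BLOCK_SIZE)
    print(source.resync_to(target))  # next sync: only 4096 bytes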

    You may want to visit www.drbd.org for details of the replication mechanism.

    BTW: Are your Xen servers going across the WAN, too, when the primary DSS fails? That'll definitely add to the I/O penalty when resyncing, as I expect the link to be saturated by the resync alone...

    With regards,
    jmo
    09-09-2010, 10:21 AM
    butre
    Hi,

    So yesterday I did another test, with very different results on the initial sync.

    • I set up one source and one destination volume of 10 GB without initialising
    • I set up replication, and this time it only synced about 10 MB of data
    • I then added it to XenServer as a storage repository, which resulted in another 80 MB of data replication
    • When I put the 8 GB virtual machine image on it, it replicated 5 GB of data
    • After it finished I reverted back, and this time it only replicated about 150 MB of data back (I had added a 150 MB file once it was mounted in the DR site)


    The only difference is that I did not clear the metadata before each sync.

    I will do another test today with a second new volume, this time clearing the metadata, to see if this makes any difference. When I did a similar test two weeks ago I saw the same results; that is why the results in my original post baffled me, as there a full 10 GB volume replication took place initially, while it did not happen in that earlier test and again did not happen in yesterday's test.

    The XenServers do not migrate over the same WAN link in case of a failover, as most critical servers use application-level replication (Exchange, BES, MS DFS for the file servers) and all the data will have replicated beforehand.

    You are correct that the link is saturated during the initial replication, but after that it should be fine, and if the resync works as I found yesterday (only changes) then a resync should be very quick and not take up so much bandwidth/time.

    I forgot to add that I use the asynchronous method, so it should not be waiting for the data to be written to the remote disk.

    Yesterday I copied a 600 MB file to the volume while it was being replicated and noticed the write did not finish on the source volume until all the changes had been written to the destination volume. So I will definitely not be doing volume syncs on volumes that have VMs up and running; I will probably shut them down overnight, let the sync finish, and then start them up again in the morning (they are not critical servers, so that should be fine).
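
    That behaviour would make sense if the asynchronous mode uses a bounded in-flight buffer: once it fills, new writes can only proceed as fast as the WAN drains it. A toy Python model of that idea (the buffer size and speeds are made-up values):

    # Toy model: async replication behind a bounded send buffer.
    # All sizes and speeds are made-up illustrative values.

    buffer_mb = 64     # assumed in-flight buffer capacity
    disk_mb_s = 100.0  # assumed local disk write speed
    wan_mb_s = 1.25    # roughly a 10 Mbit/s WAN link

    def local_copy_time_s(file_mb):
        """Seconds until the *local* write completes."""
        if file_mb <= buffer_mb:
            # Fits in the buffer: the write runs at disk speed.
            return file_mb / disk_mb_s
        # The buffer fills, then the rest is throttled to WAN speed.
        return buffer_mb / disk_mb_s + (file_mb - buffer_mb) / wan_mb_s

    print(local_copy_time_s(30))   # small write: ~0.3 s, WAN invisible
    print(local_copy_time_s(600))  # the 600 MB file: ~430 s, WAN-bound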