Hi butre,
first of all: I have no practical experience with open-e DSS iSCSI replication.
second: As far as I know, DSS uses DRBD for replication.
Yes, that's the behaviour I'd expect: the *device* is replicated/synchronized, not the data.
It's like with RAID syncs: when a device fails, all blocks need to be resynced, not only those containing data. DRBD doesn't know which blocks are empty and which are full - only the file system knows.
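To make that concrete, here's a tiny Python toy (not DRBD code, just an illustration; the block counts, sizes and "used" blocks are made up): to a block-level replicator the device is just a flat array of blocks, so a full resync has to ship every block, while a file-level copy could skip the unused ones.

```python
# Toy illustration (not DRBD code): a block device is just a flat array of
# blocks. A block-level replicator has no notion of "used" vs. "free", so a
# full resync copies every block, unlike a file-level copy.

BLOCK_SIZE = 4096
NUM_BLOCKS = 1024

# Pretend device: only a handful of blocks actually hold file data.
device = [b"\x00" * BLOCK_SIZE for _ in range(NUM_BLOCKS)]
used_blocks = {3, 17, 18, 42}          # only the file system knows this
for i in used_blocks:
    device[i] = b"D" * BLOCK_SIZE

def block_level_resync(src):
    """What a device replicator does: copy *all* blocks."""
    return [blk for blk in src]        # 1024 blocks transferred

def file_level_copy(src, fs_allocation_map):
    """What a file-based tool could do: copy only allocated blocks."""
    return {i: src[i] for i in fs_allocation_map}   # 4 blocks transferred

print(len(block_level_resync(device)), "blocks vs.",
      len(file_level_copy(device, used_blocks)), "blocks")
```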
As these are the blocks changed on the (replication) device, only those blocks are sent.
I'd expect to see this if DRBD has no log of the changes that occurred during the downtime.
Maybe an initial replication with the systems connected via a local link would be an option, at least for that first full sync.
Maybe the write requests are delayed by synchronous requests across the WAN? Maybe the write queue is simply long and individual writes take their time because of the slow WAN link?
If you run the initial sync locally, you might test that scenario to see whether it's more responsive once the WAN-added delays are avoided.
And if replaying the changes instead of a full resync is possible after all, your resync time would be much shorter. But of course VM performance will be much slower then.
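Just to put rough numbers on "much shorter" (all figures below are assumptions - plug in your own volume size, change rate and WAN bandwidth):

```python
# Back-of-envelope estimate: full resync vs. shipping only the changed data
# over a slow WAN link. All numbers are assumptions for illustration.

volume_gb  = 500        # size of the replicated iSCSI volume
changed_gb = 5          # data actually modified during the outage
wan_mbit_s = 10         # usable WAN bandwidth

def hours(gigabytes, mbit_per_s):
    # GB -> Mbit, then divide by link speed and convert seconds to hours
    return gigabytes * 8 * 1024 / mbit_per_s / 3600

print(f"full resync   : {hours(volume_gb, wan_mbit_s):6.1f} h")
print(f"changed blocks: {hours(changed_gb, wan_mbit_s):6.1f} h")
```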
DRBD replication *is* incremental. It's just that for the initial sync, all blocks are handled as "changed" since nothing is known about them.
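Roughly, the idea behind the incremental resync looks like the simplified sketch below (loosely modelled on DRBD's dirty-block bitmap; the names and sizes are invented for illustration, see drbd.org for the real mechanism):

```python
# Simplified sketch of bitmap-based incremental resync (not real DRBD code).
# While the peer is away, each local write marks its block as dirty; on
# reconnect only the marked blocks are resent. For the very first sync there
# is no such history, so every bit starts out set and the whole device goes
# over the wire.

NUM_BLOCKS = 1024

def initial_sync_bitmap():
    # Nothing is known about the peer yet -> treat every block as changed.
    return [True] * NUM_BLOCKS

def record_write(bitmap, block_no):
    # Called for each local write while the replication link is down.
    bitmap[block_no] = True

def resync(bitmap, device, send):
    # On reconnect, ship only the blocks whose bit is set.
    for i, dirty in enumerate(bitmap):
        if dirty:
            send(i, device[i])
            bitmap[i] = False

# After an outage with a few writes, only those blocks travel across the WAN:
device = [b"\x00"] * NUM_BLOCKS
bitmap = [False] * NUM_BLOCKS
for blk in (7, 7, 300, 512):
    record_write(bitmap, blk)

sent = []
resync(bitmap, device, lambda i, data: sent.append(i))
print(sent)   # -> [7, 300, 512]
```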
You may want to visit www.drbd.org for details of the replication mechanism.
BTW: Are your Xen servers going across the WAN, too, when the primary DSS fails? That'll definitely add to the I/O penalty when resyncing, as I expect the link to be saturated by the resync alone...
With regards,
jmo