Hi, I am running two Open-E DSS 6 units with synchronous replication between them. When the primary had a hardware failure, I put the primary's (unit 1) IP address on the backup (unit 2). I changed the replication volumes to source on unit 1, removed CHAP from all volumes (because it was causing problems), and got all clients running again pointing at unit 1. Unit 2 has now been repaired, and I put the IP addresses from unit 1 on unit 2 (so they have swapped IP addresses). I now cannot get replication to run. On unit 2 (now the backup) I have set the replication volumes to destination and cleared metadata on both units. I tried deleting the entire destination LV and recreating it, set it up to use replication, and attached it to the same iSCSI target. I was able to create the replication job connecting LV0004 in a recreated replication task, but it will not run.
What is slightly unusual about my implementation is that the servers have 2 NICs. They were originally bonded, but the bond was later broken to see if we could set up automatic failover.
unit 1: eth0 10.1.0.3, eth1 10.1.1.104
unit 2 (was): eth0 10.1.0.4, eth1 10.1.1.103
I found that when I try to create replication tasks, unit 1 can't see the destinations on unit 2 unless I set the mirroring to use the eth0 address, which puts all my traffic on eth0 and none on eth1.
I have created a new bonded NIC on unit 2 (bond0, 10.1.0.4), and unit 1 can still see the destination volumes on unit 2, but replication will not run.
I tried creating a static route so that traffic to eth1 on the opposite unit goes out via the local eth1.
Both units can ping all addresses.
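A minimal sketch for sanity-checking the addressing listed above. The prefix lengths (/24 and /16) are assumptions, since the netmask is not stated in this post, and the function names are made up for illustration. It only shows that with a /24 mask the eth0 and eth1 addresses fall on two separate subnets, while with a /16 mask all four addresses share one subnet, which may matter for which NIC the replication traffic ends up on.

```python
import ipaddress

# Addresses as listed in this post. The prefix length is NOT given,
# so /24 and /16 below are assumptions used only for comparison.
ADDRESSES = {
    ("unit1", "eth0"): "10.1.0.3",
    ("unit1", "eth1"): "10.1.1.104",
    ("unit2", "eth0"): "10.1.0.4",
    ("unit2", "eth1"): "10.1.1.103",
}


def subnets(prefix_len):
    """Map each (unit, nic) pair to the network it belongs to for this prefix length."""
    return {
        key: ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for key, addr in ADDRESSES.items()
    }


def distinct_networks(prefix_len):
    """Count how many different networks the four addresses span."""
    return len(set(subnets(prefix_len).values()))


if __name__ == "__main__":
    for prefix_len in (24, 16):
        print(f"/{prefix_len}: {distinct_networks(prefix_len)} network(s)")
        for (unit, nic), net in subnets(prefix_len).items():
            print(f"  {unit} {nic} -> {net}")
```

If the real netmask differs from both assumed values, substitute it in the loop at the bottom.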
I thought it would work given that I can ping between the units, have a source and a destination, have cleared metadata, have deleted and recreated the LV with the same size and replication type (block), and have the same CHAP settings on the iSCSI target (none), so I am at a loss here. Any suggestions welcome. Thanks.