I keep getting replication errors mailed to me from my replication target SAN. It is located at a different branch, via a 100MB link. The sync is asynchronous and runs from 7PM to 7AM the next morning. Here are the errors the SAN is e-mailing me, in chronological order:
First e-mail:
2009/07/21 19:00:37 Replication:Volume Replication: Device lv0000: Connection (from WFConnection to WFBitMapT). Mode local/remote (from Secondary/Unknown to Secondary/Primary). DataStatus local/remote (from Consistent/DUnknown to Consistent/Consistent).
Second e-mail:
2009/07/21 19:00:37 Replication:Volume Replication: Device lv0000: Connection (from WFConnection to WFBitMapT). Mode local/remote (from Secondary/Unknown to Secondary/Primary). DataStatus local/remote (from Consistent/DUnknown to Consistent/Consistent).
2009/07/21 19:01:00 Replication:Volume Replication: Device lv0000: Connection (from WFBitMapT to SyncTarget). DataStatus local/remote (from Consistent/Consistent to Inconsistent/Consistent).
Third e-mail:
2009/07/22 07:00:36 Replication:Volume Replication: Device lv0000: Connection (from SyncTarget to WFConnection). Mode local/remote (from Secondary/Primary to Secondary/Unknown). DataStatus local/remote (from Inconsistent/Consistent to Consistent/DUnknown).
Fourth e-mail:
2009/07/22 07:00:36 Replication:Volume Replication: Device lv0000: Connection (from SyncTarget to WFConnection). Mode local/remote (from Secondary/Primary to Secondary/Unknown). DataStatus local/remote (from Inconsistent/Consistent to Consistent/DUnknown).
Now, according to the task log on the source SAN, the volumes were Consistent/Consistent at the end of the replication. But that last e-mail says Unknown. Any ideas?
I would check the error counters for the NICs (as shown by ifconfig -a) to see if there are any packet errors that might be behind those transitions from Consistent to Inconsistent to Unknown (on the destination) and back to Consistent.
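For reference, on a Linux-based box those per-interface error counters can also be read straight from /proc/net/dev. A minimal sketch (the sample text below just mimics the kernel's format; on a real box you would read the file itself, and the interface names here are made up):

```python
def parse_proc_net_dev(text):
    """Return {iface: (rx_errs, tx_errs)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        iface, data = line.split(":", 1)
        fields = data.split()
        # In /proc/net/dev, field 2 is RX errors and field 10 is TX errors
        counters[iface.strip()] = (int(fields[2]), int(fields[10]))
    return counters

# Sample contents standing in for a real /proc/net/dev read
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  104013    1007    0    0    0     0          0         0   104013    1007    0    0    0     0       0          0
  eth0: 7243566   60319    5    0    0     0          0         0  4312456   51234    2    0    0     0       0          0
"""

if __name__ == "__main__":
    for iface, (rx, tx) in parse_proc_net_dev(SAMPLE).items():
        print(f"{iface}: rx_errs={rx} tx_errs={tx}")
```

Non-zero and steadily growing error counters on the replication interface would point at the link rather than the SAN.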
Well, I just checked that. Here is a copy of the log from the source server's interface that connects to the second SAN. All the identifying info has been removed, of course.
The only other things I can think of besides the NICs are the RAID health, or that there is more data than can replicate within your time frame; but in your case that doesn't apply, since it looks like the sync finished Consistent.
I get the same messages every day.
I have 4 async replication tasks and get 4 emails at start time and 4 at end time.
The task in the GUI shows consistent for every volume.
I thought the messages are produced because the volume goes Inconsistent/Unknown as soon as the task stops?
I am getting 8 of these each day. That's why I posted a feature request a while ago, since planned interruptions of a volume replication aren't really errors, but necessary:
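Until such a filter exists in the firmware, the benign start/stop notifications could be filtered on the mail side. A rough sketch that classifies lines by their connection-state transition (the state names are taken from the e-mails quoted above; treating exactly these transitions as "expected" is my assumption):

```python
import re

# Transitions that simply mark the scheduled start/stop of an async task,
# as seen in the e-mails above.
EXPECTED = {
    ("WFConnection", "WFBitMapT"),   # task starting: handshaking with source
    ("WFBitMapT", "SyncTarget"),     # task running: resync in progress
    ("SyncTarget", "WFConnection"),  # task stopping: source disconnected
}

LINE_RE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} .*"
    r"Connection \(from (\w+) to (\w+)\)"
)

def is_expected(line):
    """True if the line is a normal start/stop transition of a scheduled task."""
    m = LINE_RE.match(line)
    if not m:
        return False
    return (m.group(1), m.group(2)) in EXPECTED

sample = ("2009/07/22 07:00:36 Replication:Volume Replication: Device lv0000: "
          "Connection (from SyncTarget to WFConnection). Mode local/remote "
          "(from Secondary/Primary to Secondary/Unknown).")
```

Anything matching `is_expected` could be dropped or down-prioritized by the mail filter, leaving only genuinely unexpected transitions to alert on.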