A drive died in my primary DSS (DSS1) and everything switched over to the secondary (DSS2).
To fix this I had to make DSS2 (currently the secondary) the primary and DSS1 (currently the primary) the secondary, because the replication tasks on DSS1 were lost when it died. I was able to fix this without taking my VMs offline. I use multipath (MPIO) and originally had both paths pointing to two virtual IPs. On each XenServer I changed the second path to point to the real IP address of DSS2, so my MPIO had one path to a virtual IP and one path to a real IP on DSS2.
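Here is a rough sketch of why repointing one path matters, assuming (as it turned out in my case) that the virtual IPs only exist while the failover service is running. The addresses and the surviving_paths helper are made up for illustration and are not from my actual setup or from any DSS or XenServer tool.

```python
# Hypothetical addresses, for illustration only (not from the real setup).
VIRTUAL_IPS = {"192.0.2.10", "192.0.2.11"}  # assumed to exist only while failover runs
DSS2_REAL_IP = "192.0.2.22"                 # assumed to be bound to DSS2 itself


def surviving_paths(paths, failover_running):
    """Return the iSCSI paths that stay reachable (simplified model)."""
    if failover_running:
        return list(paths)
    # With the failover service stopped, virtual IPs disappear.
    return [p for p in paths if p not in VIRTUAL_IPS]


original = ["192.0.2.10", "192.0.2.11"]   # both paths on virtual IPs
modified = ["192.0.2.10", DSS2_REAL_IP]   # second path moved to the real IP

print(surviving_paths(original, failover_running=False))
print(surviving_paths(modified, failover_running=False))
```

With both paths on virtual IPs, stopping the failover service would leave no path at all in this model. With one path on the real IP, one path survives, which is what keeps the storage connected.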
When I shut down the failover service on the secondary DSS in preparation for making it the primary, I lost the virtual IP and my XenServers complained that they had lost one of two paths, but they kept working because they were still connected to DSS2, and my VMs never went down. I then basically recreated the autofailover from scratch, this time with the secondary DSS (DSS2) as the primary. I had to recreate the tasks and settings as if setting up failover for the first time, clear the metadata on the source server (now DSS2), and start new replication tasks. It took about 3.5 hours to replicate 4.5 TB of data over a 10G point-to-point link. The whole time this was going on my VMs never went down. They ran a little slower because of all the replication traffic, but they never stopped.
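For anyone planning a similar resync, the numbers above work out to an average rate that can be estimated as follows. This is only back-of-the-envelope arithmetic, assuming decimal terabytes (1 TB = 10^12 bytes) and a steady transfer rate. The variable names are mine.

```python
# Figures from the post; decimal units are an assumption.
data_bytes = 4.5e12          # 4.5 TB replicated
duration_s = 3.5 * 3600      # 3.5 hours in seconds
link_gbit_s = 10             # 10G point-to-point link

throughput_mb_s = data_bytes / duration_s / 1e6
throughput_gbit_s = data_bytes * 8 / duration_s / 1e9
link_share = throughput_gbit_s / link_gbit_s

print(f"{throughput_mb_s:.0f} MB/s, {throughput_gbit_s:.2f} Gbit/s, "
      f"{link_share:.0%} of the link")
```

That comes to roughly 357 MB/s, or just under 3 Gbit/s on average, so the 10G link itself was probably not the limiting factor.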
After the replication completed and autofailover was started and working, I reversed the process and changed my XenServer MPIO paths back so that both pointed to virtual IPs. Be aware that I had to reboot each XenServer after changing the MPIO path. I used live migration to move the VMs off each server before rebooting it. There may be a way to restart multipathing on each XenServer without rebooting, but I could not find one.
All in all, having autofailover and MPIO in place really made a difference during this whole ordeal. No customers called to complain during any of this, and that was the best part.