I have been running DSS v7 in HA cluster mode on 2 nodes for years now. Some time ago one of the server nodes experienced a hardware crash.
After restarting the failed node, the cluster status now displays:
* node A: cluster running - degraded
* node B: cluster status - stopped
My 2 iSCSI volumes are running as active on node A.
How can I safely fix my cluster?
The data on node B has been out of date for several weeks now, and I am unsure how to fix the cluster safely.
It displayed the following prompt:
After starting it, the node ran in degraded mode, and on the other node the cluster is stopped.
My goal is to get the cluster back while keeping all data on node A (the degraded one), so node A should replicate its data back to node B.
Are the following assumptions correct?
- I need to cause some downtime by stopping the cluster completely
- trigger manual replication of the data volumes from node A (both volumes as source) to node B (both volumes as destination)
- then change the replication direction again so that my second volume has node B as source and node A as destination
- restart the cluster
It looks to me that when node B suddenly crashed, DSS lost its state of being source or destination upon the reboot.
- Not necessarily. You can put your cluster into "Maintenance Mode"; this will keep your VIP live and access available (CTRL+ALT+X from the console, then arrow down to "Cluster Maintenance Mode").
- Yes, you will want to make sure that your "good" data is not overwritten and confirm that it is set as the source.
- If you are operating in an Active/Active load balancing configuration, yes, you will want to swap source and destination for one resource so that it is primarily on node B. You may need to re-create your replication task here so that it reflects the change.
- You can restart one node at a time once your resources are replicating properly to avoid downtime.
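To make the ordering concrete, here is a minimal Python sketch of the direction logic described above. It does not talk to DSS in any way; it only models the bookkeeping. The volume names `vol1` and `vol2`, the `ReplicationTask` class, and both function names are made up for illustration. The assumption is that node A holds the good data and that the second volume should end up with node B as its source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationTask:
    """Hypothetical model of one replication task (not a DSS object)."""
    volume: str
    source: str
    destination: str


def resync_tasks(volumes, good_node, stale_node):
    """Phase 1: every volume replicates FROM the node holding the good data."""
    return [ReplicationTask(v, good_node, stale_node) for v in volumes]


def balanced_tasks(tasks, move_to_other_node, in_sync):
    """Phase 2: reverse the direction only for volumes confirmed in sync.

    Reversing a task before the resync has finished would make the stale
    copy the source, which is exactly the overwrite to avoid.
    """
    result = []
    for task in tasks:
        if task.volume not in move_to_other_node:
            result.append(task)
            continue
        if task.volume not in in_sync:
            raise ValueError(
                f"{task.volume} is not in sync yet; do not reverse its direction"
            )
        # Swap source and destination for this volume.
        result.append(ReplicationTask(task.volume, task.destination, task.source))
    return result


# Phase 1: both volumes use node A as source.
phase1 = resync_tasks(["vol1", "vol2"], "node A", "node B")

# Phase 2: once both volumes report as consistent, move vol2 to node B.
phase2 = balanced_tasks(phase1, move_to_other_node={"vol2"}, in_sync={"vol1", "vol2"})

for task in phase2:
    print(f"{task.volume}: {task.source} -> {task.destination}")
```

The point of the guard in `balanced_tasks` is simply that the direction swap for the second volume comes last, after the resync from node A has completed.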
Where possible, I always schedule a maintenance window in case something goes awry. It is also a good idea to confirm that you have good, clean backups before performing any of the above steps.