
Thread: Cluster degraded, how to fix safely?

  1. #1

    Cluster degraded, how to fix safely?

    Hi,

    I have been running DSS v7 in HA cluster mode on 2 nodes for years now. Some time ago one of the server nodes experienced a hardware crash.
    After restarting the failed node, the cluster status now displays:
    * node A: cluster running - degraded
    * node B: cluster status - stopped

    My 2 iSCSI volumes are running as active on node A.

    How can I safely fix my cluster?

    Node B's data has been out of date for several weeks now, and I am unsure how to fix the cluster safely.
    It displayed the following prompt:



    After starting it, I got one node running in degraded mode, while on the other node the cluster is stopped.

    The goal is to get my cluster back while keeping all data on node A (the degraded one), so node A should replicate its data back to node B.

    Are the following assumptions correct?
    - I need to cause some downtime by stopping the cluster completely
    - trigger manual replication of the data volumes from node A (both volumes as source) to node B (both volumes as destination)
    - then change the replication direction again so that my second volume has node B as source and node A as destination
    - restart the cluster

    It looks to me as if, when node B suddenly crashed, DSS lost track of which node was source and which was destination after the reboot.

    Regards,
    Nick

  2. #2


    To answer your questions:

    - Not necessarily. You can put your cluster into "Maintenance Mode", which will keep your VIP live and access available (Ctrl+Alt+X from the console, then arrow down to "Cluster Maintenance Mode").
    - Yes, you will want to make sure that your "good" data is not overwritten and confirm that it is set as the source.
    - If you are operating in an Active/Active load balancing configuration, yes, you will want to swap source and destination for one resource so that it is primarily on node B. You may need to re-create your replication task here so that it reflects the change.
    - You can restart one node at a time once your resources are replicating properly to avoid downtime.
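
    To make the ordering concrete, here is a rough sketch of the direction check described above. It is only a planning model, not an Open-E tool or API: the node names, volume names, and every function in it are hypothetical, and the real work is still done in the DSS GUI. The idea is that no task may have the node with the good data as its destination until the resync has finished; only then is one volume swapped for Active/Active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationTask:
    """Hypothetical model of one volume replication task."""
    volume: str
    source: str
    destination: str


def resync_tasks(volumes, good_node, stale_node):
    """Every volume replicates from the good node to the stale node."""
    return [ReplicationTask(v, good_node, stale_node) for v in volumes]


def assert_safe(tasks, good_node):
    """Refuse any task that would overwrite the copy on the good node."""
    for task in tasks:
        if task.destination == good_node:
            raise ValueError(
                f"{task.volume}: {good_node} is a destination and would be overwritten"
            )


def rebalance(tasks, volumes_to_swap):
    """After the resync is complete, swap direction for the chosen volumes."""
    return [
        ReplicationTask(t.volume, t.destination, t.source)
        if t.volume in volumes_to_swap
        else t
        for t in tasks
    ]


if __name__ == "__main__":
    # Assumed names: node-a holds the good data, node-b is stale.
    tasks = resync_tasks(["vol1", "vol2"], good_node="node-a", stale_node="node-b")
    assert_safe(tasks, good_node="node-a")  # must pass before starting replication

    # Only once both volumes are consistent: move vol2 to node-b (Active/Active).
    for task in rebalance(tasks, {"vol2"}):
        print(f"{task.volume}: {task.source} -> {task.destination}")
```

    Running the safety check against the rebalanced plan would fail on purpose, which is the point: the swap is only valid after node B has a complete copy.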

    Where possible, I always schedule a maintenance window in the event something goes awry. Also, it would be a good idea to confirm good clean backups before performing any of the above steps.

    Hope this helps!
