Thread: Drive on Primary SAN died. Have a question

  1. #1
    Join Date
    Feb 2009
    Posts
    142

    Drive on Primary SAN died. Have a question

    OK, I have primary/secondary DSS V6 systems, and the autofailover worked great after multiple drives died on my Primary DSS and I basically lost my entire volume. Everything is now running on the Secondary DSS and I need to rebuild my Primary. I'm a little afraid to reboot the Primary DSS because I'm not 100% sure I won't lose my Virtual IP on the Secondary DSS as I start rebuilding my Primary.

    1) Is there any danger of screwing anything up as I create new Drives, Volumes, Logical Volumes, and iSCSI targets on the Primary DSS? This was a recent rebuild (this week) and I had not gotten around to saving any config files yet (I know, I know, I should have made a backup right away).

    There really was not a problem with the Volume Group on the Primary itself. I was using SSD caching (CacheCade) and an SSD drive died. I had made the stupid mistake of using write-back for the SSD cache without a BBU, so when the SSD died and I discarded the cache, it took out my volume. I did not realize that discarding the cache would destroy my Primary DSS's volume. (I currently have one 4.5TB RAID 10 volume.)

    2) Can I rebuild an autofailover partner (the Primary) from scratch without having a backup config? Will I be in any danger of losing the Virtual IP while attempting to fix this?

    I created a new volume and tried to add some LVs, but it told me it could not create them. I believe I have to go into C-A-X and then to option 14, 'Delete Content of Units', which will require a reboot. My biggest fear is rebooting the Primary DSS, since it has all the autofailover info.

    Can I safely rebuild Primary with all the same info and fix this?

    Will call Support tomorrow but trying to do as much as possible today to get ahead of this.

    Thanks!
    Andy

  2. #2
    Join Date
    Feb 2009
    Posts
    142

    Working with Open-E support on this. Having to rebuild my primary DSS after losing the drive.

  3. #3
    Join Date
    Nov 2009
    Posts
    53

    @webguyz:
    Could you please let me know whether you were able to rebuild the cluster without shutting down the VIP(s) (with a little help from support)?
    E.g., were there any problems you faced?

    Thanks!

  4. #4
    Join Date
    Feb 2009
    Posts
    142

    I had the drive die in my Primary DSS (DSS1) and everything switched to the Secondary (DSS2).

    To fix this I had to make DSS2 (originally the secondary) the primary and DSS1 (originally the primary) the secondary. The reason is that the replication tasks were lost on DSS1 when it died. I was able to fix this without taking my VMs offline. I use multipath (MPIO) and originally had both paths pointing to two virtual IPs. On each XenServer I changed the second path to point to the real IP address, so my MPIO was pointing to one virtual IP and one real IP on DSS2.
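
    The reasoning behind that path change can be sketched in a few lines. This is only a toy model, not anything DSS or XenServer runs: the target names are hypothetical, and it assumes that stopping the failover service removes every virtual IP at once while the real IP on DSS2 stays up.

```python
def surviving_paths(path_targets, virtual_ips):
    """Return the path targets still usable once all virtual IPs are gone."""
    return [target for target in path_targets if target not in virtual_ips]

# Hypothetical names; the thread does not give the actual addresses.
virtual_ips = {"vip-a", "vip-b"}

original_paths = ["vip-a", "vip-b"]          # both MPIO paths on virtual IPs
temporary_paths = ["vip-a", "dss2-real-ip"]  # second path moved to the real IP

surviving_original = surviving_paths(original_paths, virtual_ips)
surviving_temporary = surviving_paths(temporary_paths, virtual_ips)

print("original layout keeps:", surviving_original)
print("temporary layout keeps:", surviving_temporary)
```

    Under those assumptions the original layout would be left with no path at all, while the temporary layout keeps the one path that points at the real IP.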

    When I shut down the failover service on the Secondary DSS in prep to make it the Primary, I lost the Virtual IP and my XenServers complained that they had lost one of two paths, but the servers kept working because they were still connected to DSS2, and my VMs never went down. I then basically recreated the autofailover, this time with the Secondary DSS (DSS2) as the Primary. I had to recreate the tasks and info as if setting up failover for the first time. I also had to clear the metadata on the source server (now DSS2) and start new replication tasks. It took about 3.5 hours to replicate 4.5TB of data (using a 10G point-to-point link). The whole time this was occurring, my VMs never went down. They ran a little slower because of all the replication going on, but never stopped.
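
    As a rough sanity check on that replication figure, the average rate can be worked out from the two numbers given. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes) and a steady rate over the whole 3.5 hours; the function name is made up for illustration.

```python
def average_throughput(total_bytes, seconds):
    """Return the average rate as (megabytes per second, gigabits per second)."""
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e9

# 4.5 TB replicated in about 3.5 hours, as reported in the post.
mb_per_s, gbit_per_s = average_throughput(4.5e12, 3.5 * 3600)

print(f"{mb_per_s:.0f} MB/s, {gbit_per_s:.2f} Gbit/s")
```

    That works out to roughly 357 MB/s, or a bit under 3 Gbit/s, so on these assumptions the 10G link itself was not saturated on average.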

    After the replication completed and autofailover was started and working, I reversed the process and changed my XenServer MPIO paths back so that both pointed to virtual IPs. Be aware that I had to reboot each XenServer after I changed the MPIO path. I used live migration to move the VMs off each server before rebooting it. There may be a way to restart the multipath task on each XenServer without rebooting, but I could not find one.

    All in all, having the autofailover and the MPIO in place really made a difference during this whole ordeal. No customers called to complain during any of this, and that was the best part.
