
Thread: RAID5 rebuild after disk error on DSS cluster

  1. #1
    Join Date
    Jul 2008
    Location
    Vienna, Austria
    Posts
    137

    RAID5 rebuild after disk error on DSS cluster

    Hi!

    We have a defective HDD on our primary DSS node. What is recommended: fail over to the secondary first and then do the rebuild on the primary, or do the rebuild on the active primary?

    The iSCSI target load is not very high, and we would do the rebuild at off-peak time (at night). The estimated rebuild time is 7 hours. It is a RAID5 with 8 SATA disks on an Areca controller.
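    For reference, the 7-hour figure matches a simple back-of-the-envelope estimate. A minimal sketch (my own arithmetic, assuming the rebuild is limited by the sustained write rate of the replacement disk, guessed here at ~40 MB/s):

    [code]
    # Rough RAID rebuild-time estimate.
    # Assumption (mine, not from Areca docs): the rebuild is limited by the
    # sustained rate at which the replacement disk can be rewritten.

    def rebuild_hours(disk_gb: float, rate_mb_s: float) -> float:
        """Hours to rewrite one disk of disk_gb gigabytes at rate_mb_s MB/s."""
        return disk_gb * 1000 / rate_mb_s / 3600

    # 1 TB WD RE3 at ~40 MB/s sustained:
    print(f"{rebuild_hours(1000, 40):.1f} h")  # -> 6.9 h, close to the 7 h estimate
    [/code]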

    Thanks!
    regards,
    Lukas

    descience.NET
    Dr. Lukas Pfeiffer
    A-1140 Wien
    Austria
    www.dotnethost.at

    DSS v6 b4550 iSCSI autofailover with a Windows 2008 R2 failover cluster (still having some issues with autofailover).

    2 DSS: 3U Supermicro X7DBE, BIOS 2.1a, Areca ARC-1261ML FW 1.48, 8x WD RE3 1TB, 1x Intel PRO1000MT dual-port, 1x Intel PRO1000PT dual-port.

    2 Windows Nodes: Intel SR1500 + Intel SR1560, Dual XEON E54xx, 32 GB RAM, 6 NICs. Windows Server 2008 R2.

  2. #2

    Second question: if we rebuild on the active primary node, should we stop the replication task first? If the rebuild fails on the primary node and the data/RAID is completely broken (disaster), is there a side effect on the secondary node, i.e. broken data there too? Or is the data on the secondary node absolutely safe?
    regards,
    Lukas


  3. #3


    Hi,

    From my experience you should start the failover first; never run a rebuild while Open-E is working with the storage. Shut down the primary and do the rebuild within the BIOS setup of your controller. After that, start Open-E on the primary again and start the resync. If there are any errors, you must clear the metadata on the primary and do a full sync.
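    Before shutting down the primary it is worth checking that the secondary's iSCSI portal actually answers. A minimal sketch (my own, not an Open-E tool; the portal address is a placeholder, 3260 is the standard iSCSI target port):

    [code]
    # Check that the secondary DSS answers on the iSCSI port before the
    # primary is taken down for the rebuild.
    import socket

    SECONDARY_PORTAL = "192.168.0.2"  # placeholder: address of the secondary DSS
    ISCSI_PORT = 3260                 # standard iSCSI target port

    def portal_up(host: str, port: int = ISCSI_PORT, timeout: float = 3.0) -> bool:
        """Return True if a TCP connection to the portal succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if not portal_up(SECONDARY_PORTAL):
        raise SystemExit("Secondary portal not reachable - do NOT shut down the primary!")
    print("Secondary iSCSI portal is up, safe to start the failover.")
    [/code]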

    I had a very, very bad experience with a rebuild on Open-E with Adaptec and RAID6; have a look at this thread:
    http://forum.open-e.com/showthread.php?t=1991

    That error still exists in the Open-E/Adaptec combination and should be fixed in one of the next releases.

    greetings
    roger

  4. #4

    Thanks for the quick reply! OK, I will fail over to the secondary DSS first, then run the rebuild on the primary via the RAID controller console. We don't use Adaptec, we use Areca, and I hope there are no such issues there.

    BTW: we also had a failed HDD on the secondary DSS, and we did that rebuild without issues on the running secondary.

    We also use WD RE3 1TB HDDs like you. Have you had bad experiences with those HDDs? Two failed HDDs in such a short time is not good ...
    regards,
    Lukas


  5. #5

    The RAID rebuild on the primary node succeeded in about 4 hours.

    Has anyone else had trouble with WD RE3 HDDs (WD1002FBYS-01A6B0)? Our defective HDDs have production dates of Sep 2008 and Feb 2009 ...
    regards,
    Lukas


  6. #6


    Hi,
    Sorry for my late reply.
    We had 2 defective drives nearly on the same day, the same model as you have: first on the primary and second on the slave.
    We use 52 drives of this model, and 2 defective drives in 36 months is OK, I think.

    And congrats on your successful rebuild :-)

    roger

  7. #7

    I checked both defective drives with the WD Data LifeGuard Diagnostics tool and both drives test healthy and error-free ...!? Why did they fail in the RAID then? Should I return the drives to WD (RMA) anyway?
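    One thing the WD tool does not show you is the raw SMART counters; a disk that the controller dropped on a timeout often tests "healthy" but has CRC/interface errors rather than media errors. A sketch using smartmontools (assuming smartctl is installed and run as root; the device path is only an example):

    [code]
    # Pull the SMART counters that usually explain RAID dropouts:
    # reallocated/pending sectors point to media problems, UDMA CRC errors
    # point to cabling/backplane/timeout problems.
    import subprocess

    SUSPECTS = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
                "UDMA_CRC_Error_Count")

    def smart_counters(device: str) -> dict:
        """Parse `smartctl -A` output for the attributes above."""
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        counters = {}
        for line in out.splitlines():
            fields = line.split()
            # Attribute rows have 10 columns; the name is the 2nd field
            # and RAW_VALUE is the last.
            if len(fields) >= 10 and fields[1] in SUSPECTS:
                counters[fields[1]] = fields[9]
        return counters

    print(smart_counters("/dev/sda"))  # device path is an example
    [/code]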
    regards,
    Lukas


  8. #8


    Absolutely you should!
    We always send them back; there is always a reason for a failure in a RAID system.

  9. #9

    Today another HDD (WD1002FBYS-01A6B0) dropped out with a timeout on the secondary DSS ... I hate WD disks ...
    regards,
    Lukas


  10. #10


    Hi!

    We have several storage systems with about 100 WD 1TB SATA disks altogether.
    We have already replaced about 6-7 of them within 2 years.

    A rebuild on a 16-disk system with an Areca 1680 controller and the RAID configured as RAID6 takes about 2-3 days.
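    For comparison, the implied per-disk rebuild rates of the two cases in this thread, assuming 1 TB disks in both (simple arithmetic, my own sketch):

    [code]
    # Implied sustained rebuild rate: disk capacity over elapsed rebuild time.
    def rate_mb_s(disk_gb: float, hours: float) -> float:
        return disk_gb * 1000 / (hours * 3600)

    print(f"8-disk RAID5, 1 TB in 4 h:   {rate_mb_s(1000, 4):.1f} MB/s")   # ~69 MB/s
    print(f"16-disk RAID6, 1 TB in 60 h: {rate_mb_s(1000, 60):.1f} MB/s")  # ~4.6 MB/s
    [/code]

    The RAID6 case is more than an order of magnitude slower, which is plausible with double parity and the rebuild running at background priority under load.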
