Thread: Raid build keeps failing

  1. #1

    Raid build keeps failing

    I have been trying in vain to get the Open-E system running but I keep encountering problems with the RAID array.

    First I set up a RAID5 with the demo disk and everything seemed fine, so I ordered Open-E iSCSI-R3 Enterprise. Since then I have not been able to keep an array up and running. I have about 10 disks spread over 2 controllers, the on board controller taking care of 3 and an Adaptec 3805 taking care of the other 6.

    I want a large array of RAID 6 or RAID 5 + 1 spare. I have tried both, and once the build is complete or somewhere near the end, the system goes batty: information isn't displayed properly and at least a few of the drives go into a failed state. I end up having to restart the system before things get working again.

    The problem is that every time I restart, the redundancy is suddenly gone. In the RAID 6, 2 of the drives go missing after a reboot. Attempting to re-add them (which is horribly time consuming) typically results in a failure at the end of the rebuild, with the drives going back into a failed state and often all the drives on the Adaptec controller being unavailable until after a reboot. The same thing appears to be happening with the RAID 5 as well: after it gets all locked up following the rebuild, I reboot and the spare and the redundancy are unavailable. I get log errors reporting I/O errors with the drives on the Adaptec controller, but nothing more specific than that. I don't think this is actually a controller issue, as it has the latest firmware and we have used it in an opensolaris ZFS setup prior to this, as well as on other systems with a S/W RAID setup.

    Any advice at this point would be helpful.

  2. #2

    Further, I have yet to be able to add any of these to my ESX servers. The ESX server finds the target, but when it attempts to fdisk and format it, it appears to fail with

    Error during the configuration of the host: cannot open volume .vmfs/volumes/4cblablabla...

    I have made the File IO change (from block IO) that was mentioned in another thread and this at least allowed me to get through the adding a storage device wizard, but it still failed in the end...

  3. #3

    Are you able to create and add logical volumes to the targets?
    Are you able to see your iSCSI targets with a MS initiator?

  4. #4

    Alright, a bit of an update.

    I actually have things working this morning... the first time it has worked in 3 days and about 8 builds of the array. To elaborate, I set a RAID 5 + spare to build overnight. This morning the system was hung up... reboot... 2 of the drives were missing from the array. I re-added them, let it rebuild, and magically it worked for some reason this time. I'm still not convinced that it will stay like this, but it is at least testable now.


    For the second part, I think I found an entry in the ESX server that was causing the problem; this is working now.

    Torpedo Ray: Yes I was able to create volumes and targets before, the array just never got out of degraded mode

    MS initiator??? assuming you mean Microsoft? I try never to touch the stuff.

  5. #5
    Join Date
    Jan 2008
    Posts
    82

    Are you sure you don't have any RAID or hardware issue? Drives disappearing and then showing up again... it seems like some HW problem.

    Update the drive firmware...

  6. #6
    Join Date
    Jan 2008
    Posts
    82

    I disagree with it seems HW....

    Update the HDD firmware, or test it with a new set of drives; maybe you have a bad batch...

  7. #7

    Certainly not a hardware issue.
    Why am I so sure? I have been playing with this box for the last 3 weeks and have set up no fewer than 10 SW RAIDs through various OSes (opensolaris, ubuntu, gentoo, redhat). I had not seen this issue until I started using the Open-E software.

    To note, the Adaptec controller has the latest firmware already.

    The only issue I can see is that perhaps the 2 controllers are having a hard time talking to each other, the on board SATA controller and the Adaptec Controller. While this would likely be a software issue I've not read anything on this site or in the manuals to state that this should be a problem.

  8. #8

    To add insult to injury...

    This is the second time I have seen this now. I came in this morning and was unable to access the web interface for Open-E. I popped into the server room and saw that it was not able to access some of the log files. I tried to initiate a shutdown (ctrl-alt-k) and it says it can't find/execute shutdown... I had to hard boot. It's like the system drive has disappeared... any command I tried to run said it couldn't find the files associated with it.

    Once I restart I get "no boot disk found". This is what happened yesterday as well, and I had to physically remove the Open-E key and put it in a different slot before it was found again. Thankfully the drives from the RAID didn't disappear and have to be re-added, but the array does have to resync since it was shut down not so gracefully.

  9. #9

    Just an update for anyone who might want to know.

    I ended up setting the system up like this before it became more stable:

    Moved 8 of the drives to the Adaptec SATA RAID card and set up a RAID5 through HW

    Set up a RAID1 with the 2 remaining drives on the motherboard

    Grouped both the SW RAID1 and the HW RAID5 into one logical volume

    Since then it all appears to be working; I didn't even have to reboot anything when I came in this morning.
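A rough sketch of the capacity arithmetic for the layout described above (8 drives in RAID5 plus 2 drives in RAID1, grouped into one logical volume). The drive size is not given in the post, so `DRIVE_SIZE_TB` is a placeholder assumption, and `usable_capacity` is a hypothetical helper that assumes equal-size drives in each array.

```python
def usable_capacity(drive_count, drive_size_tb, level):
    """Usable capacity in TB for one array of equal-size drives."""
    minimum_drives = {"raid1": 2, "raid5": 3, "raid6": 4}
    if level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < minimum_drives[level]:
        raise ValueError(f"{level} needs at least {minimum_drives[level]} drives")

    # Number of drives' worth of space consumed by redundancy.
    redundancy_drives = {
        "raid1": drive_count - 1,  # every drive holds the same data
        "raid5": 1,                # one drive's worth of parity
        "raid6": 2,                # two drives' worth of parity
    }[level]
    return (drive_count - redundancy_drives) * drive_size_tb


DRIVE_SIZE_TB = 1.0  # assumption: the post does not state the drive size

hw_raid5 = usable_capacity(8, DRIVE_SIZE_TB, "raid5")
sw_raid1 = usable_capacity(2, DRIVE_SIZE_TB, "raid1")
print(f"HW RAID5: {hw_raid5} TB, SW RAID1: {sw_raid1} TB, "
      f"combined volume: {hw_raid5 + sw_raid1} TB")
```

With these assumptions, the combined volume has the space of 8 drives out of 10. The RAID5 part survives a single drive failure and the RAID1 part survives the loss of one of its two drives.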

  10. #10
    Join Date
    Jan 2008
    Posts
    86

    G'day Barrday,
    We have just started getting an error that I wonder if it is related.
    The scenario is that we have a single Adaptec 5405 card in an Intel SSR212MC2, which has the onboard controller disabled, and 12x Seagate ES.2 1TB drives installed.

    The firmware and BIOS on the card are 5.2.0, while DSS is 5.0.DB47000000.3178 and the "Driver" is reported as 1.1.5 (2455). Drives are Seagate ES.2 1TB with the "Raid" AN05 firmware.
    (But the current Linux driver on the Adaptec site is 1.1.5 (2459). Is this an issue?)

    We have had numerous errors and faults with the arrays on the controller and have not been able to make them "Optimal". The current configuration is all 12 drives in a Raid6 array. When the array controller was told to "Verify and Fix", it got to 15% and then sent an alert:
    Event Description: Build/Verify failed: controller 1, logical device 1 ("array0") [0x00].
    When we investigated further, the Agent log stated that the drive in Slot 8 had been removed, which is clearly not the case. After a "rescan", Slot 8's drive returns and it is now "rebuilding" it.
    So does this sound like the same issue you are having?
    I can't figure out why the controller thinks a drive has been "removed".... To me that is a drive failure, but so far we have lost Slot 11, 10, 9 and 8, so I doubt it is a true error.
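For anyone wanting to tally these alerts across a log, here is a minimal sketch that pulls the fields out of an event line. The pattern is inferred only from the single alert quoted above, so other Adaptec events may well be formatted differently; `parse_event` and the field names are made up for illustration.

```python
import re

# Pattern inferred from the one alert quoted above; treat it as an assumption.
EVENT_RE = re.compile(
    r'Event Description: (?P<what>[^:]+): '
    r'controller (?P<controller>\d+), '
    r'logical device (?P<device>\d+) '
    r'\("(?P<name>[^"]*)"\) '
    r'\[(?P<code>0x[0-9A-Fa-f]+)\]'
)


def parse_event(line):
    """Return the alert fields as a dict, or None if the line does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["controller"] = int(fields["controller"])
    fields["device"] = int(fields["device"])
    return fields


sample = ('Event Description: Build/Verify failed: controller 1, '
          'logical device 1 ("array0") [0x00].')
print(parse_event(sample))
```

Counting the parsed results per controller and logical device would show whether the failures cluster on one array or are spread across the card.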

    BTW, unrelated: we also had the error when we tried to create an iSCSI VMFS, until we created a FS on it first and THEN went and created the VMFS. It worked like a charm after that.

    So, anyone else using the Adaptec 5805 and finding issues?

    Rgds Ben.
