
Thread: Frozen System with RAID 6, WB, ESXi 4

  1. #1

    Frozen System with RAID 6, WB, ESXi 4

    Hi,

    I am trying to figure out why my storage system suddenly freezes up without warning when used in one specific configuration. The system is DSS V6 Lite running on a SuperMicro PDSME motherboard with a Pentium D 3.0 GHz CPU, 3GB ECC RAM, and 8 x 320GB SATA drives connected through a SATA JBOD controller.

    The configuration is this: I set up the eight drives as a software RAID 6 (MD0), create a volume group consisting of MD0, then create an iSCSI volume and target with write-back caching enabled (WB). From ESXi 4.0 I create a VMFS volume on that target and copy a large amount of data (about 400GB) onto it. The freeze-up seems to happen at a random point, usually well into the copy, and it is more likely if I have the datastore browser open and hit the refresh button repeatedly while the copy is running.

    The freeze-up is fatal. The storage system stops responding to ping, and all drive and network activity comes to a complete stop. The only way to recover is to restart the system. Because WB is enabled, the VMFS volume is left corrupted beyond repair.

    I understand that write-back caching is generally not recommended for data safety reasons. However, when I need to copy a large amount of data onto the RAID 6, the performance boost seems worth it (~12MB/sec with WB off vs. ~65MB/sec with WB on). I can always turn WB off after the copy is done, and I am not risking much, since I can simply re-copy if the transfer gets interrupted. What worries me is this unexpected seizing-up behavior: I need to track down its cause, because turning WB off now may merely mask the problem, only for it to show up later when the system is heavily relied on in production.
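    A rough back-of-the-envelope check of that trade-off, assuming the quoted rates are sustained for the entire 400GB copy and using decimal units (1 GB = 1000 MB). The helper name copy_hours is hypothetical and ignores protocol overhead, VMFS metadata, and caching effects:

    ```python
    def copy_hours(size_gb: float, rate_mb_per_s: float) -> float:
        """Hypothetical helper: hours to move size_gb at a sustained rate_mb_per_s.

        Assumes decimal units (1 GB = 1000 MB) and a constant transfer rate.
        """
        seconds = size_gb * 1000 / rate_mb_per_s
        return seconds / 3600

    # Compare the two rates quoted in the post for a ~400GB copy.
    for label, rate in (("WB off", 12), ("WB on", 65)):
        print(f"{label}: {copy_hours(400, rate):.1f} h")
    ```

    Under these assumptions the copy takes roughly 9.3 hours with WB off versus about 1.7 hours with WB on, which is why the speedup is tempting for a one-off bulk load.
    
    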

    I have already ruled out the 8-port SATA JBOD controller card as the culprit: I replaced it and the same problem occurred again. Next, I am going to try to reproduce the problem with a 4-drive RAID 6 using only the motherboard SATA ports, and then on a completely different system.

    Any suggestions or reports of similar experiences that can help speed up the troubleshooting process would be very helpful.

  2. #2


    I'd boot your server from a Linux boot disk and run a memory test. I would also try a different network adapter; I've seen both of those cause this same issue. The network adapter one threw me for a loop once, because the first things you test are of course memory, hard drives, and so on, so the network adapter was the last thing I checked.

    Good luck!
