Thread: IO Error with ARECA 1680 SATA volume ...

  1. #1
    Join Date
    Oct 2008
    Posts
    69

    IO Error with ARECA 1680 SATA volume ...

    Hi,

    Maybe somebody could help me...

    On my ARECA 1680 I have two volumes: a 10*400GB SAS RAID6 and a 4*2TB SATA RAID10.
    I get random errors on my SATA volume (never on my SAS volume, not even at the same time as the SATA errors):

    2013/05/10 09:02:03|[83999.590082] scsi cmnd aborted, scsi_cmnd(0xffff880008c24200), cmnd[0x28,0x 0,0x5a,0x7c,0x97,0xb8,0x 0,0x 1,0x 0,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/10 09:02:03|[83999.590216] scsi cmnd aborted, scsi_cmnd(0xffff88011cad8940), cmnd[0x28,0x 0,0x5a,0x7c,0x98,0xb8,0x 0,0x 1,0x 0,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/10 09:02:03|[83999.590344] scsi cmnd aborted, scsi_cmnd(0xffff88012525f580), cmnd[0x28,0x 0,0x4a,0x86,0x 9,0xa0,0x 0,0x 0,0x10,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/10 09:02:03|[83999.590478] arcmsr1: executing eh bus reset .....num_resets = 0, num_aborts = 3
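
    For anyone wondering what the cmnd[...] bytes mean: they are the raw SCSI CDB, so the aborted commands can be decoded. A minimal Python sketch, assuming the standard READ(10)/WRITE(10) CDB layout (the 16-byte CDBs are truncated to 12 bytes in this log, so only their opcode is recoverable):

    Code:
    import re

    # Opcode names for the CDBs that appear in the arcmsr log above.
    OPCODES = {0x28: "READ(10)", 0x2a: "WRITE(10)",
               0x88: "READ(16)", 0x8a: "WRITE(16)"}

    def decode(line):
        """Extract the CDB from one 'scsi cmnd aborted' line and decode
        opcode, LBA and transfer length for the 10-byte CDBs."""
        m = re.search(r"cmnd\[([^\]]+)\]", line)
        if not m:
            return None
        cdb = [int(b, 16) for b in m.group(1).replace(" ", "").split(",")]
        name = OPCODES.get(cdb[0], hex(cdb[0]))
        if cdb[0] in (0x28, 0x2a):
            lba = int.from_bytes(bytes(cdb[2:6]), "big")     # CDB bytes 2-5
            blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # CDB bytes 7-8
            return f"{name} lba={lba} blocks={blocks}"
        return f"{name} (CDB truncated in log)"

    print(decode("cmnd[0x28,0x 0,0x5a,0x7c,0x97,0xb8,0x 0,0x 1,0x 0,0x 0,0x 0,0x 0]"))
    # -> READ(10) lba=1518114744 blocks=256
    All three aborts above decode as READ(10)s on scsi_lun 0x1, which matches the SATA volume being the one affected.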

    The iSCSI targets attached to the SATA volume go offline and I have to reboot the whole system to get everything back...

    If I check the Areca status, there is no error in its log; everything looks OK...

    I disabled NCQ for the SATA drives, without luck.

    I have the latest firmware from ARECA.

    What can I do?

    PS: my support on both of my Open-E systems expired on 14 April 2013...

    Thanks

    Nsc

  2. #2
    Join Date
    Aug 2010
    Posts
    404


    These errors are mostly related to the RAID controller. Please make sure of the following:
    - You have the latest build of DSS.
    - You have the latest firmware for your controller (which you already have).
    - Check the iSCSI settings/configuration and make sure the initiator-side settings match the DSS settings.
    - Check the health of your storage disks and run a file system check on them (as they are iSCSI volumes, you can run any disk checking tool from your initiator OS, e.g. the chkdsk command on Windows; see the sketch after this list).
    - Check your system memory (RAM) health by running the Memtest tool (reboot your system and you will find it as the first boot option).
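
    For the initiator-side disk check, a minimal sketch of a read-only chkdsk run, wrapped in Python so the result can be captured; the drive letter E: is an assumption, use whatever letter the iSCSI LUN is mounted as:

    Code:
    import subprocess

    # Read-only check of an iSCSI-attached NTFS volume from the Windows
    # initiator side. Without /f, chkdsk only reports problems and does
    # not modify the disk. Drive letter E: is an assumption.
    result = subprocess.run(["chkdsk", "E:"], capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:   # chkdsk exits non-zero when it finds problems
        print("chkdsk reported problems -- consider scheduling 'chkdsk E: /f'")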

  3. #3
    Join Date
    Oct 2008
    Posts
    69


    I just tried to install the latest DSS (6.0up98.xxxx.7337 2013-02-14 b7337); it gets stuck at 72%.
    I restarted on the old one (6.0up90.8101.5845) and it boots perfectly...

    I have two Open-E systems (same hardware); the slave one updated to the latest build without any problem, the master failed...

    I have to move the production VMs to the slave now... Looks like my DSS system needs some maintenance :/

  4. #4
    Join Date
    Oct 2008
    Posts
    69


    OK, I got this message:

    Code:
    2013/05/13 13:38:05|[ 1685.663908] scsi cmnd aborted, scsi_cmnd(0xffff8800d4acb6c0), cmnd[0x28,0x 0,0x58,0xda,0x45,0xb0,0x 0,0x 0,0x40,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/13 13:38:05|[ 1685.663922] scsi cmnd aborted, scsi_cmnd(0xffff8800d4a0d700), cmnd[0x28,0x 0,0x68,0xa9,0x86,0x 0,0x 0,0x 1,0x 0,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/13 13:38:05|[ 1685.663926] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9d40), cmnd[0x2a,0x 0,0x6d,0xd3,0xd7,0xf0,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.663931] scsi cmnd aborted, scsi_cmnd(0xffff880127ab6840), cmnd[0x2a,0x 0,0x96,0xad,0xa7,0x28,0x 0,0x 0,0x 1,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.663935] scsi cmnd aborted, scsi_cmnd(0xffff880127ab60c0), cmnd[0x2a,0x 0,0x 1,0x 1,0xa7,0x28,0x 0,0x 0,0x 1,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/13 13:38:05|[ 1685.663940] scsi cmnd aborted, scsi_cmnd(0xffff8800d4ac05c0), cmnd[0x2a,0x 0,0x 0,0xa1,0xa7,0x28,0x 0,0x 0,0x 1,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.663944] scsi cmnd aborted, scsi_cmnd(0xffff8800d4a0d0c0), cmnd[0x2a,0x 0,0x4b,0xa5,0xa7,0x88,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.663948] scsi cmnd aborted, scsi_cmnd(0xffff8800d4ac0480), cmnd[0x2a,0x 0,0x 1,0x 1,0xa4,0x68,0x 0,0x 0,0x 1,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
    2013/05/13 13:38:05|[ 1685.663953] scsi cmnd aborted, scsi_cmnd(0xffff8800d4ac0e80), cmnd[0x88,0x 0,0x 0,0x 0,0x 0,0x 1,0x e,0xc7,0x 6,0x80,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664081] scsi cmnd aborted, scsi_cmnd(0xffff880125654d00), cmnd[0x2a,0x 0,0x 0,0xa1,0xa6,0xf0,0x 0,0x 0,0x 1,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664086] scsi cmnd aborted, scsi_cmnd(0xffff8801256541c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0x58,0x75,0x 1,0x8a,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664090] scsi cmnd aborted, scsi_cmnd(0xffff880127ab6d40), cmnd[0x2a,0x 0,0x4b,0xa6,0x44,0x48,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664094] scsi cmnd aborted, scsi_cmnd(0xffff8800d4ac0340), cmnd[0x2a,0x 0,0x55,0x36,0x93,0xe8,0x 0,0x 0,0x70,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664099] scsi cmnd aborted, scsi_cmnd(0xffff8800d4acbd00), cmnd[0x2a,0x 0,0xdb,0xad,0x83,0x78,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664103] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67580), cmnd[0x2a,0x 0,0x74,0x4a,0xf9,0xd0,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664107] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9840), cmnd[0x2a,0x 0,0x67,0x61,0x 3,0xe8,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664112] scsi cmnd aborted, scsi_cmnd(0xffff8800d4a0d840), cmnd[0x2a,0x 0,0x 0,0x17,0x 8,0x18,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664116] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67e40), cmnd[0x2a,0x 0,0x 0,0x17,0x 8,0xe8,0x 0,0x 0,0x10,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664120] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67a80), cmnd[0x2a,0x 0,0x 0,0x17,0x a,0x90,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664124] scsi cmnd aborted, scsi_cmnd(0xffff8800bee671c0), cmnd[0x2a,0x 0,0x 0,0x18,0xc3,0x30,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664128] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67940), cmnd[0x2a,0x 0,0x 0,0x17,0x b,0x20,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664133] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67080), cmnd[0x2a,0x 0,0x 0,0x18,0xc3,0x38,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664137] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67800), cmnd[0x2a,0x 0,0x 0,0x18,0xc3,0x48,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664141] scsi cmnd aborted, scsi_cmnd(0xffff8800bee676c0), cmnd[0x2a,0x 0,0x 0,0x17,0x 8,0xd8,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664145] scsi cmnd aborted, scsi_cmnd(0xffff8800bee67d00), cmnd[0x2a,0x 0,0x 0,0x18,0xc3,0x50,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664150] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a90c0), cmnd[0x2a,0x 0,0x 0,0x18,0xc3,0x18,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664154] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9340), cmnd[0x2a,0x 0,0x 0,0x18,0xc3,0x28,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664158] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9c00), cmnd[0x2a,0x 0,0x 0,0x17,0x 8,0xe0,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664163] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9700), cmnd[0x2a,0x 0,0x 0,0x17,0x 9,0x60,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664167] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a95c0), cmnd[0x2a,0x 0,0x 0,0x17,0x 9,0x70,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664171] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9980), cmnd[0x2a,0x 0,0x 0,0x17,0x 9,0x80,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664176] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9200), cmnd[0x2a,0x 0,0x 0,0x17,0x 9,0x90,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664180] scsi cmnd aborted, scsi_cmnd(0xffff8800aa5a9ac0), cmnd[0x2a,0x 0,0x 0,0x17,0x 8,0x70,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 0.
    2013/05/13 13:38:05|[ 1685.664195] arcmsr1: executing eh bus reset .....num_resets = 0, num_aborts = 33 
    2013/05/13 13:38:15|I/O Errors detected on unit S001. The unit requires your urgent attention in order to decrease the risk of data loss.
    2013/05/13 13:38:15|I/O Errors detected on unit S000. The unit requires your urgent attention in order to decrease the risk of data loss.
    2013/05/13 13:38:15|I/O errors detected on volume lv0003. All attached units to the volume group vg00 require your urgent attention in order to decrease the risk of data loss.
    This is really dangerous, because last time I lost data after the reboot. My VMs are stored on lv0003, but other data too!
    After this message the VMs kept running fine, so we didn't reboot...

    Why don't you stop the iSCSI target after these messages?

    I am really thinking about leaving Open-E and Areca this time: Areca because a failed disk on SATA LUN1 kills my SAS LUN0, and Open-E because it does not stop access when this error happens :/

  5. #5
    Join Date
    Aug 2010
    Posts
    404


    Dear nsc, the above-mentioned errors are related to a RAID controller issue. It could be a very old firmware or a problem with the hardware itself, and if there is a hardware failure then the file system, or the storage in general, will certainly be affected. So the issue is related to the controller itself, not to DSS or Open-E. Besides, Areca is one of the good RAID controller brands, and people run into bad controllers from every brand. You can contact the Areca support team and they should help you, or contact your hardware team for more assistance.

  6. #6
    Join Date
    Oct 2008
    Posts
    69


    I understand it's a problem with ARECA, but after this problem I definitely want DSS to stop all iSCSI and NAS access to my volumes.

    When this problem occurred, DSS stopped the volume on S001, so the VMs stored on it just stopped and no data was lost.
    Why can't it do the same for all the units?
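
    What I have in mind is something like the watchdog sketched below, which tails a copy of the event log and fires an alert on the "I/O Errors detected" lines. It is only a sketch under assumptions: the log path is hypothetical, and actually fencing the target would need a remote hook into the storage box, which is exactly what is missing:

    Code:
    import re
    import subprocess
    import time

    # Hypothetical path to an exported copy of the DSS event log.
    LOG = "/var/log/dss_events.log"
    PATTERN = re.compile(r"I/O [Ee]rrors detected on (unit|volume) (\w+)")

    def on_error(kind, name):
        # Site-specific reaction goes here; this sketch only raises a
        # syslog alert via logger(1).
        subprocess.run(["logger", "-p", "daemon.crit",
                        f"DSS {kind} {name} reported I/O errors"])

    with open(LOG) as f:
        f.seek(0, 2)              # start tailing at the end of the file
        while True:
            line = f.readline()
            if not line:          # nothing new yet
                time.sleep(1)
                continue
            m = PATTERN.search(line)
            if m:
                on_error(m.group(1), m.group(2))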

    And for info, I have the latest ARECA firmware.
    And for Open-E: when I tried to install the latest V6, it just hangs at 72%...

    Anyway, I have to fix these problems, but I will really keep this issue in mind when planning my new SAN / NAS / ESX 5 migration.

  7. #7
    Join Date
    Aug 2010
    Posts
    404


    We will pass your notes to our developers. About the latest V6 installation hanging at 72%: that may be related to the LUN issue you were facing, so we recommend you run a file system check on your system and verify that the storage disks are fine too.

    You may try to install the latest V6 build on a USB stick and run the system from it (as a test); you can apply the saved setup/configuration file to it, just to see whether your system reports any issue with the latest build (it should not), or whether the issue is related to the medium where DSS is installed.

    You can save your configuration and setup settings from the GUI under MAINTENANCE -> Miscellaneous -> Function: Save settings.

  8. #8
    Join Date
    Oct 2008
    Posts
    69


    I came back to give you some info:

    - My SATA RAID10 volume is defective. It kept going "offline"; I had to remove two disks to finally get the data back. I went back to a RAID1 volume; we'll see.
    - My SATA disks are Western Digital Caviar Black; it looks like ARECA doesn't like them.
    - My "DSS Official" USB DOM was dead too; I had to reinstall on a SAS disk.

    I have one last problem: while the RAID1 volume is initializing, the replication slows down very much...

    I think ARECA is just unable to mix drive types (SATA / SAS); I will use just one volume type for my next SAN...
