
Thread: Server hangs every few days

  1. #1

    Server hangs every few days

    Hello,

    I have a problem with an Open-E DSS 6 Lite setup (latest version).

    Every few days the machine hangs: it still responds to pings, but the web admin interface does not respond and the console is not accessible either.

    Hardware setup:

    Supermicro 16 disk chassis.
    Supermicro X8DTN+-F + Intel xeon 5620
    Areca 1880 controller
    6x 2TB WD Raid Edition disks in RAID 50 configuration


    I've checked the logs, but apart from some warnings I cannot find the cause of the hang. Which log files should I check?

    Thanks

  2. #2

    Test with another USB flash stick with DSS on it, and also check the critical error logs and the dmesg2 logs.
    All the best,

    Todd Maxwell


  3. #3

    DSS is currently running from an Intel 40 GB SSD.

    In the logs, the only errors I found are:

    Code:
    dev_vdisk: Registering virtual vdisk_blockio device S5iAcDpDBwN6nolN (WRITE_THROUGH, BLOCKIO)
    dev_vdisk: ***ERROR***: blkdev_issue_flush() failed: -95
    dev_vdisk: ***WARNING***: Device /dev/vg+backup1/lv+b+lvbackup101 doesn't support barriers, switching to NV_CACHE mode. Read README for more details.
    dev_vdisk: Attached SCSI target virtual disk S5iAcDpDBwN6nolN (file="/dev/vg+backup1/lv+b+lvbackup101", fs=768000MB, bs=512, nblocks=1572864000, cyln=768000)
    scst: Attached to virtual device S5iAcDpDBwN6nolN (id 1)
    and

    Code:
    iscsi-scst: ***WARNING***: CONFIG_TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION not enabled in your kernel. ISCSI-SCST will be working with not the best performance. Refer README file for details.
    scst: Target template iscsi registered successfully

    Nothing else I checked pointed to an error.
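The `-95` returned by `blkdev_issue_flush()` is a negated Linux errno value. On Linux, 95 is EOPNOTSUPP ("Operation not supported"; glibc defines ENOTSUP as the same value, so either name may be printed), which fits the next warning that the device does not support barriers and is being switched to NV_CACHE mode. A minimal sketch, assuming the log lines are available as plain text (the sample below is copied from the post above); `scan` and `decode_kernel_errno` are hypothetical helpers, and the errno name and message come from the machine running the script, so run it on Linux to decode Linux codes.

```python
import errno
import os
import re

# Sample lines copied from the dev_vdisk output quoted above.
LOG = """\
dev_vdisk: ***ERROR***: blkdev_issue_flush() failed: -95
dev_vdisk: ***WARNING***: Device /dev/vg+backup1/lv+b+lvbackup101 doesn't support barriers, switching to NV_CACHE mode. Read README for more details.
"""

# "func() failed: -NN": kernel functions report errors as negated errno values.
FAILED_RE = re.compile(r"(\S+\(\)) failed: (-\d+)$")
# Device path named in a "doesn't support barriers" warning.
BARRIER_RE = re.compile(r"Device (\S+) doesn't support barriers")


def decode_kernel_errno(code: int) -> str:
    """Translate a negative kernel return code into 'NAME (message)' using the local errno table."""
    num = -code
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name} ({os.strerror(num)})"


def scan(log: str):
    """Return (failed calls with decoded errno, devices that fell back to NV_CACHE)."""
    failures, no_barrier = [], []
    for line in log.splitlines():
        m = FAILED_RE.search(line)
        if m:
            failures.append((m.group(1), decode_kernel_errno(int(m.group(2)))))
        m = BARRIER_RE.search(line)
        if m:
            no_barrier.append(m.group(1))
    return failures, no_barrier


if __name__ == "__main__":
    failures, no_barrier = scan(LOG)
    for func, reason in failures:
        print(f"{func}: {reason}")
    for dev in no_barrier:
        print(f"NV_CACHE fallback (no barriers): {dev}")
```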

  4. #4

    Look into the critical errors log and the dmesg logs, as mentioned.
    If you can provide them to me, I'll take a look.

    Also look in the iSCSI folder and send me the target settings.
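For a first pass over the critical errors and dmesg logs once the log package has been extracted, something like the sketch below can help. It assumes the logs are plain-text files under a single directory; the search patterns are assumptions taken from the messages quoted in this thread, not an Open-E list, and `scan_logs` is a hypothetical helper.

```python
import re
import sys
from pathlib import Path

# Assumed patterns, based on messages quoted in this thread.
PATTERNS = re.compile(r"\*\*\*ERROR\*\*\*|\*\*\*WARNING\*\*\*|aborted|I/O error", re.IGNORECASE)


def scan_logs(root: Path):
    """Yield (path, line number, line) for every matching line in files under root."""
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if PATTERNS.search(line):
                    yield path, lineno, line.rstrip()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: scan_logs.py EXTRACTED_LOG_DIR")
    else:
        for path, lineno, line in scan_logs(Path(sys.argv[1])):
            print(f"{path}:{lineno}: {line}")
```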

  5. #5

    Hello,

    I have uploaded the logs here: logs. I couldn't find any errors in them; maybe you can take a look.

    Thanks

  6. #6

    Hey,

    Open a ticket via http://www.open-e.com/service-and-support/ and attach the logs downloaded via the WebGUI.

    Ja-B

  7. #7

    Looking at what you have provided, the following needs to be changed.
    You have:
    MaxBurstLength=1048576
    FirstBurstLength=262144

    Change to:
    MaxBurstLength=16776192
    FirstBurstLength=65536

    Also you need to make sure to change each target, not just the first one.
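Changing every target rather than only the first is easy to miss when editing by hand. A minimal sketch, assuming a hypothetical IET-style text config in which each `Target` line opens a section and indented `Key Value` lines belong to it; DSS exposes these settings through the web GUI, so the format, the example target names, and `apply_to_all_targets` are illustrative only. It rewrites keys that are already present and does not add missing ones.

```python
import re

# Hypothetical config with two targets that both carry the old values.
CONFIG = """\
Target iqn.2011-04.example:target0
    MaxBurstLength 1048576
    FirstBurstLength 262144
Target iqn.2011-04.example:target1
    MaxBurstLength 1048576
    FirstBurstLength 262144
"""

NEW_VALUES = {"MaxBurstLength": "16776192", "FirstBurstLength": "65536"}
# Indented "Key Value" line; Target lines are not indented, so they never match.
SETTING_RE = re.compile(r"^(\s+)(\w+)\s+(\S+)\s*$")


def apply_to_all_targets(config: str, values: dict) -> str:
    """Rewrite matching indented 'Key Value' lines in every Target section."""
    out = []
    for line in config.splitlines():
        m = SETTING_RE.match(line)
        if m and m.group(2) in values:
            # Keep the original indentation, replace only the value.
            line = f"{m.group(1)}{m.group(2)} {values[m.group(2)]}"
        out.append(line)
    return "\n".join(out) + "\n"


print(apply_to_all_targets(CONFIG, NEW_VALUES), end="")
```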

    Overall, these settings work well:
    MaxRecvDataSegmentLength=262144
    MaxBurstLength=16776192
    MaxXmitDataSegmentLength=262144
    FirstBurstLength=65536
    DataDigest=None
    MaxOutstandingR2T=8
    InitialR2T=No
    ImmediateData=Yes
    HeaderDigest=None
    Wthreads=8
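Most of these keys are standard iSCSI login keys from RFC 3720, which limits the data segment and burst lengths to 512 through 2^24-1 bytes, limits MaxOutstandingR2T to 1 through 65535, and requires FirstBurstLength not to exceed MaxBurstLength (16776192 is 2^24 minus 1024, so it fits). The sketch below checks a set of values against those rules; it covers only the RFC 3720 keys, so target-specific entries such as Wthreads are skipped, and `validate` is a hypothetical helper.

```python
# RFC 3720 limits for the standard login keys in the list above.
MAX_24BIT = 2**24 - 1  # 16777215

RANGES = {
    "MaxRecvDataSegmentLength": (512, MAX_24BIT),
    "MaxBurstLength": (512, MAX_24BIT),
    "FirstBurstLength": (512, MAX_24BIT),
    "MaxOutstandingR2T": (1, 65535),
}
CHOICES = {
    "InitialR2T": ("Yes", "No"),
    "ImmediateData": ("Yes", "No"),
    "HeaderDigest": ("None", "CRC32C"),
    "DataDigest": ("None", "CRC32C"),
}


def validate(settings: dict) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    for key, (low, high) in RANGES.items():
        value = settings.get(key)
        if value is not None and not low <= value <= high:
            problems.append(f"{key}={value} is outside {low}..{high}")
    for key, allowed in CHOICES.items():
        value = settings.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value} is not one of {allowed}")
    first = settings.get("FirstBurstLength")
    burst = settings.get("MaxBurstLength")
    if first is not None and burst is not None and first > burst:
        problems.append("FirstBurstLength must not exceed MaxBurstLength")
    return problems


recommended = {
    "MaxRecvDataSegmentLength": 262144,
    "MaxBurstLength": 16776192,
    "FirstBurstLength": 65536,
    "MaxOutstandingR2T": 8,
    "InitialR2T": "No",
    "ImmediateData": "Yes",
    "HeaderDigest": "None",
    "DataDigest": "None",
}
print(validate(recommended) or "no problems found")
```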

    I can't see the NIC settings in your upload, but you can also try jumbo frames on the NICs: http://kb.open-e.com/Does-Open-E-sup...Frames_28.html
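The gain from jumbo frames is mostly that the same data travels in fewer frames, so there is less per-frame overhead. A rough arithmetic sketch, assuming plain IPv4 and TCP headers of 20 bytes each with no options and ignoring the 48-byte iSCSI header; real traffic usually carries a little more overhead, and `frames_needed` is a hypothetical helper.

```python
import math

# Assumption: IPv4 (20 bytes) + TCP (20 bytes) headers, no TCP options.
IP_TCP_HEADERS = 40


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Ethernet frames needed to carry payload_bytes of TCP data at a given MTU."""
    per_frame = mtu - IP_TCP_HEADERS
    return math.ceil(payload_bytes / per_frame)


# One 262144-byte data segment, the size suggested for MaxRecvDataSegmentLength.
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_needed(262144, mtu)} frames")
```

With these assumptions a 262144-byte segment needs 180 frames at MTU 1500 and 30 frames at MTU 9000. Every device on the path (NICs, switches, initiators) has to be configured for the larger MTU.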

    Make sure the initiators have matching settings; a mismatch can make the machine appear locked up or stalled.
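One reason initiator settings matter: RFC 3720 negotiates several of these keys between both sides, so raising a value on the target alone may not change what the session actually uses. For MaxBurstLength, FirstBurstLength and MaxOutstandingR2T the smaller offer wins, InitialR2T uses OR (Yes if either side says Yes), and ImmediateData uses AND (Yes only if both say Yes). A minimal sketch of those rules; the initiator values are a hypothetical initiator left at the RFC 3720 defaults, and `negotiate` is a hypothetical helper.

```python
# RFC 3720 result functions for a few of the keys above.
MINIMUM_KEYS = ("MaxBurstLength", "FirstBurstLength", "MaxOutstandingR2T")


def negotiate(target: dict, initiator: dict) -> dict:
    """Return the values a session would end up with for these keys."""
    result = {}
    for key in MINIMUM_KEYS:
        # Numeric keys: the smaller of the two offers wins.
        result[key] = min(target[key], initiator[key])
    # InitialR2T uses OR: Yes if either side asks for Yes.
    either_yes = "Yes" in (target["InitialR2T"], initiator["InitialR2T"])
    result["InitialR2T"] = "Yes" if either_yes else "No"
    # ImmediateData uses AND: Yes only if both sides offer Yes.
    both_yes = target["ImmediateData"] == "Yes" and initiator["ImmediateData"] == "Yes"
    result["ImmediateData"] = "Yes" if both_yes else "No"
    return result


target = {"MaxBurstLength": 16776192, "FirstBurstLength": 65536,
          "MaxOutstandingR2T": 8, "InitialR2T": "No", "ImmediateData": "Yes"}
# Hypothetical initiator still at the RFC 3720 default values.
initiator = {"MaxBurstLength": 262144, "FirstBurstLength": 65536,
             "MaxOutstandingR2T": 1, "InitialR2T": "Yes", "ImmediateData": "Yes"}

for key, value in negotiate(target, initiator).items():
    print(f"{key}={value}")
```

Here the session would still run with MaxBurstLength=262144, MaxOutstandingR2T=1 and InitialR2T=Yes, despite the target changes.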

    But the complete logs would give a better picture of whether there are other issues.
    I can only see a few files in your link, not the whole package.

  8. #8

    Update: I have disabled some CPU power-saving features in the BIOS, and it is running stable now.

  9. #9

    Unfortunately, the box crashed again last night, but this time there are some errors in the log:

    Code:
    2011-04-12 01:59:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4700), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf8,0x 0,0x... (0/1)
    2011-04-12 01:58:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4340), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf7,0x 0,0x... (1/1)
    2011-04-12 01:58:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d45c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf6,0x 0,0x... (1/1)
    2011-04-12 01:57:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d40c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf5,0x 0,0x... (0/1)

    This points to a hardware error. The RAID controller has already been replaced, so it seems the backplane is not working properly.
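Some of this can be read straight from the log: 0x8a is the SCSI WRITE(16) opcode, so all four aborted commands are writes, and in a 16-byte CDB bytes 2 to 9 hold the logical block address. The aborts are also exactly 30 seconds apart, which equals the Linux SCSI disk driver's default 30-second command timeout; that is a hint about where to look, not a diagnosis. A minimal sketch that decodes the quoted lines and measures the gaps; the opcode table is deliberately tiny, and `parse` and `parse_cdb` are hypothetical helpers.

```python
import re
from datetime import datetime

# Lines copied from the log above; "..." is where the log truncated the CDB.
LOG = """\
2011-04-12 01:59:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4700), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf8,0x 0,0x... (0/1)
2011-04-12 01:58:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4340), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf7,0x 0,0x... (1/1)
2011-04-12 01:58:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d45c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf6,0x 0,0x... (1/1)
2011-04-12 01:57:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d40c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf5,0x 0,0x... (0/1)
"""

# A few SCSI block-command opcodes; 0x8a is WRITE(16).
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}
LINE_RE = re.compile(r"^\s*(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\s+scsi cmnd aborted.*?cmnd\[(.*?)\.\.\.")


def parse_cdb(text: str) -> list:
    """Turn '0x8a,0x 0,...' into ints; bytes are space-padded and the last token is truncated."""
    values = []
    for token in text.split(","):
        token = token.replace(" ", "")
        if token and token != "0x":
            values.append(int(token, 16))
    return values


def parse(log: str) -> list:
    """Return (timestamp, command name, LBA or None) tuples sorted by time."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        cdb = parse_cdb(m.group(2))
        name = OPCODES.get(cdb[0], hex(cdb[0]))
        lba = None
        if cdb[0] in (0x88, 0x8A) and len(cdb) >= 10:
            # 16-byte CDBs carry the LBA big-endian in bytes 2..9.
            lba = int.from_bytes(bytes(cdb[2:10]), "big")
        events.append((when, name, lba))
    return sorted(events, key=lambda e: e[0])


events = parse(LOG)
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
for when, name, lba in events:
    print(when, name, lba)
print("seconds between aborts:", gaps)
```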

  10. #10

    Did you ever adjust the values I mentioned above?