Server hangs every few days
welnet
03-28-2011, 03:13 PM
Hello,
I have a problem with an Open-E DSS V6 Lite setup (latest version).
Every few days the machine hangs. It still responds to pings, but the web admin is unreachable and the console is not accessible either.
Hardware setup:
Supermicro 16 disk chassis.
Supermicro X8DTN+-F with Intel Xeon 5620
Areca 1880 controller
6x 2TB WD Raid Edition disks in RAID 50 configuration
I've checked the logs, but apart from some warnings I cannot find the cause of the hang. Which log files should I check?
Thanks
Test with another USB flash stick with DSS on it. Also check the critical error logs and the dmesg2 logs.
welnet
03-28-2011, 03:30 PM
DSS is currently running from an Intel 40GB SSD.
In the logs, the only errors I found are:
dev_vdisk: Registering virtual vdisk_blockio device S5iAcDpDBwN6nolN (WRITE_THROUGH, BLOCKIO)
dev_vdisk: ***ERROR***: blkdev_issue_flush() failed: -95
dev_vdisk: ***WARNING***: Device /dev/vg+backup1/lv+b+lvbackup101 doesn't support barriers, switching to NV_CACHE mode. Read README for more details.
dev_vdisk: Attached SCSI target virtual disk S5iAcDpDBwN6nolN (file="/dev/vg+backup1/lv+b+lvbackup101", fs=768000MB, bs=512, nblocks=1572864000, cyln=768000)
scst: Attached to virtual device S5iAcDpDBwN6nolN (id 1)
and
iscsi-scst: ***WARNING***: CONFIG_TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION not enabled in your kernel. ISCSI-SCST will be working with not the best performance. Refer README file for details.
scst: Target template iscsi registered successfully
Nothing else I checked pointed to an error.
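As a sanity check on the dev_vdisk "Attached SCSI target virtual disk" line above, the reported block count can be recomputed from the reported size and block size. This is only a minimal sketch: it assumes the "MB" in that line means 1024 * 1024 bytes, and the function name is made up for illustration. The -95 in the blkdev_issue_flush() line is, on Linux, the errno value for "operation not supported", which fits the barrier warning that follows it.

```python
# Values copied from the dev_vdisk "Attached SCSI target virtual disk" line.
FS_MB = 768000        # fs=768000MB
BLOCK_SIZE = 512      # bs=512
NBLOCKS = 1572864000  # nblocks=1572864000


def blocks_for(size_mb: int, block_size: int) -> int:
    """Number of blocks in size_mb, assuming 1 MB = 1024 * 1024 bytes."""
    return size_mb * 1024 * 1024 // block_size


if __name__ == "__main__":
    computed = blocks_for(FS_MB, BLOCK_SIZE)
    # Print the computed count and whether it matches the logged nblocks value.
    print(computed, computed == NBLOCKS)
```

If the two numbers agree, the virtual disk geometry in the log is at least internally consistent, so these lines look like informational warnings rather than a sign of a damaged volume.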
Look into the critical errors log and the dmesg logs, as mentioned.
If you can provide them to me, I'll take a look.
Also look at the iSCSI folder and show me the target settings.
welnet
03-31-2011, 10:43 AM
Hello,
I have uploaded the logs here: logs (http://www.welnet.nl/backup/logs.rar). I couldn't find any errors in the logs, maybe you can take a look.
Thanks
Hey,
Open a ticket via http://www.open-e.com/service-and-support/ . Please attach logs downloaded via WebGUI.
Ja-B
Looking at what you have provided, the following needs to be changed:
you have:
MaxBurstLength=1048576
FirstBurstLength=262144
change to:
MaxBurstLength=16776192
FirstBurstLength=65536
Also you need to make sure to change each target, not just the first one.
Overall, these settings work well:
maxRecvDataSegmentLen=262144
MaxBurstLength=16776192
Maxxmitdatasegment=262144
FirstBurstLength=65536
DataDigest=None
maxoutstandingr2t=8
InitialR2T=No
ImmediateData=Yes
headerDigest=None
Wthreads=8
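Since every target has to be changed, not just the first one, a small comparison script can help spot targets that still carry old values. The following is only a sketch: the setting names are copied as written in the list above, the target names `target0` and `target1` are hypothetical, and the current values would have to be typed in by hand from each target's settings page (only the two burst lengths were quoted in this thread).

```python
# Recommended values, copied from the list above (names as written in the post).
RECOMMENDED = {
    "maxRecvDataSegmentLen": "262144",
    "MaxBurstLength": "16776192",
    "Maxxmitdatasegment": "262144",
    "FirstBurstLength": "65536",
    "DataDigest": "None",
    "maxoutstandingr2t": "8",
    "InitialR2T": "No",
    "ImmediateData": "Yes",
    "headerDigest": "None",
    "Wthreads": "8",
}

# Current values per target. The target names are hypothetical; the values for
# target0 are the two that were quoted in the thread.
TARGETS = {
    "target0": {"MaxBurstLength": "1048576", "FirstBurstLength": "262144"},
    "target1": {"MaxBurstLength": "16776192", "FirstBurstLength": "65536"},
}


def settings_to_change(current, recommended):
    """Return {name: (current, recommended)} for each listed setting that differs."""
    return {
        name: (current[name], wanted)
        for name, wanted in recommended.items()
        if name in current and current[name] != wanted
    }


def check_targets(targets, recommended):
    """Apply settings_to_change to every target, keyed by target name."""
    return {
        name: settings_to_change(values, recommended)
        for name, values in targets.items()
    }


if __name__ == "__main__":
    for target, diffs in check_targets(TARGETS, RECOMMENDED).items():
        print(target, diffs if diffs else "matches the recommended values")
```

Settings that are missing from a target's dictionary are skipped rather than reported, so an incomplete entry does not produce false alarms. The same comparison could be run against the initiator side, since the values have to match there as well.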
I can't see the NIC settings in your upload, but you can also try jumbo frames for the NICs: http://kb.open-e.com/Does-Open-E-support-Jumbo-Frames_28.html
Make sure initiators have matching settings, as this can cause the machine to seem locked/stalled.
But complete logs would give a better picture as to whether or not there are other issues.
I can only see a few files in your link, not the whole package.
welnet
04-11-2011, 12:46 PM
Update: I have disabled some CPU power saving features in the BIOS, and it is running stable now.
welnet
04-12-2011, 02:40 PM
Unfortunately the box crashed again last night. But now I have some errors in the log:
2011-04-12 01:59:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4700), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf8,0x 0,0x... (0/1)
2011-04-12 01:58:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4340), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf7,0x 0,0x... (1/1)
2011-04-12 01:58:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d45c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf6,0x 0,0x... (1/1)
2011-04-12 01:57:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d40c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf5,0x 0,0x... (0/1)
This points to a hardware error. The RAID controller has already been replaced, so it seems the backplane is not working properly.
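To see what the aborted commands have in common, the four log lines above can be decoded with a short script. This is a rough sketch under two assumptions: that the bytes inside `cmnd[...]` are the raw SCSI command block, and that the standard 16-byte layout applies (opcode in byte 0, logical block address in bytes 2 to 9). Opcode 0x8a is WRITE(16) in the SCSI block command set. The lines are truncated after the tenth byte, so the transfer length cannot be recovered.

```python
import re
from datetime import datetime

# The four lines from the log above, copied verbatim.
LOG_LINES = [
    "2011-04-12 01:59:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4700), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf8,0x 0,0x... (0/1)",
    "2011-04-12 01:58:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d4340), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf7,0x 0,0x... (1/1)",
    "2011-04-12 01:58:10 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d45c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf6,0x 0,0x... (1/1)",
    "2011-04-12 01:57:40 scsi cmnd aborted, scsi_cmnd(0xffff88012c6d40c0), cmnd[0x8a,0x 0,0x 0,0x 0,0x 0,0x 1,0xb0,0x9e,0xf5,0x 0,0x... (0/1)",
]

# Matches "0x8a" as well as the space-padded form "0x 0"; the truncated "0x..."
# at the end of each line does not match.
HEX_BYTE = re.compile(r"0x\s*([0-9a-fA-F]{1,2})")


def parse_abort(line):
    """Return (timestamp, opcode, lba) for one 'scsi cmnd aborted' line."""
    stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    cdb_text = line.split("cmnd[", 1)[1]
    cdb = [int(b, 16) for b in HEX_BYTE.findall(cdb_text)]
    # Assumed 16-byte CDB layout: bytes 2..9 hold the big-endian LBA.
    lba = int.from_bytes(bytes(cdb[2:10]), "big")
    return stamp, cdb[0], lba


def gaps_in_seconds(events):
    """Seconds between consecutive events, which must be sorted by time."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = sorted(parse_abort(line) for line in LOG_LINES)
    for stamp, opcode, lba in events:
        print(stamp, hex(opcode), hex(lba))
    print("gaps (s):", gaps_in_seconds(events))
```

Decoded this way, all four entries are writes to neighbouring block addresses, aborted at regular intervals. That pattern looks like the same region being retried after a timeout, but on its own it does not say whether the controller, the backplane or a drive is at fault.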
Did you ever adjust the values I mentioned above?
Please try the values that Gr-R mentioned to you. Also, did you check your RAID controller's health?
welnet
04-13-2011, 11:14 AM
The RAID controller sometimes times out accessing drives in the enclosure. It's not the same drive every time, so it must be something with the backplane.
I've changed the iSCSI values mentioned on all initiators and targets this morning. So let's see how long it keeps running this time :)
Good. Also try removing all settings in your initiators and setting them again, even if you have to remove the initiators and reinstall them.
EricD
04-13-2011, 05:00 PM
Try updating to the latest Areca firmware; I believe it's 1.49. It may help with your drive timeouts.
I was going to suggest the same thing... Backplane failure would be odd. Not impossible, but odd.
welnet
04-18-2011, 01:42 PM
I've updated to the latest 1.49 firmware. Unfortunately it shows the same error as the previous controller, which was returned to the dealer.
On boot-up it sometimes fails drives randomly (red flashing light next to drive) when detecting the drives and array. It requires a complete cold shutdown to reset the drive error. There is nothing wrong with the drives as far as I can see. Sometimes it also hangs indefinitely when detecting the controller card with the message 'please wait for the F/W to be ready...'.
This seems to point more and more to an incompatibility between the Areca 1880 and the 836-E1 SAS enclosure with LSI expander.
To be continued...
EricD
04-18-2011, 09:41 PM
What exact model and firmware level are your WD 2TB drives?
A number of people have had various timeout issues with WD 2TB drives on several different RAID controllers.
Most were cured with BIOS updates to the RAID controllers, but some needed a firmware update on the hard drives themselves, and others had to force the drives to SATA 150 mode to solve the issue.
I have had problems on occasion with the Areca BIOS locking up on boot. If you are not booting from drives attached to the Areca, you can disable the Areca BIOS.
Best Wishes :)
welnet
04-19-2011, 10:17 AM
The disks (WDC WD2003FYYS-02W0B0) have the firmware revision 01.01D01 installed.
I have searched for f/w upgrades for the disks but it seems there are none on the Western Digital site.
I could try downgrading the disks to SATA 150 mode. I'll try that later today.
welnet
04-19-2011, 02:38 PM
Is it possible to install another Areca driver in the kernel?
This needs console access, of course, but enabling 'Remote Tech Support' gives an error: 'check your internet connection'.
Is something like this possible?
DSS Lite does not have support for RAID controllers; only the professional version of DSS V6 does.
welnet
04-19-2011, 03:06 PM
I know it is not 'official', but DSS Lite does support hardware RAID controllers. The kernel contains the drivers needed for the hardware; the only limitation is that the RAID management tool is not available.
It is correct that you won't be able to work with the controllers with the DSS Lite version. According to the comparison at the link below, "Single and Multiple Hardware RAID Controller Support" is not available for the DSS Lite version.
http://www.open-e.com/products/open-e-data-storage-software-v6-lite/comparison/