Strange. Do you have any spare memory for that machine?
I don't, but I could take out 1 module (4GB remaining), although I wonder: a full pass on memtest did not reveal any errors.
There are only 3 things remaining that I can think of:
- Now that I recall: I disabled the APM and ACPI boot options, but two (?) others were still enabled. Would it be worth a try to turn them off?
- Could it be a problem with the USB DOM that hosts the OS? I could try running on another USB device.
- Would it be worth a try to run a previous version that ran stable for a long time, i.e. v6 update 90? Are there any known changes that could cause this random freeze?
P.S. I just noticed, because my website was unreachable, that the machine froze again, this time after only a few hours. It's slowly getting hopeless...
Last edited by Arcesilaus; 10-26-2012 at 12:37 PM.
If we assume that the mainboard and the CPU are not faulty (such faults are very rare), that all drivers are tested and that your system is on the HCL of Open-E, there is only one thing left: the memory.
Please note that memtest needs to run for at least 4 hours to show whether everything is okay.
It would be best to test for much longer!
The motherboard and CPU have been replaced recently, and besides that, the system only contains a Transcend USB DOM, an Areca 1220 controller and two Intel NICs. As far as I know, these are all on the HCL and they've run without any issues for over 2 years.
I will run a memtest during the night and see what that gives.
Well, the memtest ran for 7 hours, without any errors.
Nonetheless, I am getting more and more convinced that it is a hardware error rather than a software error.
Since the freeze happens mostly under a certain (known) load, I suspect the NICs.
I've changed the network configuration as a test, and will replace the NICs as soon as possible.
For the sake of shortening other users' problem cycles in the future, I'll keep this thread posted...
After a little short of a week, I'm back, unfortunately.
Last week, I changed the NIC configuration and brought the machine back up again.
It ran stable for a week, and I've seriously stressed the machine without issues.
Tonight, however, just when really nothing special was going on (machine was almost idle), I saw it happening again:
The SMB connections were cut off, followed by the iSCSI connections a few minutes later.
At first, a ping was still possible, but soon the machine froze completely, including the console.
So, again, I rebooted, downloaded the logs and went to see what happened. Here's what I found:
After finding the connection was lost, I tried to reconnect a share:
[2012/11/02 21:47:01.820369, 1] smbd/service.c:1070(make_connection_snum)
192.168.47.11 (192.168.47.11) connect to service Music initially as user DOMAIN+USERNAME1 (uid=102, gid=107) (pid 32727)
[2012/11/02 21:47:13.160369, 1] smbd/service.c:1251(close_cnum)
192.168.47.11 (192.168.47.11) closed connection to service Music
[2012/11/02 21:57:01.840369, 0] lib/fault.c:46(fault_report)
===============================================================
[2012/11/02 20:57:01.850369, 0] lib/fault.c:47(fault_report)
INTERNAL ERROR: Signal 7 in pid 32727 (3.5.4)
Please read the Trouble-Shooting section of the Samba3-HOWTO
[2012/11/02 20:57:01.850369, 0] lib/fault.c:49(fault_report)
From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2012/11/02 20:57:01.850369, 0] lib/fault.c:50(fault_report)
===============================================================
[2012/11/02 20:57:01.850369, 0] lib/util.c:1465(smb_panic)
PANIC (pid 32727): internal error
[2012/11/02 20:57:01.850369, 0] lib/util.c:1569(log_stack_trace)
BACKTRACE: 0 stack frames:
[2012/11/02 20:57:01.850369, 0] lib/fault.c:326(dump_core)
dumping core in /usr/local/samba/var/cores/smbd
Not only does the timestamp in the first part look a little weird (it jumps back from 21:57 to 20:57), I also cannot relate the crash to any process on the client with IP 192.168.47.11, to any machine stress in general, or to any hardware issue.
[2012/11/02 21:40:14.290369, 1] smbd/service.c:1070(make_connection_snum)
192.168.47.117 (192.168.47.117) connect to service Pictures initially as user DOMAIN+USERNAME2 (uid=0, gid=107) (pid 16420)
[2012/11/02 21:40:25.680369, 1] smbd/service.c:1251(close_cnum)
192.168.47.117 (192.168.47.117) closed connection to service Music
[2012/11/02 21:40:25.680369, 1] smbd/service.c:1251(close_cnum)
192.168.47.117 (192.168.47.117) closed connection to service Pictures
[2012/11/02 21:45:20.340369, 0] lib/fault.c:46(fault_report)
===============================================================
[2012/11/02 21:45:20.340369, 0] lib/fault.c:47(fault_report)
INTERNAL ERROR: Signal 7 in pid 16420 (3.5.4)
Please read the Trouble-Shooting section of the Samba3-HOWTO
[2012/11/02 21:45:20.340369, 0] lib/fault.c:49(fault_report)
From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2012/11/02 21:45:20.340369, 0] lib/fault.c:50(fault_report)
===============================================================
[2012/11/02 21:45:20.340369, 0] lib/util.c:1465(smb_panic)
PANIC (pid 16420): internal error
[2012/11/02 21:45:20.340369, 0] lib/util.c:1569(log_stack_trace)
BACKTRACE: 0 stack frames:
[2012/11/02 21:45:20.350369, 0] lib/fault.c:326(dump_core)
dumping core in /usr/local/samba/var/cores/smbd
[2012/11/02 21:45:20.730369, 1] smbd/service.c:1070(make_connection_snum)
192.168.47.117 (192.168.47.117) connect to service Movies initially as user DOMAIN+USERNAME2 (uid=0, gid=107) (pid 9889)
[2012/11/02 21:45:45.880369, 0] lib/fault.c:46(fault_report)
===============================================================
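For anyone who wants to check their own smbd log for the same pattern, here is a rough Python sketch. It only assumes the log header format visible in the excerpts above; the function name scan_log and the sample lines are my own illustration, not part of Samba or Open-E. It flags timestamps that jump backwards and collects the PIDs that panicked.

```python
import re
from datetime import datetime

# Header format as seen in the excerpts above, e.g.
# [2012/11/02 21:57:01.840369, 0] lib/fault.c:46(fault_report)
HEADER = re.compile(r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+),\s*\d+\]")
PANIC = re.compile(r"PANIC \(pid (\d+)\)")


def scan_log(lines):
    """Return (backward_jumps, panic_pids) for an iterable of log lines.

    backward_jumps holds (line_number, previous_stamp, current_stamp) for
    every header whose timestamp is earlier than the previous header's.
    """
    jumps, pids = [], []
    previous = None
    for number, line in enumerate(lines, start=1):
        header = HEADER.search(line)
        if header:
            stamp = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S.%f")
            if previous is not None and stamp < previous:
                jumps.append((number, previous, stamp))
            previous = stamp
        panic = PANIC.search(line)
        if panic:
            pids.append(int(panic.group(1)))
    return jumps, pids


# Hypothetical usage: a few lines copied from the first excerpt.
sample = [
    "[2012/11/02 21:57:01.840369, 0] lib/fault.c:46(fault_report)",
    "[2012/11/02 20:57:01.850369, 0] lib/fault.c:47(fault_report)",
    "INTERNAL ERROR: Signal 7 in pid 32727 (3.5.4)",
    "[2012/11/02 20:57:01.850369, 0] lib/util.c:1465(smb_panic)",
    "PANIC (pid 32727): internal error",
]
jumps, pids = scan_log(sample)
for number, before, after in jumps:
    print(f"line {number}: clock went back from {before} to {after}")
print("panicked pids:", pids)
```

To run it against a real log, pass the open file object to scan_log instead of the sample list.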
Please help!
Last edited by Arcesilaus; 11-02-2012 at 10:47 PM.
There is one last thing I can think of: VMware's iSCSI initiator causes the Open-E network stack to freeze the system.
Possible cause: http://vmtoday.com/2012/02/vsphere-5...oftware-iscsi/.
I found this problem to be the case in my vSphere machine. With hindsight, the problem occurred first with ESXi 5.0...