Kernel error kernel:[360783.353677]

**danielweeber** · 10-26-2012, 12:23 PM

Strange. Do you have any spare memory for that machine?

**Arcesilaus** · 10-26-2012, 12:31 PM

I don't, but I could take out one module (leaving 4 GB). I wonder whether that would help, though: a full memtest pass did not reveal any errors.

There are only three things left that I can think of:

1. Now that I recall: I disabled the APM and ACPI boot options, but two (?) others were still enabled. Would it be worth trying to turn them off as well?
2. Could it be a problem with the USB DOM that hosts the OS? I could try running from another USB device.
3. Would it be worth running a previous version that was stable for a long time, i.e. v6 update 90? Are there any known changes that could cause this random freeze?

P.S. I just noticed, because my website was unreachable, that the machine froze again, this time after only a few hours - it's slowly getting hopeless...

**danielweeber** · 10-26-2012, 12:37 PM

If we assume that the mainboard and the CPU are not faulty (faults there are very rare), that all drivers are tested, and that your system is on the Open-E HCL, there is only one thing left: the memory.
Please note that memtest needs to run for at least 4 hours before you can tell whether everything is okay.
Testing for much longer would be even better!

**Arcesilaus** · 10-26-2012, 12:50 PM

The motherboard and CPU have been replaced recently, and besides that, the system only contains a Transcend USB DOM, an Areca 1220 controller and two Intel NICs. As far as I know, these are all on the HCL, and they've run without any issues for over 2 years.

I will run a memtest during the night and see what that gives.

**Arcesilaus** · 10-29-2012, 11:31 AM

Well, the memtest ran for 7 hours without any errors.
Nonetheless, I am getting more and more convinced that this is a hardware error rather than a software error.
Since the freeze happens mostly under a certain (known) load, I suspect the NICs.
I've changed the network configuration as a test, and will replace the NICs as soon as possible.

To help shorten other users' troubleshooting in the future, I'll keep this thread updated...

**Arcesilaus** · 11-02-2012, 10:17 PM

After a little short of a week, I'm back, unfortunately.
Last week, I changed the NIC configuration and brought the machine back up again.

It ran stably for a week, and I seriously stressed the machine without issues.
Tonight, however, when nothing special was going on (the machine was almost idle), I saw it happen again:
The SMB connections were cut off, followed by the iSCSI connections a few minutes later.
At first, a ping was still possible, but soon the machine froze completely, including the console.
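The order in which the services drop (SMB first, then iSCSI, then everything) can be recorded from another machine with a small probe. The following is only a minimal sketch: the port numbers are the standard ones for SMB (445) and iSCSI (3260), but whether this server uses them, the polling interval, and all function names are assumptions for illustration.

```python
import socket
import time
from datetime import datetime

# Assumed ports: 445 is the standard SMB port, 3260 the standard iSCSI target port.
TARGETS = {"smb": 445, "iscsi": 3260}

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False

def probe(host, targets=TARGETS, timeout=2.0):
    """Return a dict mapping each service name to its current reachability."""
    return {name: port_open(host, port, timeout) for name, port in targets.items()}

def watch(host, interval=10.0, rounds=None):
    """Print a timestamped line whenever the reachability of a service changes."""
    last = {}
    count = 0
    while rounds is None or count < rounds:
        state = probe(host)
        if state != last:
            print(datetime.now().isoformat(timespec="seconds"), state)
            last = state
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

Calling `watch(host)` with the storage server's address from a client that stays up gives timestamps that do not depend on the clock or the logs of the machine that freezes.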

So, again, I rebooted, downloaded the logs and went to see what happened. Here's what I found:

[2012/11/02 21:47:01.820369, 1] smbd/service.c:1070(make_connection_snum)
192.168.47.11 (192.168.47.11) connect to service Music initially as user DOMAIN+USERNAME1 (uid=102, gid=107) (pid 32727)
[2012/11/02 21:47:13.160369, 1] smbd/service.c:1251(close_cnum)
192.168.47.11 (192.168.47.11) closed connection to service Music
[2012/11/02 21:57:01.840369, 0] lib/fault.c:46(fault_report)
===============================================================
[2012/11/02 20:57:01.850369, 0] lib/fault.c:47(fault_report)
INTERNAL ERROR: Signal 7 in pid 32727 (3.5.4)
Please read the Trouble-Shooting section of the Samba3-HOWTO
[2012/11/02 20:57:01.850369, 0] lib/fault.c:49(fault_report)

From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2012/11/02 20:57:01.850369, 0] lib/fault.c:50(fault_report)
===============================================================
[2012/11/02 20:57:01.850369, 0] lib/util.c:1465(smb_panic)
PANIC (pid 32727): internal error
[2012/11/02 20:57:01.850369, 0] lib/util.c:1569(log_stack_trace)
BACKTRACE: 0 stack frames:
[2012/11/02 20:57:01.850369, 0] lib/fault.c:326(dump_core)
dumping core in /usr/local/samba/var/cores/smbd

After finding the connection was lost, I tried to reconnect a share:

[2012/11/02 21:40:14.290369, 1] smbd/service.c:1070(make_connection_snum)
192.168.47.117 (192.168.47.117) connect to service Pictures initially as user DOMAIN+USERNAME2 (uid=0, gid=107) (pid 16420)
[2012/11/02 21:40:25.680369, 1] smbd/service.c:1251(close_cnum)
192.168.47.117 (192.168.47.117) closed connection to service Music
[2012/11/02 21:40:25.680369, 1] smbd/service.c:1251(close_cnum)
192.168.47.117 (192.168.47.117) closed connection to service Pictures
[2012/11/02 21:45:20.340369, 0] lib/fault.c:46(fault_report)
===============================================================
[2012/11/02 21:45:20.340369, 0] lib/fault.c:47(fault_report)
INTERNAL ERROR: Signal 7 in pid 16420 (3.5.4)
Please read the Trouble-Shooting section of the Samba3-HOWTO
[2012/11/02 21:45:20.340369, 0] lib/fault.c:49(fault_report)

From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2012/11/02 21:45:20.340369, 0] lib/fault.c:50(fault_report)
===============================================================
[2012/11/02 21:45:20.340369, 0] lib/util.c:1465(smb_panic)
PANIC (pid 16420): internal error
[2012/11/02 21:45:20.340369, 0] lib/util.c:1569(log_stack_trace)
BACKTRACE: 0 stack frames:
[2012/11/02 21:45:20.350369, 0] lib/fault.c:326(dump_core)
dumping core in /usr/local/samba/var/cores/smbd
[2012/11/02 21:45:20.730369, 1] smbd/service.c:1070(make_connection_snum)
192.168.47.117 (192.168.47.117) connect to service Movies initially as user DOMAIN+USERNAME2 (uid=0, gid=107) (pid 9889)
[2012/11/02 21:45:45.880369, 0] lib/fault.c:46(fault_report)
===============================================================

Not only do the timestamps in the first part look a little odd, I also cannot relate the crash to any process on the client with IP 192.168.47.11, to any machine stress in general, or to any hardware issue.
Please help!
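The odd timestamps can be located mechanically instead of by eye. The sketch below assumes the smbd header format shown in the excerpts above (`[date time, level] source`); the function names are made up for illustration. It flags every header whose timestamp is earlier than the one before it.

```python
import re
from datetime import datetime

# Matches smbd debug headers such as:
# [2012/11/02 21:57:01.840369, 0] lib/fault.c:46(fault_report)
HEADER = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}),\s*(\d+)\]\s+(\S+)"
)

def parse_headers(lines):
    """Yield (line_number, timestamp, source) for every header line."""
    for number, line in enumerate(lines, start=1):
        match = HEADER.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
            yield number, stamp, match.group(3)

def backward_jumps(lines):
    """Return headers whose timestamp is earlier than the previous header's."""
    jumps = []
    previous = None
    for number, stamp, source in parse_headers(lines):
        if previous is not None and stamp < previous:
            jumps.append((number, previous, stamp, source))
        previous = stamp
    return jumps

# Three header lines taken from the first excerpt above.
sample = [
    "[2012/11/02 21:47:13.160369, 1] smbd/service.c:1251(close_cnum)",
    "[2012/11/02 21:57:01.840369, 0] lib/fault.c:46(fault_report)",
    "[2012/11/02 20:57:01.850369, 0] lib/fault.c:47(fault_report)",
]
for number, before, after, source in backward_jumps(sample):
    print(f"line {number}: {before.time()} -> {after.time()} at {source}")
```

On the sample, the only backward step is between the `fault.c:46` and `fault.c:47` headers, where the clock appears to go back by roughly an hour. The script only reports where this happens; it says nothing about why.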

**Arcesilaus** · 11-20-2012, 10:00 PM

There is one last thing I can think of: VMware's iSCSI initiator causing the Open-E network stack to freeze the system.
Possible cause: http://vmtoday.com/2012/02/vsphere-5...oftware-iscsi/.

I found that this problem applies to my vSphere machine. In hindsight, the problem first occurred with ESXi 5.0...
