
Thread: Kernel error kernel:[360783.353677]

  1. #1
    Join Date
    Oct 2009
    Posts
    53

    Hi

    Having upgraded earlier this week from V6, I just received the following kernel error:

    2012/09/27 20:45:21 kernel:[360783.353677] Pid: 0, comm: swapper Not tainted 2.6.35.14-oe64-00000-ga36704d #69
    2012/09/27 20:45:21 kernel:[360783.353679] Call Trace:
    2012/09/27 20:45:21 kernel:[360783.353680] <IRQ> [<ffffffff814d3455>] ? dev_watchdog+0x215/0x220
    2012/09/27 20:45:21 kernel:[360783.353684] [<ffffffff814d3455>] ? dev_watchdog+0x215/0x220
    2012/09/27 20:45:21 kernel:[360783.353688] [<ffffffff8104590c>] ? warn_slowpath_common+0x8c/0xc0
    2012/09/27 20:45:21 kernel:[360783.353691] [<ffffffff81045a16>] ? warn_slowpath_fmt+0x56/0x60
    2012/09/27 20:45:21 kernel:[360783.353693] [<ffffffff8103cdf1>] ? enqueue_task+0x61/0xa0
    2012/09/27 20:45:21 kernel:[360783.353695] [<ffffffff8103d873>] ? try_to_wake_up+0xb3/0x2b0
    2012/09/27 20:45:21 kernel:[360783.353699] [<ffffffff8129091e>] ? strlcpy+0x4e/0x80
    2012/09/27 20:45:21 kernel:[360783.353701] [<ffffffff814b9933>] ? netdev_drivername+0x43/0x50
    2012/09/27 20:45:21 kernel:[360783.353703] [<ffffffff814d3455>] ? dev_watchdog+0x215/0x220
    2012/09/27 20:45:21 kernel:[360783.353706] [<ffffffff81040d93>] ? __wake_up+0x43/0x70
    2012/09/27 20:45:21 kernel:[360783.353708] [<ffffffff814d3240>] ? dev_watchdog+0x0/0x220
    2012/09/27 20:45:21 kernel:[360783.353710] [<ffffffff8105122f>] ? run_timer_softirq+0x19f/0x200
    2012/09/27 20:45:21 kernel:[360783.353712] [<ffffffff810649c2>] ? ktime_get+0x52/0xd0
    2012/09/27 20:45:21 kernel:[360783.353715] [<ffffffff8104b5fd>] ? __do_softirq+0xad/0x150
    2012/09/27 20:45:21 kernel:[360783.353718] [<ffffffff819e2140>] ? early_idt_handler+0x0/0x71
    2012/09/27 20:45:21 kernel:[360783.353721] [<ffffffff8100ab5c>] ? call_softirq+0x1c/0x30
    2012/09/27 20:45:21 kernel:[360783.353723] [<ffffffff8100c505>] ? do_softirq+0x65/0xa0
    2012/09/27 20:45:21 kernel:[360783.353725] [<ffffffff8104b54c>] ? irq_exit+0x7c/0x80
    2012/09/27 20:45:21 kernel:[360783.353728] [<ffffffff8102289a>] ? smp_apic_timer_interrupt+0x6a/0xa0
    2012/09/27 20:45:21 kernel:[360783.353730] [<ffffffff8100a613>] ? apic_timer_interrupt+0x13/0x20
    2012/09/27 20:45:21 kernel:[360783.353731] <EOI> [<ffffffffa00a0d6a>] ? acpi_idle_enter_simple+0xf7/0x128 [processor]
    2012/09/27 20:45:21 kernel:[360783.353738] [<ffffffffa00a0d63>] ? acpi_idle_enter_simple+0xf0/0x128 [processor]
    2012/09/27 20:45:21 kernel:[360783.353741] [<ffffffff81474298>] ? cpuidle_idle_call+0x98/0x110
    2012/09/27 20:45:21 kernel:[360783.353744] [<ffffffff81008ba7>] ? cpu_idle+0x57/0x90
    2012/09/27 20:45:21 kernel:[360783.353746] [<ffffffff819e2e25>] ? start_kernel+0x2e5/0x3c0
    2012/09/27 20:45:21 kernel:[360783.353748] [<ffffffff819e23df>] ? x86_64_start_kernel+0xff/0x130

    I don't know to what extent I should be worried, because everything seems to work just fine, but it might be useful for me and others to know how we should respond to such errors.
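
A trace like the one above is easier to triage once the function names are pulled out of it. `dev_watchdog` is the kernel's network transmit watchdog and `warn_slowpath_*` is the path the kernel takes when it prints a warning, so the trace points at a network interface rather than at storage. The following is only a minimal sketch in Python: the helper name `trace_symbols` and the regular expression are illustrative assumptions, not part of any Open-E tooling.

```python
import re
from collections import Counter

# Matches one frame of the trace, for example:
#   [<ffffffff814d3455>] ? dev_watchdog+0x215/0x220
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+\?\s+([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def trace_symbols(log_text):
    """Count how often each function name appears in call-trace frames.

    Hypothetical helper for illustration only.
    """
    return Counter(FRAME_RE.findall(log_text))

# A few lines copied from the trace in this post.
sample = """\
2012/09/27 20:45:21 kernel:[360783.353680] <IRQ> [<ffffffff814d3455>] ? dev_watchdog+0x215/0x220
2012/09/27 20:45:21 kernel:[360783.353688] [<ffffffff8104590c>] ? warn_slowpath_common+0x8c/0xc0
2012/09/27 20:45:21 kernel:[360783.353701] [<ffffffff814b9933>] ? netdev_drivername+0x43/0x50
2012/09/27 20:45:21 kernel:[360783.353703] [<ffffffff814d3455>] ? dev_watchdog+0x215/0x220
"""

symbols = trace_symbols(sample)
for name, count in symbols.most_common():
    print(f"{name}: {count}")
```

Running it over the full log, rather than this short sample, shows which functions dominate the trace.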
    Last edited by Arcesilaus; 11-03-2012 at 02:00 PM.

  2. #2

    This could be an IRQ issue. Try disabling all power management features in the motherboard BIOS. Also, if you have an additional NIC, such as a quad-port or dual-port card, try moving it to a different slot.
    All the best,

    Todd Maxwell



  3. #3
    Join Date
    Oct 2009
    Posts
    53

    OK, I will try that at the next reboot and keep this thread up to date.
    The error seemed to come completely out of the blue, so I'll see if I can find out whether anything unusual happened around that time.

  4. #4
    Join Date
    Oct 2009
    Posts
    53

    Well, the issue might be a different one:

    Last week, I replaced all hardware (motherboard, processor, memory, NICs) except the Areca RAID controller and the DOM, after seeing a few apparently random system freezes.
    All went well for the past week, but half an hour ago the system froze completely yet again.

    The RAID controller does not indicate any errors, so it might be the DOM (Transcend Industrial Flash 2GB SLC DOM with 1 million hours MTBF).
    To be honest, I do not know what role the DOM plays once the server has started, but I assumed that it is minimal and that all vital information is stored on the system volume.
    Would it make sense to think it is unlikely the DOM is the problem here?

    Anyway, this time I would like to know more precisely what causes the problems, so I downloaded the log files, but there seems to be no useful info in critical_errors.log.
    What other logs could give useful hints about what causes the freezes?
    Last edited by Arcesilaus; 10-01-2012 at 12:02 AM.

  5. #5

    Try reformatting the DOM after you save your settings under Maint. > Misc. Select all options so that you can save the file called settings.cnf, then use that file to restore afterwards. You can also look at dmesg2.log, and at the packages logs, to see whether there are any errors around the time of the crash.
    All the best,

    Todd Maxwell



  6. #6
    Join Date
    Aug 2010
    Posts
    404

    Also try disabling all power management options in the BIOS.
    Then try disabling APM and ACPI from the CTRL+ALT+T menu (in console mode): CTRL+ALT+T -> Boot options (9) -> Boot parameters (1).

  7. #7
    Join Date
    Oct 2009
    Posts
    53

    Ok, I turned off all power options in the BIOS.
    Indeed it seemed to be the DOM: today the system froze again, and now it won't even boot beyond the "loading" screen.
    So I unplugged the DOM and took an external USB drive and installed the software on that without any issue.

    Unfortunately, after finishing the install and rebooting, it hung on "loading" again, so I tried booting the old DOM with v6, but to no avail...
    I had the RAID array checked during the night: zero errors.
    But still, no DSS version would load successfully.

    So I took out all hardware and rebooted, and voilà: it ran again.
    I plugged all hardware back in, changed the slots of the NICs to avoid any IRQ errors, and booted successfully again.
    Apparently moving the NIC was helpful.

    After that, I cleared the TDB database, reconnected to the DC, and got it running.
    However, the machine did hang once during booting (just after "init runlevel 2"), so I'll order a new DOM just to be sure...
    Last edited by Arcesilaus; 10-02-2012 at 11:43 AM. Reason: update

  8. #8
    Join Date
    Oct 2009
    Posts
    53

    Unfortunately, it is time for an update.

    Having successfully updated to v7, I am still experiencing system freezes that drive me nuts.

    I looked at the dmesg.2 log files from the downloads I took after each reboot, and in 2 out of 3 cases no error of any kind was logged.
    In the most recent one, there is a reference to the onboard SATA controller:

    [ 663.417882] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
    [ 663.417885] ata1: irq_stat 0x00000040, connection status changed
    [ 663.417887] ata1: SError: { DevExch }
    [ 663.417893] ata1: hard resetting link
    [ 664.165187] ata1: SATA link down (SStatus 0 SControl 300)
    [ 664.185147] ata1: EH complete
    [ 899.498772] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
    [ 899.498776] ata1: irq_stat 0x00000040, connection status changed
    [ 899.498779] ata1: SError: { DevExch }
    [ 899.498787] ata1: hard resetting link

    However, no disk is connected to that port (only a single SSD on ata2), as can also be seen earlier in the same log file:

    [ 5.547281] NET: Registered protocol family 5
    [ 5.638323] RPC: Registered rdma transport module.
    [ 5.907834] ata1: SATA link down (SStatus 0 SControl 300)
    [ 6.297160] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [ 6.298030] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
    [ 6.298033] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
    [ 6.298035] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
    [ 6.298566] ata2.00: ATA-9: OCZ-VERTEX4, 1.5, max UDMA/133
    [ 6.298569] ata2.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    [ 6.299425] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
    [ 6.299427] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
    [ 6.299429] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
    [ 6.299968] ata2.00: configured for UDMA/133
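
To see whether the link resets on a port line up with the freezes, the dmesg output can be summarized per ATA port. What follows is only a rough sketch in Python: the function name `ata_events` and the keyword list are assumptions made for illustration, and the sample lines are copied from the excerpts above.

```python
import re
from collections import defaultdict

# Matches lines such as "[ 663.417893] ata1: hard resetting link"
# and device lines such as "[ 6.299968] ata2.00: configured for UDMA/133".
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?:\s+(.*)")

def ata_events(dmesg_text,
               keywords=("exception", "hard resetting link", "link down")):
    """Group suspicious libata messages by port as (timestamp, message) pairs.

    Hypothetical helper; the keyword list is a guess at what is worth flagging.
    """
    events = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        timestamp, port, message = match.groups()
        if any(keyword in message for keyword in keywords):
            events[port].append((float(timestamp), message))
    return dict(events)

sample = """\
[ 6.297160] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 663.417882] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
[ 663.417893] ata1: hard resetting link
[ 664.165187] ata1: SATA link down (SStatus 0 SControl 300)
[ 899.498772] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
"""

events = ata_events(sample)
for port, entries in sorted(events.items()):
    print(port)
    for timestamp, message in entries:
        print(f"  {timestamp:>12.6f}  {message}")
```

The timestamps are seconds since boot, so they have to be compared with the uptime at which a freeze occurred, not with wall-clock time.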

    After experiencing issues with another SSD (previously attached to ata1), I replaced almost all hardware and replaced the SSD with the one that is now attached to ata2.
    The SSD is used as an iSCSI target (file i/o) and hosts a Windows 2012 VM (vSphere) with a small SQL 2012 database.

    Strangely enough, I ran this exact config for over 2 years without any problem, until 2 months ago.
    Does anyone have a clue?

  9. #9
    Join Date
    Oct 2010
    Location
    GA
    Posts
    935

    Did you do this:
    Try to disable APM and ACPI in the CTRL+ALT+T menu (in console mode): CTRL+ALT+T -> Boot options (9) -> Boot parameters (1).

  10. #10
    Join Date
    Oct 2009
    Posts
    53

    I did, and I also disabled all ACPI functions in the machine's BIOS.
    Furthermore, I replaced the power supply yesterday and moved the SSD to the Areca controller, to see whether the onboard SATA controller might be the problem.

    By the way, for the information of others: after moving the disk from the onboard controller to the Areca controller, another reboot was required before I could reconnect the volume to the iSCSI target, but it worked surprisingly well.

    Unfortunately, this morning the machine went down again, and not being home, I haven't had a chance to reset the machine and look at the logs.
    Tonight, I'll bring the machine back up, check the ACPI settings, examine the log and post my findings here.

    Are there any logs besides dmesg.2 and critical_errors.log (which has always been empty so far) that I should examine?

    P.S. I found two other forum threads in which similar behavior is mentioned: here and here. With reference to the first, recalling when the freezes started, I think it was after upgrading beyond v6 update 90.
    Unfortunately, neither thread reports a final outcome.
    Last edited by Arcesilaus; 10-25-2012 at 04:46 PM. Reason: related threads found
