Thread: Strange errors - Network issues?

    For the last month or so, I've been seeing strange errors in the event viewer. Here are the most recent ones:

    2009/12/15 14:36:27 Pid: 22567, comm: smbd Not tainted #61
    2009/12/15 14:36:27 [] xfs_free_ag_extent+0x2fe/0x670
    2009/12/15 14:36:27 [] xfs_free_extent+0xc7/0xf0
    2009/12/15 14:36:27 [] xfs_free_extent+0xc7/0xf0
    2009/12/15 14:36:27 [] xfs_inactive+0x3f2/0x460
    2009/12/15 14:36:27 [] xfs_fs_clear_inode+0x8e/0xd0
    2009/12/15 14:36:27 [] clear_inode+0xb3/0x140
    2009/12/15 14:36:27 [] generic_delete_inode+0xe2/0x120
    2009/12/15 14:36:27 [] iput+0x54/0x60
    2009/12/15 14:36:27 [] do_unlinkat+0xd9/0x130
    2009/12/15 14:36:27 [] syscall_call+0x7/0xb

    When this happens, I generally lose network access to the RAID and need to restart. I can still access shares on other RAIDs in the same system, so it isn't hard locking, although it has hard locked a few times in the past with similar messages.
    Any ideas what could be causing these errors (and what they mean)?

    This looks like a kernel call trace resulting from a hardware issue. I recommend running the repair utility from the Extended Tools in the Console, but please run memtest first! VMware recommends this as well, so let it run for at least 1 or 2 hours before running the repair utility. To get to memtest, restart the system and, right after POST, hit the ESCAPE or TAB key; you will get a basic menu where you can select memtest.
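
    On the "what do they mean" part: each line of the trace is a kernel stack frame written as function+offset/size (offset into the function, then the function's total length, both in hex), with the innermost call listed first. Read bottom to top, the trace from the first post shows smbd deleting a file (do_unlinkat) and ending up in the XFS code that frees the file's extents. The lines posted do not include the actual error message, so this only shows where the kernel was, not why it complained. Below is a minimal Python sketch for pulling the call path out of such log lines. The function names parse_frames and call_path are made up for illustration, and the regular expression assumes the exact line format shown in the first post (timestamp, empty [] where the address was stripped, then the symbol).

```python
import re

# Lines copied from the trace in the first post (addresses were already stripped).
TRACE = """\
2009/12/15 14:36:27 Pid: 22567, comm: smbd Not tainted #61
2009/12/15 14:36:27 [] xfs_free_ag_extent+0x2fe/0x670
2009/12/15 14:36:27 [] xfs_free_extent+0xc7/0xf0
2009/12/15 14:36:27 [] xfs_free_extent+0xc7/0xf0
2009/12/15 14:36:27 [] xfs_inactive+0x3f2/0x460
2009/12/15 14:36:27 [] xfs_fs_clear_inode+0x8e/0xd0
2009/12/15 14:36:27 [] clear_inode+0xb3/0x140
2009/12/15 14:36:27 [] generic_delete_inode+0xe2/0x120
2009/12/15 14:36:27 [] iput+0x54/0x60
2009/12/15 14:36:27 [] do_unlinkat+0xd9/0x130
2009/12/15 14:36:27 [] syscall_call+0x7/0xb
"""

# Assumed frame format: "[<optional address>] symbol+0xOFFSET/0xSIZE"
FRAME_RE = re.compile(
    r"\[\S*\]\s+(?P<func>[A-Za-z_]\w*)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_frames(text):
    """Return (function, offset, size) tuples in the order they appear in the log."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:  # non-frame lines such as the "Pid:" header are skipped
            frames.append(
                (match.group("func"), int(match.group("off"), 16), int(match.group("size"), 16))
            )
    return frames


def call_path(frames):
    """Return function names from outermost to innermost call.

    Consecutive repeats of the same function are collapsed for readability.
    """
    names = []
    for func, _offset, _size in reversed(frames):
        if not names or names[-1] != func:
            names.append(func)
    return names


if __name__ == "__main__":
    print(" -> ".join(call_path(parse_frames(TRACE))))
```

    Running it on the lines above prints the path starting at syscall_call and ending at xfs_free_ag_extent, which is easier to follow than the raw log. It is only a reading aid and does not diagnose anything by itself.
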
    Ok, looks like things are worse than I thought. I did a cold boot, and now my logical volume won't show up in the Open-E web GUI. The Areca card reports that the RAID is fine. I'm running a check on the volume set, but from the looks of it, it's going to take 12-24 hours to complete.

    I'm still not sure where the problem is originating though.
    Any help would be greatly appreciated.

    Thanks Todd, I didn't catch your first response until after writing my second.
    Just so I understand, the memtest is part of the Open-E software? Should I hit escape just before it starts the loading screen?

    Yes. POST means Power-On Self-Test, so hit the key after the BIOS screen and before NAS-R3 loads.

    This could be related to a hardware issue: memory, disks, RAID, a power hit, bad cache... What can happen is that the RAID controller reports everything as OK, but our logical volume manager sees something different, in the sector size or elsewhere, that the controller is not reporting. I would be on this right away! I have seen drives go offline and put the controller in degraded mode, and then you had better hope that you have a good hot spare (even that can bite you) or RAID 6.

    Anyway, I would download the logs from the controller and check them with their engineers to be on the safe side.
    All our raids are RAID6 so at least we have that comfort. We also run everything on UPS to smooth out any power spikes/dips.

    We did have a dying drive that we replaced a week ago on that raid, although I was getting those Open-E error messages before that drive was reporting a problem.
    The logs in the Areca controller card don't report anything wrong either.

    Is it worthwhile to send you the Open-E logs? If so, please PM me your e-mail address.

    This brings up another question I've been wondering: Is there any preventative maintenance I can do to make sure my raids are as healthy as they can be?

    Thanks again for your help and advice!

    Send the logs using the support form at the link below, and make sure the system is registered. For preventative maintenance, many people use disk scrubbing; this has been discussed in several topics on the forum.

    Via email: to receive technical support via email, please complete the technical support form.
