Thread: Error messages and system hangs

  1. #1

    Error messages and system hangs

    Hi

    We have two DSS v6 machines running with iSCSI failover.

    Every few days we get some error messages:

    Code:
    2009/09/08 03:23:01	kernel:CPU 1 
    
    2009/09/08 03:23:01	kernel:Pid: 1175, comm: bonding Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12
    
    2009/09/08 03:23:01	kernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0
    
    2009/09/08 03:23:01	kernel:RSP: 0000:ffff8800ad9d9f48  EFLAGS: 00010246
    
    2009/09/08 03:23:01	kernel:RAX: fffffffffffffff7 RBX: 0008ab5abe171500 RCX: 0000000000000003
    
    2009/09/08 03:23:01	kernel:RDX: 0000000000000000 RSI: ffff88013e73a9c0 RDI: 0008ab5abe171500
    
    2009/09/08 03:23:01	kernel:RBP: 0000000000000003 R08: 0000000000000003 R09: 00000000ffdfead8
    
    2009/09/08 03:23:01	kernel:R10: ffff8800ad9d8000 R11: 0000000000000000 R12: 0000000000000000
    
    2009/09/08 03:23:01	kernel:R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    
    2009/09/08 03:23:01	kernel:FS:  0000000000000000(0000) GS:ffff88013f871dc0(0063) knlGS:00000000f7d666c0
    
    2009/09/08 03:23:01	kernel:CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    
    2009/09/08 03:23:01	kernel:CR2: 0000000009f66268 CR3: 000000007fb73000 CR4: 00000000000006a0
    
    2009/09/08 03:23:01	kernel:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    
    2009/09/08 03:23:01	kernel:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    
    2009/09/08 03:23:01	kernel:Process bonding (pid: 1175, threadinfo ffff8800ad9d8000, task ffff88013b798b50)
    
    2009/09/08 03:23:01	kernel:Stack:  ffff88013e73a9c0 0000000000000003 0008ab5abe171500 ffffffff802962cc
    
    2009/09/08 03:23:01	kernel:0000000000000003 00000000ffdfead8 0000000000000000 ffffffff80228282
    
    2009/09/08 03:23:01	kernel:0000000000000000 0000000000000000 0000000000000000 0000000000000000
    
    2009/09/08 03:23:01	kernel:Call Trace:
    
    2009/09/08 03:23:01	kernel:[<ffffffff802962cc>] ? sys_close+0x8c/0xf0
    
    2009/09/08 03:23:01	kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa
    
    2009/09/08 03:23:01	kernel:
    
    2009/09/08 03:23:01	kernel:
    
    2009/09/08 03:23:01	kernel:Code: ff 66 90 89 f2 be 41 02 00 00 e9 f4 fd ff ff 66 66 66 90 48 83 ec 18 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 45 31 e4 <48> 83 7f 28 00 48 89 f5 74 50 48 8b 47 20 48 85 c0 75 35 48 89 
    
    2009/09/08 03:23:01	kernel:RSP <ffff8800ad9d9f48>
    
    2009/09/08 03:23:01	kernel:PGD 203067 PUD 204067 PMD 0 
    
    2009/09/08 03:23:01	kernel:CPU 1 
    
    2009/09/08 03:23:01	kernel:Pid: 1175, comm: bonding Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12
    
    2009/09/08 03:23:01	kernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0
    
    2009/09/08 03:23:01	kernel:RSP: 0000:ffff8800ad9d9db8  EFLAGS: 00010246
    
    2009/09/08 03:23:01	kernel:RAX: ffff8800bdbd2810 RBX: ffffffffffffad80 RCX: 0000000000000003
    
    2009/09/08 03:23:01	kernel:RDX: ffff8800bdbd2800 RSI: ffff88013e73a9c0 RDI: ffffffffffffad80
    
    2009/09/08 03:23:01	kernel:RBP: 0000000000000002 R08: ffff8800ad9d8000 R09: 0000000000000000
    
    2009/09/08 03:23:01	kernel:R10: 0000000000000200 R11: 0000000022100000 R12: 0000000000000000
    
    2009/09/08 03:23:01	kernel:R13: ffff88013e73a9c0 R14: 0000000000000001 R15: 0000000000000001
    
    2009/09/08 03:23:01	kernel:FS:  0000000000000000(0000) GS:ffff88013f871dc0(0000) knlGS:0000000000000000
    
    2009/09/08 03:23:01	kernel:CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    
    2009/09/08 03:23:01	kernel:CR2: ffffffffffffada8 CR3: 000000012a5e8000 CR4: 00000000000006a0
    
    2009/09/08 03:23:01	kernel:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    
    2009/09/08 03:23:01	kernel:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    
    2009/09/08 03:23:01	kernel:Process bonding (pid: 1175, threadinfo ffff8800ad9d8000, task ffff88013b798b50)
    
    2009/09/08 03:23:01	kernel:Stack:  0000000000000001 0000000000000002 ffff8801299ad200 ffffffff80237390
    
    2009/09/08 03:23:01	kernel:000000000000000b ffff88013b798b50 0000000000000000 0000000000000000
    
    2009/09/08 03:23:01	kernel:0000000000000000 ffffffff80237a9b ffff88013df8e000 ffff880000000000
    
    2009/09/08 03:23:01	kernel:Call Trace:
    
    2009/09/08 03:23:01	kernel:[<ffffffff80237390>] ? put_files_struct+0x70/0xc0
    
    2009/09/08 03:23:01	kernel:[<ffffffff80237a9b>] ? do_exit+0x17b/0x8b0
    
    2009/09/08 03:23:01	kernel:[<ffffffff8069cf27>] ? oops_end+0x87/0x90
    
    2009/09/08 03:23:01	kernel:[<ffffffff8069cab9>] ? error_exit+0x0/0x51
    
    2009/09/08 03:23:01	kernel:[<ffffffff802961b8>] ? filp_close+0x18/0xa0
    
    2009/09/08 03:23:01	kernel:[<ffffffff802962cc>] ? sys_close+0x8c/0xf0
    
    2009/09/08 03:23:01	kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa
    
    2009/09/08 03:23:01	kernel:
    
    2009/09/08 03:23:01	kernel:
    
    2009/09/08 03:23:01	kernel:Code: ff 66 90 89 f2 be 41 02 00 00 e9 f4 fd ff ff 66 66 66 90 48 83 ec 18 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 45 31 e4 <48> 83 7f 28 00 48 89 f5 74 50 48 8b 47 20 48 85 c0 75 35 48 89 
    
    2009/09/08 03:23:01	kernel:RSP <ffff8800ad9d9db8>
    Code:
    2009/09/08 04:54:02	kernel:CPU 4 
    
    2009/09/08 04:54:02	kernel:Pid: 14719, comm: check_cs Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12
    
    2009/09/08 04:54:02	kernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0
    
    2009/09/08 04:54:02	kernel:RSP: 0000:ffff8800bdae7f48  EFLAGS: 00010246
    
    2009/09/08 04:54:02	kernel:RAX: ffffffffffffffef RBX: 004000001a010045 RCX: 0000000000000004
    
    2009/09/08 04:54:02	kernel:RDX: 0000000000000000 RSI: ffff88013d1243c0 RDI: 004000001a010045
    
    2009/09/08 04:54:02	kernel:RBP: 0000000000000004 R08: 0000000000000004 R09: 00000000fffc9488
    
    2009/09/08 04:54:02	kernel:R10: ffff8800bdae6000 R11: 0000000000000000 R12: 0000000000000000
    
    2009/09/08 04:54:02	kernel:R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    
    2009/09/08 04:54:02	kernel:FS:  0000000000000000(0000) GS:ffff88013f8d49c0(0063) knlGS:00000000f7e306c0
    
    2009/09/08 04:54:02	kernel:CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    
    2009/09/08 04:54:02	kernel:CR2: 00000000080905e0 CR3: 00000000bdba3000 CR4: 00000000000006a0
    
    2009/09/08 04:54:02	kernel:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    
    2009/09/08 04:54:02	kernel:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    
    2009/09/08 04:54:02	kernel:Process check_cs (pid: 14719, threadinfo ffff8800bdae6000, task ffff88013d7b3390)
    
    2009/09/08 04:54:02	kernel:Stack:  ffff88013d1243c0 0000000000000004 004000001a010045 ffffffff802962cc
    
    2009/09/08 04:54:02	kernel:0000000000000004 00000000fffc9488 0000000000000000 ffffffff80228282
    
    2009/09/08 04:54:02	kernel:0000000000000000 0000000000000000 0000000000000000 0000000000000000
    
    2009/09/08 04:54:02	kernel:Call Trace:
    
    2009/09/08 04:54:02	kernel:[<ffffffff802962cc>] ? sys_close+0x8c/0xf0
    
    2009/09/08 04:54:02	kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa
    
    2009/09/08 04:54:02	kernel:
    
    2009/09/08 04:54:02	kernel:
    
    2009/09/08 04:54:02	kernel:Code: ff 66 90 89 f2 be 41 02 00 00 e9 f4 fd ff ff 66 66 66 90 48 83 ec 18 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 45 31 e4 <48> 83 7f 28 00 48 89 f5 74 50 48 8b 47 20 48 85 c0 75 35 48 89 
    
    2009/09/08 04:54:02	kernel:RSP <ffff8800bdae7f48>
    
    2009/09/08 04:54:02	kernel:CPU 4 
    
    2009/09/08 04:54:02	kernel:Pid: 14719, comm: check_cs Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12
    
    2009/09/08 04:54:02	kernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0
    
    2009/09/08 04:54:02	kernel:RSP: 0000:ffff8800bdae7db8  EFLAGS: 00010246
    
    2009/09/08 04:54:02	kernel:RAX: ffff88013b352010 RBX: db5abe1715001600 RCX: 0000000000000032
    
    2009/09/08 04:54:02	kernel:RDX: ffff88013b352000 RSI: ffff88013d1243c0 RDI: db5abe1715001600
    
    2009/09/08 04:54:02	kernel:RBP: 0000000000000002 R08: ffff88013d8e8ff0 R09: 000013519eba9e10
    
    2009/09/08 04:54:02	kernel:R10: ffff88002802baf0 R11: ffffffff803fde80 R12: 0000000000000000
    
    2009/09/08 04:54:02	kernel:R13: ffff88013d1243c0 R14: 0000000000000001 R15: 0000000000000001
    
    2009/09/08 04:54:02	kernel:FS:  0000000000000000(0000) GS:ffff88013f8d49c0(0000) knlGS:0000000000000000
    
    2009/09/08 04:54:02	kernel:CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    
    2009/09/08 04:54:02	kernel:CR2: 00000000080905e0 CR3: 000000012ae23000 CR4: 00000000000006a0
    
    2009/09/08 04:54:02	kernel:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    
    2009/09/08 04:54:02	kernel:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    
    2009/09/08 04:54:02	kernel:Process check_cs (pid: 14719, threadinfo ffff8800bdae6000, task ffff88013d7b3390)
    
    2009/09/08 04:54:02	kernel:Stack:  0000000000000001 0000000000000002 ffff88013e0f4740 ffffffff80237390
    
    2009/09/08 04:54:02	kernel:000000000000000b ffff88013d7b3390 0000000000000000 0000000000000000
    
    2009/09/08 04:54:02	kernel:0000000000000000 ffffffff80237a9b ffff88013df8e000 ffff880000000000
    
    2009/09/08 04:54:02	kernel:Call Trace:
    
    2009/09/08 04:54:02	kernel:[<ffffffff80237390>] ? put_files_struct+0x70/0xc0
    
    2009/09/08 04:54:02	kernel:[<ffffffff80237a9b>] ? do_exit+0x17b/0x8b0
    
    2009/09/08 04:54:02	kernel:[<ffffffff8069cf27>] ? oops_end+0x87/0x90
    
    2009/09/08 04:54:02	kernel:[<ffffffff8069cab9>] ? error_exit+0x0/0x51
    
    2009/09/08 04:54:02	kernel:[<ffffffff802961b8>] ? filp_close+0x18/0xa0
    
    2009/09/08 04:54:02	kernel:[<ffffffff802962cc>] ? sys_close+0x8c/0xf0
    
    2009/09/08 04:54:02	kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa
    
    2009/09/08 04:54:02	kernel:
    
    2009/09/08 04:54:02	kernel:
    
    2009/09/08 04:54:02	kernel:Code: ff 66 90 89 f2 be 41 02 00 00 e9 f4 fd ff ff 66 66 66 90 48 83 ec 18 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 45 31 e4 <48> 83 7f 28 00 48 89 f5 74 50 48 8b 47 20 48 85 c0 75 35 48 89

    This is the primary machine. The secondary takes over the shares and the primary system hangs, so we have to reset it to get it back online.

    Where's the problem?

    Thanks

  2. #2

    Here are some other error messages we get:

    Code:
    2009/09/08 11:52:08	kernel:CPU 2 
    
    2009/09/08 11:52:08	kernel:Pid: 22750, comm: apache2ctl Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12
    
    2009/09/08 11:52:08	kernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0
    
    2009/09/08 11:52:08	kernel:RSP: 0000:ffff88007fb99ea8  EFLAGS: 00010246
    
    2009/09/08 11:52:08	kernel:RAX: ffff88013f24c010 RBX: db5abe1715003e40 RCX: 0000000000000100
    
    2009/09/08 11:52:08	kernel:RDX: ffff88013f24c000 RSI: ffff880129471680 RDI: db5abe1715003e40
    
    2009/09/08 11:52:08	kernel:RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000002
    
    2009/09/08 11:52:08	kernel:R10: 0000000000000008 R11: ffffffff803fde80 R12: 0000000000000000
    
    2009/09/08 11:52:08	kernel:R13: ffff880129471680 R14: 0000000000000001 R15: 0000000000000001
    
    2009/09/08 11:52:08	kernel:FS:  0000000000000000(0000) GS:ffff88013f871740(0000) knlGS:0000000000000000
    
    2009/09/08 11:52:08	kernel:CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    
    2009/09/08 11:52:08	kernel:CR2: 00000000f7f6dacc CR3: 0000000000201000 CR4: 00000000000006a0
    
    2009/09/08 11:52:08	kernel:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    
    2009/09/08 11:52:08	kernel:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    
    2009/09/08 11:52:08	kernel:Process apache2ctl (pid: 22750, threadinfo ffff88007fb98000, task ffff88013d79b890)
    
    2009/09/08 11:52:08	kernel:Stack:  0000000000000001 0000000000000002 ffff88013e76e500 ffffffff80237390
    
    2009/09/08 11:52:08	kernel:0000000000000000 ffff88013d79b890 0000000000000000 0000000000000000
    
    2009/09/08 11:52:08	kernel:0000000000000000 ffffffff80237a9b 00000000fffffff2 0000000000000000
    
    2009/09/08 11:52:08	kernel:Call Trace:
    
    2009/09/08 11:52:08	kernel:[<ffffffff80237390>] ? put_files_struct+0x70/0xc0
    
    2009/09/08 11:52:08	kernel:[<ffffffff80237a9b>] ? do_exit+0x17b/0x8b0
    
    2009/09/08 11:52:08	kernel:[<ffffffff80238244>] ? do_group_exit+0x34/0xa0
    
    2009/09/08 11:52:08	kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa
    
    2009/09/08 11:52:08	kernel:
    
    2009/09/08 11:52:08	kernel:
    
    2009/09/08 11:52:08	kernel:Code: ff 66 90 89 f2 be 41 02 00 00 e9 f4 fd ff ff 66 66 66 90 48 83 ec 18 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 45 31 e4 <48> 83 7f 28 00 48 89 f5 74 50 48 8b 47 20 48 85 c0 75 35 48 89 
    
    2009/09/08 11:52:08	kernel:RSP <ffff88007fb99ea8>
    Code:
    2009/09/08 06:19:54	kernel:CPU 6 
    
    2009/09/08 06:19:54	kernel:Pid: 23675, comm: apache2ctl Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12
    
    2009/09/08 06:19:54	kernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0
    
    2009/09/08 06:19:54	kernel:RSP: 0000:ffff88002f4c7ea8  EFLAGS: 00010246
    
    2009/09/08 06:19:54	kernel:RAX: ffff88013d356810 RBX: db5abe1715005cc0 RCX: 0000000000000003
    
    2009/09/08 06:19:54	kernel:RDX: ffff88013d356800 RSI: ffff88013ebd9080 RDI: db5abe1715005cc0
    
    2009/09/08 06:19:54	kernel:RBP: 0000000000000002 R08: ffff88002f4c6000 R09: 0000000000000000
    
    2009/09/08 06:19:54	kernel:R10: ffff880028073af0 R11: ffff88013fa43858 R12: 0000000000000000
    
    2009/09/08 06:19:54	kernel:R13: ffff88013ebd9080 R14: 0000000000000001 R15: 0000000000000001
    
    2009/09/08 06:19:54	kernel:FS:  0000000000000000(0000) GS:ffff88013f9afbc0(0000) knlGS:0000000000000000
    
    2009/09/08 06:19:54	kernel:CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
    
    2009/09/08 06:19:54	kernel:CR2: 00000000f7f6eacc CR3: 0000000000201000 CR4: 00000000000006a0
    
    2009/09/08 06:19:54	kernel:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    
    2009/09/08 06:19:54	kernel:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    
    2009/09/08 06:19:54	kernel:Process apache2ctl (pid: 23675, threadinfo ffff88002f4c6000, task ffff88007fab2790)
    
    2009/09/08 06:19:54	kernel:Stack:  0000000000000001 0000000000000002 ffff88013d07dd00 ffffffff80237390
    
    2009/09/08 06:19:54	kernel:0000000000000000 ffff88007fab2790 0000000000000000 0000000000000000
    
    2009/09/08 06:19:54	kernel:0000000000000000 ffffffff80237a9b 00000000fffffff2 0000000000000000
    
    2009/09/08 06:19:54	kernel:Call Trace:
    
    2009/09/08 06:19:54	kernel:[<ffffffff80237390>] ? put_files_struct+0x70/0xc0
    
    2009/09/08 06:19:54	kernel:[<ffffffff80237a9b>] ? do_exit+0x17b/0x8b0
    
    2009/09/08 06:19:54	kernel:[<ffffffff80238244>] ? do_group_exit+0x34/0xa0
    
    2009/09/08 06:19:54	kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa
    
    2009/09/08 06:19:54	kernel:
    
    2009/09/08 06:19:54	kernel:
    
    2009/09/08 06:19:54	kernel:Code: ff 66 90 89 f2 be 41 02 00 00 e9 f4 fd ff ff 66 66 66 90 48 83 ec 18 48 89 1c 24 48 89 6c 24 08 48 89 fb 4c 89 64 24 10 45 31 e4 <48> 83 7f 28 00 48 89 f5 74 50 48 8b 47 20 48 85 c0 75 35 48 89 
    
    2009/09/08 06:19:54	kernel:RSP <ffff88002f4c7ea8>
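
For anyone comparing dumps like the ones in this thread, here is a minimal Python sketch that tallies which process hit which faulting symbol. It assumes the log layout shown above (a `Pid: ..., comm: ...` line followed by a `RIP:` line). The function name `summarize_oops` and the regular expressions are illustrative assumptions, not part of any Open-E tool.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log lines quoted in this thread.
COMM_RE = re.compile(r"kernel:Pid: (\d+), comm: (\S+)")
RIP_RE = re.compile(r"kernel:RIP: \S+\s+\[<[0-9a-f]+>\]\s+(\S+)")


def summarize_oops(lines):
    """Count (process name, faulting symbol) pairs across oops dumps."""
    counts = Counter()
    comm = None
    for line in lines:
        m = COMM_RE.search(line)
        if m:
            # Remember the process name until its RIP line shows up.
            comm = m.group(2)
            continue
        m = RIP_RE.search(line)
        if m and comm is not None:
            counts[(comm, m.group(1))] += 1
            comm = None
    return counts


if __name__ == "__main__":
    sample = [
        "2009/09/08 03:23:01\tkernel:Pid: 1175, comm: bonding Tainted: G      D   2.6.27.10-oe64-00000-g9b2116f #12",
        "2009/09/08 03:23:01\tkernel:RIP: 0010:[<ffffffff802961b8>]  [<ffffffff802961b8>] filp_close+0x18/0xa0",
    ]
    for (comm, symbol), n in summarize_oops(sample).items():
        print(comm, symbol, n)
```

Run over the dumps posted so far, a tally like this makes it easy to see that unrelated processes (`bonding`, `check_cs`, `apache2ctl`) all fault at the same `filp_close+0x18/0xa0` address, which is worth pointing out when sending the logs to support.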

  3. #3

    This could be a memory problem.
    What version are you running, and how much RAM do you have in your system?

    You may also want to send the logs to support to see what they say.
    Keep us posted.

  4. #4

    We have 4 GB in each machine.
    I will swap the memory and try again.

  5. #5

    What did support say about the logs?
    Did you try running a memory test?
    Is it only happening on the primary machine?

  6. #6

    Haven't sent the logs to support yet.
    A memory test did not show any errors.
    And yes, it's only on the primary machine.

  7. #7

    What build of DSS 6 are you running? We were running build 3535 and seeing similar issues, where our SAN would act like it had just frozen up. Support had us upgrade to build 3537, and that solved the problem. The issue was with the system cache. A reboot would fix it for a while, and then it would freak out again. High data transfer aggravates this issue.

  8. #8

    6.0up06.8102.3535 64bit

    Like I said earlier, support sent me a patch file which I haven't had the downtime to implement yet. I'll try to get that done this weekend.

  9. #9

    We are running 6.0up04.8101.3530 64bit
    Are there any updates I can download and install?
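
To compare version strings like the ones quoted in this thread against the build that post #7 reports as fixing the freezes (3537), a rough Python sketch follows. It assumes the last dotted field before the architecture suffix is the build number, which matches how "build 3535" and `6.0up06.8102.3535 64bit` are used above, but that format is an assumption and not a documented one. The names `build_number`, `needs_update`, and `FIXED_BUILD` are made up for illustration.

```python
import re

# Assumption: "<release>.<build> <arch>", e.g. "6.0up06.8102.3535 64bit".
VERSION_RE = re.compile(r"^(?P<release>\S+)\.(?P<build>\d+)\s+(?P<arch>\S+)$")

# Build that post #7 says resolved the cache-related freezes.
FIXED_BUILD = 3537


def build_number(version):
    """Extract the trailing build number from a DSS version string."""
    m = VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(m.group("build"))


def needs_update(version, fixed_build=FIXED_BUILD):
    """True if the build is older than the one reported as fixed."""
    return build_number(version) < fixed_build


if __name__ == "__main__":
    for v in ("6.0up06.8102.3535 64bit", "6.0up04.8101.3530 64bit"):
        print(v, "->", "older than 3537" if needs_update(v) else "3537 or newer")
```

Whether build 3537 also addresses the `filp_close` oops shown earlier is not confirmed anywhere in this thread, so treat the comparison only as a prompt to ask support about available updates.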

  10. #10

    We have reinstalled and reconfigured the primary server and it seems that the errors are gone now.
