Log errors


12-08-2011, 11:33 AM
mikep

Hello, today some of our VMs went read-only (RO). I searched the logs and found this:

Thu 08Dec2011 (tz=-60)
09:56:15.28017000 JNI: ArcAdapter.cpp: Line #: 5021
JNI: getControllerLog()
StorLib::getControllerLog(char *)
ArcAdapter::getControllerLog()
ArcAdapter::getAdapterStatsLog()
*** FSA API Error: FsaGetControllerStats() fsaStatus=479 ***
Thu 08Dec2011 (tz=-60)
09:56:15.159757000 JNI: ArcAdapter.cpp: Line #: 3203
JNI: getEvents()
StorLib::getEvents(char**)
ArcAdapter::getEvents()
*** FSA API Error: FsaPollForEvent(x,x,x) fsaStatus=611 ***

2011/12/08 09:56:15|I/O Errors detected on unit S004. The unit requires your urgent attention in order to decrease the risk of data loss.

Dec 8 09:56:15 [kern.info] kernel: [978330.321409] sd 4:0:4:0: [sde] Result: hostbyte=0x00 driverbyte=0x06
Dec 8 09:56:15 [kern.err] kernel: [978330.321415] end_request: I/O error, dev sde, sector 1330144712
Dec 8 09:56:15 [kern.info] kernel: [978330.321421] dev_vdisk: ***ERROR***: cmd ffff88020188da28 returned error -5
Dec 8 09:56:15 [kern.info] kernel: [978330.321450] sd 4:0:4:0: [sde] Result: hostbyte=0x00 driverbyte=0x00
Dec 8 09:56:15 [kern.err] kernel: [978330.321454] end_request: I/O error, dev sde, sector 1330144720
Dec 8 09:56:15 [kern.info] kernel: [978330.321498] dev_vdisk: ***ERROR***: cmd ffff8801a0e49110 returned error -5
Dec 8 09:56:15 [auth.info] CRON[10740]: (pam_unix) session closed for user root
Dec 8 09:56:15 [local3.err] System: I/O Errors detected on unit S004. The unit requires your urgent attention in order to decrease the risk of data loss.
Dec 8 09:56:15 [authpriv.notice] sudo: root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/var/nasexe/event.send NewErrors
Dec 8 09:56:15 [auth.info] CRON[10737]: (pam_unix) session closed for user root
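To see at a glance which device and sectors the kernel reported as failing, the `end_request` lines above can be filtered out of the syslog. The following is a minimal sketch, assuming the log lines have the same format as the ones quoted here; the function name `failed_sectors` is hypothetical and not part of Open-E DSS.

```python
import re

# Matches kernel lines such as:
# "end_request: I/O error, dev sde, sector 1330144712"
END_REQUEST_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(lines):
    """Return (device, sector) tuples for every end_request I/O error line (hypothetical helper)."""
    hits = []
    for line in lines:
        match = END_REQUEST_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


# Sample lines copied from the log excerpt above.
sample = [
    "Dec 8 09:56:15 [kern.err] kernel: [978330.321415] end_request: I/O error, dev sde, sector 1330144712",
    "Dec 8 09:56:15 [kern.err] kernel: [978330.321454] end_request: I/O error, dev sde, sector 1330144720",
    "Dec 8 09:56:15 [auth.info] CRON[10737]: (pam_unix) session closed for user root",
]

print(failed_sectors(sample))  # → [('sde', 1330144712), ('sde', 1330144720)]
```

In this excerpt both errors hit the same device (`sde`) and are only 8 sectors apart, so they point at one small region of the disk rather than at errors spread across the unit.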

This only lasted 2 minutes. I've checked all volumes in ASM and they are all optimal. I can't find any events either, and no disks or hot spares are offline.
Any idea what caused this?

Regards
12-08-2011, 12:24 PM
Al-S

Check your RAID controller's health.
Check your file system with FSR, as described in this link:
http://kb.open-e.com/File-system-repair_138.html
12-08-2011, 03:10 PM
mikep

Thanks Al-S.

Regards
12-09-2011, 01:40 PM
mikep

Looking at the logs again, I found this error, which is what caused the VMs on that hypervisor to go read-only:

Dec 8 09:53:56 [kern.info] kernel: [978191.413459] iscsi-scst: ***ERROR***: Connection with initiator iqn.1994-05.com.redhat:eba5969b759d unexpectedly closed!

But it occurred before the I/O error at:

2011/12/08 09:56:15|I/O Errors detected on unit S004. The unit requires your urgent attention in order to decrease the risk of data loss.

Are these the same error? The second one was reported later, but is it the root cause of the first?
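The gap between the two messages can be measured directly from the timestamps quoted above. This is a minimal sketch; the syslog line carries no year, so 2011 (the date of this thread) is added by hand as an assumption.

```python
from datetime import datetime

# Timestamp of the iSCSI "unexpectedly closed" syslog line; year 2011 added by hand (assumption).
iscsi_closed = datetime.strptime("2011 Dec 8 09:53:56", "%Y %b %d %H:%M:%S")

# Timestamp of the "I/O Errors detected on unit S004" event log line.
io_error = datetime.strptime("2011/12/08 09:56:15", "%Y/%m/%d %H:%M:%S")

gap = io_error - iscsi_closed
print(gap.total_seconds())  # → 139.0
```

The disconnect was logged 139 seconds (2 minutes 19 seconds) before the I/O error message. The order of the timestamps alone does not show which event caused the other.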

Regards
Carlos Luna
12-09-2011, 01:55 PM
mikep

Also, that link says:

Select Repair File System on LV (Logical Volumes)
Please note: when running file system repair the volumes will be unmounted and the shares not available for use.

Does this mean all SAN volumes, or just the one being repaired?

Regards
12-09-2011, 02:06 PM
Al-S

Regarding your question about:
Dec 8 09:53:56 [kern.info] kernel: [978191.413459] iscsi-scst: ***ERROR***: Connection with initiator iqn.1994-05.com.redhat:eba5969b759d unexpectedly closed!

This error means that the connection between your initiator and your system was closed. "Unexpectedly" means that your disk stopped working, or the RAID controller hung or reported errors. Finding this error alongside the errors you posted earlier is not surprising: those errors can close the connection, which produces this message.

Regarding your question about:
"Select Repair File System on LV (Logical Volumes)
Please note: when running file system repair the volumes will be unmounted and the shares not available for use."

This will affect all the volumes that your DSS manages.