I've had a running issue that I just can't seem to get to the bottom of. On average once every 10 days the system will just halt and restart, and it's always between 01:00 and 04:00 in the morning.
The only errors I get are from the system, saying:
"2012/08/11 01:32:29 System:The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run "Repair file system on LV (Logical Volumes)" in Console Tools (ALT+CTRL+X) to avoid possible data corruption."
It's driving me up the wall. I can't seem to find anything that's happening at that time: no power issues (the UPS reports nothing) and no networking issues. I've had the system on a USB key, on an HDD, and currently it's on a brand new SD card (with an IDE-to-SD converter).
Any ideas? It's starting to cause issues with the backup system, which uses the storage as iSCSI.
Check the RAM with memtest from the boot menu.
To run the memory test, restart your server and, right after the system performs POST, hit 'Esc'. You'll then get options to run the test.
I suspect hardware or a driver issue.
Check over the fans, temperatures, and power supplies.
Also, be sure all firmware is up to date, including the motherboard BIOS.
If you want to PM me the system logs, I can take a look to see if anything is obvious.
I believe all the firmware is the latest version. Also, if it was a firmware issue (or a hardware issue at all), I would expect whatever condition caused the failure to happen at least once in 3 months during the day, and not always at night.
I will check the power supplies etc., but they're all connected to the UPS and, as I said, that hasn't reported any issues. All of them seem to work.
Thanks for the logs.
I don't see any issue that could be causing the shutdown.
You do have a small problem with the ADS connection, but this is likely related to a time difference between the DSS and the ADS server, or to the domain admin account not being the real domain administrator but a user with admin rights.
Check over the UPS and be sure it is not shutting down the system itself.
Hi, thanks for checking. The UPS is actually not currently "managing" system shutdowns, so it can't be that.
Do the logs reflect a "crash", i.e. an old-school bluescreen dump, or simply the system stopping? As I said, I need to get to the bottom of this as it's getting crazy.
Try changing the RAID controller and see if it still fails:
===
[ 97.362565] hdb: task_pio_intr: status=0x51 { DriveReady SeekComplete Error }
[ 97.362576] hdb: task_pio_intr: error=0x04 { DriveStatusError }
[ 97.362583] hdb: possibly failed opcode: 0xa1
2012/08/10 02:49:53|The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run \"Repair file system on LV (Logical Volumes)\" in Console Tools (ALT+CTRL+X) to avoid possible data corruption.
2012/08/11 01:32:29|The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run \"Repair file system on LV (Logical Volumes)\" in Console Tools (ALT+CTRL+X) to avoid possible data corruption.
2012/08/12 00:45:27|The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run \"Repair file system on LV (Logical Volumes)\" in Console Tools (ALT+CTRL+X) to avoid possible data corruption.
===
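
If it helps to pin down the timing, here is a rough Python sketch that pulls the unclean-shutdown timestamps out of log lines like the ones above and prints the hours between consecutive events. It assumes the `date time|message` layout shown in the excerpt; the names `LOG_LINES`, `unclean_shutdowns` and `gaps_in_hours` are made up for this example and are not part of any DSS tool.

```python
import re
from datetime import datetime

# Lines copied from the log excerpt above (message text shortened).
LOG_LINES = [
    "2012/08/10 02:49:53|The system was not shutdowned properly.",
    "2012/08/11 01:32:29|The system was not shutdowned properly.",
    "2012/08/12 00:45:27|The system was not shutdowned properly.",
]

# Assumed layout: "YYYY/MM/DD HH:MM:SS|message".
STAMP = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\|.*not shutdowned properly"
)


def unclean_shutdowns(lines):
    """Return the timestamps of all unclean-shutdown entries, sorted."""
    stamps = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            )
    return sorted(stamps)


def gaps_in_hours(stamps):
    """Return the hours elapsed between each pair of consecutive events."""
    return [
        round((later - earlier).total_seconds() / 3600, 2)
        for earlier, later in zip(stamps, stamps[1:])
    ]


events = unclean_shutdowns(LOG_LINES)
for stamp in events:
    print(stamp.isoformat(sep=" "))
print("gaps (hours):", gaps_in_hours(events))
```

Run against the full system log rather than these three lines, a list like this makes it easier to line the restarts up with anything scheduled at night (backup jobs, snapshots, and so on).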