I suspect a hardware or driver issue.
Check the fans, temperatures, and power supplies.
Also, be sure all firmware is up to date, including the motherboard BIOS.
If you want to PM me the system logs, I can take a look to see if anything is obvious.
I believe all the firmware is the latest version. Also, if it were a firmware issue (or a hardware issue at all), I would expect whatever condition causes the failure to occur at least once in 3 months during the day, and not always at night.
I will check the power supplies etc., but they're all connected to the UPS and, as I said, that hasn't reported any issues. All of them seem to work.
Thanks for the logs.
I don't see anything that could be causing the shutdown.
You do have a small problem with the ADS connection, but this is likely related to a time difference between the DSS and the ADS server, or to the domain admin account not being the real domain administrator but a user with admin rights.
Check over the UPS and be sure it is not shutting down the systems itself.
Hi, thanks for checking. The UPS is actually not currently "managing" system shutdowns, so it can't be that.
Do the logs reflect a "crash", i.e. an old-school bluescreen dump, or simply the system stopping? As I said, I need to get to the bottom of this, as it's getting crazy.
Try changing the RAID controller and see if it still fails:
===
[ 97.362565] hdb: task_pio_intr: status=0x51 { DriveReady SeekComplete Error }
[ 97.362576] hdb: task_pio_intr: error=0x04 { DriveStatusError }
[ 97.362583] hdb: possibly failed opcode: 0xa1
2012/08/10 02:49:53|The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run \"Repair file system on LV (Logical Volumes)\" in Console Tools (ALT+CTRL+X) to avoid possible data corruption.
2012/08/11 01:32:29|The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run \"Repair file system on LV (Logical Volumes)\" in Console Tools (ALT+CTRL+X) to avoid possible data corruption.
2012/08/12 00:45:27|The system was not shutdowned properly. It could lead to file system corruption. It is recommenced to run \"Repair file system on LV (Logical Volumes)\" in Console Tools (ALT+CTRL+X) to avoid possible data corruption.
===
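To check whether the unclean shutdowns really cluster at night, the timestamps can be pulled out of the log and compared. The following is only a rough sketch: the function name `unclean_shutdown_times` and the `sample` list are made up for illustration, the line format (`timestamp|message`) is assumed from the excerpt above, and the sample messages are cut short. Note also that these timestamps are most likely written when the system next boots, so they show when the problem was noticed rather than the exact moment the system stopped.

```python
from datetime import datetime

# Text of the warning exactly as it appears in the log excerpt (typo included).
MARKER = "The system was not shutdowned properly"


def unclean_shutdown_times(lines):
    """Return the timestamps of all 'not shutdowned properly' log entries."""
    times = []
    for line in lines:
        # Event lines look like "YYYY/MM/DD HH:MM:SS|message".
        stamp, sep, message = line.partition("|")
        if not sep or MARKER not in message:
            continue  # kernel lines and other events are skipped
        try:
            times.append(datetime.strptime(stamp.strip(), "%Y/%m/%d %H:%M:%S"))
        except ValueError:
            continue  # the part before "|" was not a timestamp
    return times


# Lines taken from the excerpt above, with the long message shortened.
sample = [
    "[ 97.362583] hdb: possibly failed opcode: 0xa1",
    "2012/08/10 02:49:53|The system was not shutdowned properly.",
    "2012/08/11 01:32:29|The system was not shutdowned properly.",
    "2012/08/12 00:45:27|The system was not shutdowned properly.",
]

if __name__ == "__main__":
    times = unclean_shutdown_times(sample)
    for earlier, later in zip(times, times[1:]):
        print(f"{earlier} -> {later}: gap of {later - earlier}")
    print("hours of day:", [t.hour for t in times])
```

Run against the full log, a list of hours that all fall in the same window, or gaps that are close to a fixed interval, would point towards something scheduled (a backup job, a UPS self-test, a timer on the power circuit) rather than a random hardware fault.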