Error messages and system hangs


09-15-2009, 02:23 PM
symm

Hey jisaac and chkohlruss,

The errors that you two posted are different.
jisaac's may be drive related.
He is showing these:
2009/08/31 00:03:01|[<ffffffff8069c592>] ? __down_read+0x12/0xa0
2009/08/31 00:03:01|[<ffffffff804256c8>] ? __down_write_trylock+0x48/0x60
2009/08/31 00:03:01|[<ffffffff804256c8>] ? __down_write_trylock+0x48/0x60
2009/08/31 00:03:01|[<ffffffffa001d663>] ? di_read_unlock+0x73/0x130 [aufs]
2009/08/31 00:03:01|[<ffffffffa001c497>] ? h_d_revalidate+0x4c7/0x6b0 [aufs]
2009/08/31 00:03:01|[<ffffffff8069c592>] ? __down_read+0x12/0xa0
2009/08/31 00:03:01|[<ffffffff804256c8>] ? __down_write_trylock+0x48/0x60
2009/08/31 00:03:01|[<ffffffff804256c8>] ? __down_write_trylock+0x48/0x60
2009/08/31 00:03:01|[<ffffffff80425701>] ? __up_read+0x21/0xb0
2009/08/31 00:03:01|[<ffffffffa001d663>] ? di_read_unlock+0x73/0x130 [aufs]

chkohlruss, you have different errors:
2009/09/08 06:19:54 kernel:[<ffffffff80237390>] ? put_files_struct+0x70/0xc0
2009/09/08 06:19:54 kernel:[<ffffffff80237a9b>] ? do_exit+0x17b/0x8b0
2009/09/08 06:19:54 kernel:[<ffffffff80238244>] ? do_group_exit+0x34/0xa0
2009/09/08 06:19:54 kernel:[<ffffffff80228282>] ? ia32_sysret+0x0/0xa

It is best to send the entire logs to support to be sure.
Did you guys hear back from them?
09-15-2009, 03:45 PM
jisaac

Support sent me a patch file, which I believe is an update to SCST (version 1.0.1.1). I have not actually implemented the patch because the upgrade requires shutting down all of my VMs. That's a pretty hefty maintenance window to try to schedule.
09-15-2009, 06:46 PM
jisaac

Symm, thanks for the info. How do you find out what these error messages refer to?

I looked a little deeper and found that our RAID controller (3ware 9690SA) does a scheduled verify every night at 12:00 AM, and the event log in the RAID controller shows that the job kicks off at 12:03 - the same time as the error message timestamp above. It does this every night. Is it possible that the increased drive usage as it verifies the RAID5 parity information would cause the DSS error?
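
For anyone else wondering how to read these traces: each frame has the form `symbol+offset/size [module]`, where the symbol is the kernel function, the offset is how far into it the address falls, and the bracketed name (if any) is the loadable module the function belongs to. A leading `?` marks a frame the kernel is not certain about. Below is a minimal sketch, assuming the log lines look like the ones quoted in this thread, that pulls those fields out and counts which modules show up. The names `FRAME`, `parse_frame` and `modules_seen` are made up for illustration and are not part of DSS or any tool.

```python
import re
from collections import Counter

# Matches one stack-frame line such as:
#   2009/08/31 00:03:01|[<ffffffffa001d663>] ? di_read_unlock+0x73/0x130 [aufs]
# (hypothetical pattern, based only on the lines quoted in this thread)
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return the fields of one stack-frame line, or None if it is not a frame."""
    m = FRAME.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        # Frames without a bracketed module belong to the core kernel image.
        "module": m.group("module") or "kernel",
    }


def modules_seen(lines):
    """Count how many parsed frames belong to each module."""
    frames = [f for f in map(parse_frame, lines) if f]
    return Counter(f["module"] for f in frames)


sample = [
    "2009/08/31 00:03:01|[<ffffffff8069c592>] ? __down_read+0x12/0xa0",
    "2009/08/31 00:03:01|[<ffffffffa001d663>] ? di_read_unlock+0x73/0x130 [aufs]",
    "2009/09/08 06:19:54 kernel:[<ffffffff80237a9b>] ? do_exit+0x17b/0x8b0",
]
print(modules_seen(sample))
```

A count like this only shows which code paths appear in the trace (here, locking functions and `aufs`); it does not by itself say whether the drives, the RAM or the software is at fault, so the full logs still need to go to support.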
09-17-2009, 10:58 AM
chkohlruss

Support says it is the RAM.
We've done several memtests, no errors found.
We've replaced the RAM and still get the error messages.
09-17-2009, 06:11 PM
symm

jisaac,

These are errors that I have seen in a past life (telecom), where we used Linux and RAID for storage.
09-17-2009, 07:16 PM
cschiff

What build of DSS 6 are you running? We were running build 3535 and seeing similar issues where our SAN would act like it just froze up. Support had us upgrade to build 3537 and the problem was solved. The issue was with the system cache. A reboot would fix it for a while and then it would freak out again. High data transfer aggravates this issue.
09-18-2009, 05:17 AM
jisaac

6.0up06.8102.3535 64bit

Like I said earlier, support sent me a patch file which I haven't had the downtime to implement yet. I'll try to get that done this weekend.
09-18-2009, 07:27 AM
chkohlruss

We are running 6.0up04.8101.3530 64bit
Are there any updates I can download and install?
09-18-2009, 01:39 PM
chkohlruss

We have reinstalled and reconfigured the primary server and it seems that the errors are gone now.
