-
High load 90+
Hello,
We have a problem with one of our SAN nodes in an active/active setup. The load is constantly around 98 and the log shows a lot of errors. Connected LVMs can still use the storage, but new connections are no longer accepted. What can we do? How can I safely reboot the server?
-
You should investigate why the load is so high. Check your RAID controller.
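
As a starting point for that investigation, here is a minimal sketch for putting a load figure in context. It assumes shell access to a Linux host, which a storage appliance may not offer, and the CPU count of 24 is only a guess based on the `cpu23` entry in the log. The helper names are hypothetical. On Linux the load average counts tasks in uninterruptible sleep (usually waiting on I/O) as well as runnable ones, so a very high load while the VMs still run well may point to blocked tasks rather than a busy CPU.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of logical CPUs."""
    return load / cpu_count


if __name__ == "__main__":
    # Assumption: the script runs on the affected Linux host itself.
    with open("/proc/loadavg") as handle:
        one, five, fifteen = parse_loadavg(handle.read())
    cpus = 24  # assumed value, not confirmed by the thread
    print(f"1-minute load per CPU: {load_per_cpu(one, cpus):.2f}")
```

A per-CPU value well above 1 that stays there means more tasks are waiting than the machine can serve.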
-
All VMs are running well on the box. How can I safely restart the high-load server without disrupting service? Would you be able to look at the box? How can I start, for example, top? The RAID card seems wel...
Could this be ACPI, or something BIOS-related?
I see these errors in the log:
2013/11/29 19:12:06 [47512.717095] ------------[ cut here ]------------
2013/11/29 19:12:06 [47512.717112] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
2013/11/29 19:12:06 [47512.717114] CPU 3
2013/11/29 19:12:06 [47512.717141] Pid: 30924, comm: scsi_tm Not tainted 2.6.35.14-oe64-00000-ge4ce801 #2 X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+
2013/11/29 19:12:06 [47512.717144] RIP: 0010:[] [] scst_sync_confirm_slave_mgmt_cmd_exec+0x1e0/0x1f0 [scst]
2013/11/29 19:12:06 [47512.717164] RAX: ffffffffa021ad30 RBX: ffff8810763abb88 RCX: ffff881075e4cd08
2013/11/29 19:12:06 [47512.717167] RDX: 0000000000000006 RSI: 00000000ffffff01 RDI: ffff88086cfd2d80
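
When the log contains many such traces, a short script can show whether they all come from the same place. The following is a minimal sketch, assuming the log has been exported as plain text in the format shown above. The function name `summarize_traces` and the regular expressions are illustrative, not part of any vendor tool.

```python
import re
from collections import Counter

# Marker line that opens a kernel warning trace.
CUT_MARKER = "[ cut here ]"
# Matches the "Pid: N, comm: NAME" line of a trace.
COMM_RE = re.compile(r"Pid:\s*(\d+),\s*comm:\s*(\S+)")
# Matches "symbol+0xOFF/0xLEN [module]" on the RIP line.
RIP_RE = re.compile(r"RIP:.*?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def summarize_traces(lines):
    """Count traces grouped by (process name, faulting symbol, module)."""
    traces = []
    current = None
    for line in lines:
        if CUT_MARKER in line:
            # A new trace starts here.
            current = {"comm": None, "symbol": None, "module": None}
            traces.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first trace
        match = COMM_RE.search(line)
        if match and current["comm"] is None:
            current["comm"] = match.group(2)
        match = RIP_RE.search(line)
        if match and current["symbol"] is None:
            current["symbol"] = match.group(1)
            current["module"] = match.group(2)
    return Counter((t["comm"], t["symbol"], t["module"]) for t in traces)


if __name__ == "__main__":
    import sys
    # Usage: python summarize_traces.py exported_log.txt
    with open(sys.argv[1], errors="replace") as handle:
        for key, count in summarize_traces(handle).most_common():
            print(count, *key)
```

For the excerpt above this would report a single trace from the `scsi_tm` process in the `scst` module.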
-
If you are using Active/Active, you can move the resources over to the other node and then take the server down. I am not sure about ACPI, but if you do take the server down, go into the motherboard BIOS and disable it along with all other power features.
-
Can the move be done while the server is running with VMs on it? Will the cluster automatically start again when the reboot is finished?
-
Yes, you can move the resources without stopping your VMs.
Once the machine is back online, it should join the cluster again.
-
We performed the failover tonight, but we are now stuck: the migration is waiting for nothing, and the load is at 130. We are getting frustrated.
-
After a forced power-down and restart, everything returned to normal again, luckily.
-
Can you log a ticket with our Support Team and attach the log files?
We will check what caused the problem.