I've got the Auxiliary connection set on the Virtual IP interfaces, the Management interfaces, and the replication interfaces.
I noticed on the Secondary Node that I have eth0 and eth3 bonded. I'm going to make sure that they are on the same chipset. The Secondary Node has two dual-port NIC cards, but the numbering may not be as expected. eth0 might be on port 1, then eth2 and eth3 on the other card, and then eth4 on the first card again. I'm actually pretty sure this is the case, because the add-on card has both ports connected by crossover.
2. The Primary Node was showing more than one iSCSI connection per LUN per client (under Maintenance connections). Now it only has one iSCSI connection per LUN, and failover works.
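On the NIC numbering point above, here is a rough sketch of how the port-to-card mapping could be checked. It assumes shell access to a generic Linux system that exposes sysfs, which the Open-E console may not offer, and it assumes that both ports of a dual-port PCI NIC share the same bus and device number and differ only in the function digit (usually true, but not guaranteed for cards with an onboard bridge). The function names and the interface list are mine, purely for illustration.

```python
import os
from collections import defaultdict


def pci_address(iface, sysfs_root="/sys/class/net"):
    """Return the PCI address behind a network interface, or None.

    On Linux, /sys/class/net/<iface>/device is a symlink to the underlying
    device directory; for a PCI NIC its name looks like '0000:03:00.0'.
    Virtual interfaces (bonds, bridges) have no 'device' link.
    """
    link = os.path.join(sysfs_root, iface, "device")
    if not os.path.exists(link):
        return None
    return os.path.basename(os.path.realpath(link))


def group_by_card(addresses):
    """Group interface names by PCI domain:bus:device.

    Assumption: ports on the same physical card differ only in the
    function number after the final dot.
    """
    cards = defaultdict(list)
    for iface, addr in sorted(addresses.items()):
        if addr is None:
            continue  # skip interfaces with no backing device
        card = addr.rsplit(".", 1)[0]  # drop the function number
        cards[card].append(iface)
    return dict(cards)


if __name__ == "__main__":
    # Hypothetical interface list; adjust to whatever the node reports.
    ifaces = ["eth0", "eth1", "eth2", "eth3", "eth4"]
    found = {name: pci_address(name) for name in ifaces}
    for card, ports in group_by_card(found).items():
        print(card, ports)
```

If eth0 and eth3 end up under the same card key, the bond is on one chipset; if they land under different keys, the numbering is interleaved the way I suspect.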
Questions:
Is there a way to stop and restart just the SCST service without triggering a failover?
Is there a "safe" way to shut down the Primary Node in case it does not want to complete failover?
I've noticed the VMware/LUN locking tends to happen when very disk-intensive operations are attempted.
I believe the November problem happened when I was experimenting with an offsite backup job. The backup job succeeded. Very soon after the job finished, the logs stopped.
The outage in mid-December happened when I was trying to move an entire VM through the vSphere Client.
I suspect HP's firmware. There are some updates to be applied.
It looks like their Smart Array controller does not play nicely with SATA disks.
The confusing part is that they have two different revision numbers for the firmware update. It could be based on the hardware revision of the controller. I'll try searching by serial number and see if the result is more specific. If not, I'll do a chat session with HP support.
I'll also see if I can budget for all SAS disks and rebuild the array and re-install Open-E.
FYI, the snapshots I deleted were VM snapshots, not Open-E snapshots.