We have had an issue where, when we fail over to one node for maintenance or during an actual failure, some of the VMs on the storage become unresponsive and we have to restart the ESXi host to get the VMs working correctly. Are there any settings that we need to change on the ESXi hosts or the VMs?
We have 4 x 10 GbE NICs with a total of 8 VIPs: 4 for node A resources and 4 for node B resources. The ESXi hosts are using MPIO round robin; each host has 4 x 10 GbE NICs, and each NIC connects to one VIP for resources on node A and one VIP for resources on node B. All VIPs are entered in the initiator.
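For reference, a quick way to confirm Round Robin is actually applied on every device (not just set as the default) is to check the path selection policy per LUN. ESXi ships a Python interpreter, so a minimal sketch like the one below can be run from the ESXi shell; it only wraps esxcli, and the parsing assumes the standard `esxcli storage nmp device list` output layout.

```python
#!/usr/bin/env python
# Minimal sketch, run from the ESXi shell (ESXi ships a Python interpreter).
# Lists every SAN device with its current path selection policy so you can
# confirm Round Robin (VMW_PSP_RR) is applied per device, not just assumed.
import subprocess

def esxcli(*args):
    # Run an esxcli subcommand and return its output as text.
    out = subprocess.check_output(("esxcli",) + args)
    return out.decode("utf-8", errors="replace")

listing = esxcli("storage", "nmp", "device", "list")

device = None
for raw in listing.splitlines():
    line = raw.strip()
    # Device blocks start with the device identifier (naa.* or eui.*),
    # followed by indented attribute lines such as 'Path Selection Policy:'.
    if line.startswith("naa.") or line.startswith("eui."):
        device = line
    elif line.startswith("Path Selection Policy:") and device:
        policy = line.split(":", 1)[1].strip()
        print("%-40s %s" % (device, policy))
```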
Is this set up in a VLAN? Can the ESXi servers ping the VIPs and the IPs of the NICs under the VIP of the other DSS server? I'm not sure whether you can perform this, or whether you can get a maintenance window to remove all of the target VIPs from the ESXi host initiators, re-enter them, and then try moving the resources over again to see if the VMs stay active.
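For the ping test, something like the sketch below (run from the ESXi shell) checks every iSCSI vmkernel port against every target address. The vmk names and IP addresses are placeholders, not your actual setup; substitute your own iSCSI vmkernel ports and the VIPs/NIC IPs of both DSS nodes.

```python
#!/usr/bin/env python
# Minimal sketch for the ping test above, run from the ESXi shell.
# The vmkernel ports and target IPs are placeholders (assumptions) --
# substitute your own iSCSI vmk ports and the addresses of both DSS nodes.
import subprocess

VMK_PORTS = ["vmk1", "vmk2", "vmk3", "vmk4"]   # assumed: one per 10 GbE NIC
TARGETS = ["192.168.10.11", "192.168.10.21"]   # assumed: node A and node B VIPs

for vmk in VMK_PORTS:
    for ip in TARGETS:
        # vmkping -I forces the ping out of one specific vmkernel interface,
        # which exposes per-NIC VLAN or routing problems that a plain ping
        # from the management interface would hide.
        rc = subprocess.call(["vmkping", "-I", vmk, "-c", "2", ip])
        print("%s -> %s : %s" % (vmk, ip, "OK" if rc == 0 else "FAILED"))
```

Checking every vmk-to-target combination matters here because with MPIO round robin a single broken path can sit unnoticed until a failover forces traffic onto it.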
Yes, all VIPs are in their own VLAN and the ESXi hosts can ping all of the VIPs. Unfortunately, we can't remove it all and add it back, as this is in production. We can't afford to test the auto failover in case it freezes some servers again. It has happened both times we had an auto failover, and also when we performed a manual failover to apply updates.
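On the freezes themselves, one thing that may be worth checking, purely as an assumption on our side rather than a confirmed root cause: if the VIP takeover takes longer than the initiator's iSCSI recovery timeout, ESXi can declare the paths dead mid-failover and the VMs will see their storage hang. A minimal sketch to dump the relevant software iSCSI initiator timeouts from the ESXi shell (the adapter name is a placeholder):

```python
#!/usr/bin/env python
# Minimal sketch, run from the ESXi shell: print the software iSCSI
# initiator's timeout parameters. 'vmhba64' is a placeholder -- replace it
# with the name of your software iSCSI adapter.
import subprocess

ADAPTER = "vmhba64"  # assumption: your iSCSI vmhba name will differ
KEYS = {"RecoveryTimeout", "NoopOutInterval", "NoopOutTimeout", "LoginTimeout"}

out = subprocess.check_output(
    ["esxcli", "iscsi", "adapter", "param", "get", "-A", ADAPTER]
).decode("utf-8", errors="replace")

for line in out.splitlines():
    parts = line.split()
    # The output is a table whose rows begin with the parameter name.
    if parts and parts[0] in KEYS:
        print(line.rstrip())
```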
Any chance you can send us the log files from your User Portal so we can look at the setup and check whether there are any issues we can identify? If you are using Fibre Channel, we don't have Auto Failover for Active-Passive with Virtual IPs; we will have this in our JovianDSS product later this year.