iSCSI Failover Woes

**jrpereira** · 11-02-2011, 05:26 PM

I have been getting quite a few instances of this message:

iSCSI Failover: Secondary node: Service is under heavy load or the time values you have set are too low (interval 98570 ms). Please see Failover Configuration help for further details.

This is followed by a drop of the resources, only for them to be picked up again a few seconds later.

We have two Open-E SANs in replication with iSCSI failover configured, and a Windows failover cluster connecting to the virtual IP. When this happens, it causes our VMs to crash.

Any hints?

Also, we have seen quite an increase in "Cannot ping ping target" messages from the active SAN since the load has come up, even though the load is well below the NICs' traffic limit.

**Gr-R** · 11-02-2011, 05:34 PM

A few suggestions:

1- Make sure replication is on a dedicated link, not on the same IPs as the iSCSI traffic.
2- Tune the targets for performance.

These settings work well for most VM hosts:

1. From the console, press CTRL+ALT+W.
2. Select Tuning options -> iSCSI daemon options -> Target options.
3. Select the target in question.
4. Change the MaxRecvDataSegmentLength and MaxXmitDataSegmentLength values to the maximum required data size (check with the initiator to match).

MaxRecvDataSegmentLength=262144
MaxBurstLength=16776192
MaxXmitDataSegmentLength=262144
FirstBurstLength=65536
DataDigest=None
MaxOutstandingR2T=8
InitialR2T=No
ImmediateData=Yes
HeaderDigest=None
Wthreads=8

3- Increase the time values in the failover configuration, and make sure ping times to the ping nodes are fast.
4- Use jumbo frames.
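
If it helps, the tuning values above can be sanity-checked before they are entered on the target. The following is only a rough sketch, not an Open-E tool: the function names are made up for illustration, the numeric ranges are the ones RFC 3720 gives for these keys, and applying the same range to MaxXmitDataSegmentLength (which is not an RFC 3720 key) is an assumption.

```python
# Sketch: check iSCSI tuning values for range errors and for
# differences between target and initiator. Names are hypothetical.

# Allowed ranges per RFC 3720. MaxXmitDataSegmentLength is not an
# RFC key; reusing the same range for it is an assumption.
RANGES = {
    "MaxRecvDataSegmentLength": (512, 2**24 - 1),
    "MaxXmitDataSegmentLength": (512, 2**24 - 1),
    "MaxBurstLength": (512, 2**24 - 1),
    "FirstBurstLength": (512, 2**24 - 1),
    "MaxOutstandingR2T": (1, 65535),
}

# The values suggested in this post.
TARGET_PARAMS = {
    "MaxRecvDataSegmentLength": 262144,
    "MaxBurstLength": 16776192,
    "MaxXmitDataSegmentLength": 262144,
    "FirstBurstLength": 65536,
    "DataDigest": "None",
    "MaxOutstandingR2T": 8,
    "InitialR2T": "No",
    "ImmediateData": "Yes",
    "HeaderDigest": "None",
}


def check_params(params):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside {low}..{high}")
    first = params.get("FirstBurstLength")
    burst = params.get("MaxBurstLength")
    # RFC 3720: FirstBurstLength must not exceed MaxBurstLength.
    if first is not None and burst is not None and first > burst:
        problems.append(
            f"FirstBurstLength ({first}) exceeds MaxBurstLength ({burst})"
        )
    return problems


def mismatches(target, initiator):
    """Return the keys present on both sides whose values differ."""
    shared = target.keys() & initiator.keys()
    return sorted(key for key in shared if target[key] != initiator[key])


if __name__ == "__main__":
    print(check_params(TARGET_PARAMS))
    # Hypothetical initiator that still uses a smaller receive segment.
    initiator = dict(TARGET_PARAMS, MaxRecvDataSegmentLength=65536)
    print(mismatches(TARGET_PARAMS, initiator))
```

The initiator-side values have to be read from the initiator itself (for example the Windows iSCSI initiator settings); the dictionary used here is just a stand-in.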

**jrpereira** · 11-03-2011, 03:16 AM

Thank you for the reply.

I am also having this issue:

2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth1 is down. Second node is down or there are connection problems.

2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth2 is down. Second node is down or there are connection problems.

2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.

2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.

The replication is taking place on a dedicated NIC, and another NIC is used for the iSCSI targets. It seems that when the primary node loses contact with the secondary, it drops the virtual IP for a few seconds and then picks it back up again. My VMs crash when this happens. Am I reading this correctly?
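
One way to cross-check this reading is to group the failover messages by timestamp and see whether every takeover coincides with link-down events. This is only a sketch: the regular expression is inferred from the four log lines quoted above, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Line layout inferred from the log excerpt in this post (an assumption).
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Failover:iSCSI Failover: (Primary|Secondary) node: (.+)$"
)

# The four log lines quoted in this post.
LOG = """\
2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth1 is down. Second node is down or there are connection problems.
2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth2 is down. Second node is down or there are connection problems.
2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.
2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.
"""


def events_by_second(log_text):
    """Map each timestamp to the list of (node, message) pairs logged then."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timestamp, node, message = match.groups()
            grouped[timestamp].append((node, message))
    return dict(grouped)


def takeovers_with_link_down(grouped):
    """Return timestamps where a takeover and a link-down share a second."""
    hits = []
    for timestamp, events in grouped.items():
        messages = [message for _, message in events]
        took_over = any("took over all resources" in m for m in messages)
        link_down = any(
            m.startswith("Link ") and "is down" in m for m in messages
        )
        if took_over and link_down:
            hits.append(timestamp)
    return sorted(hits)


if __name__ == "__main__":
    print(takeovers_with_link_down(events_by_second(LOG)))
```

Run against the full log rather than this excerpt, it would show whether takeovers ever happen without a link-down message in the same second.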

**To-P** · 11-03-2011, 08:51 AM

I'm afraid you may have some connection problems. It would be good to check whether there are any dropped packets on your NICs.
Please download the logs from your system, uncompress them, and check the test.log file. There you will find the output of the "ifconfig -a" command, which shows how many errors and dropped packets each of your NICs has.
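
If the file is long, the counters can be pulled out with a short script. This is a minimal sketch that assumes the "ifconfig -a" section uses the older net-tools layout ("RX packets:N errors:N dropped:N ..."); the sample text is made up for illustration and is not taken from a real test.log, and the function names are hypothetical.

```python
import re

# Matches the RX/TX counter lines of the older net-tools ifconfig layout.
COUNTER_RE = re.compile(r"\b(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")

# Made-up sample output, for illustration only.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          RX packets:1000 errors:0 dropped:12 overruns:0 frame:0
          TX packets:2000 errors:0 dropped:0 overruns:0 carrier:0

eth2      Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          RX packets:500 errors:0 dropped:0 overruns:0 frame:0
          TX packets:700 errors:0 dropped:0 overruns:0 carrier:0
"""


def nic_counters(ifconfig_text):
    """Return {interface: {"RX"/"TX": {packets, errors, dropped}}}."""
    counters = {}
    current = None
    for line in ifconfig_text.splitlines():
        # An interface block starts with a name in the first column.
        if line and not line[0].isspace():
            current = line.split()[0]
            counters[current] = {}
            continue
        match = COUNTER_RE.search(line)
        if match and current is not None:
            direction, packets, errors, dropped = match.groups()
            counters[current][direction] = {
                "packets": int(packets),
                "errors": int(errors),
                "dropped": int(dropped),
            }
    return counters


def suspicious(counters):
    """Return the interfaces that report any errors or dropped packets."""
    return sorted(
        name
        for name, directions in counters.items()
        if any(d["errors"] or d["dropped"] for d in directions.values())
    )


if __name__ == "__main__":
    print(suspicious(nic_counters(SAMPLE)))
```

To use it on real data, pass in the text of the "ifconfig -a" section from test.log instead of the sample. Non-zero counters on the replication or iSCSI NICs would point to a cabling, switch, or driver problem rather than to the failover timing.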
