
Thread: iSCSI Failover Woes

  1. #1

    iSCSI Failover Woes

    We have been getting quite a few instances of this message:

    iSCSI Failover: Secondary node: Serivice is under heavy load or the time values you have set are too low (interval 98570 ms). Please see Failover Configuration help for further details.

    It is followed by a drop of the resources, which are then picked up again shortly afterwards.

    We have two Open-E SANs in replication with iSCSI failover configured. A Windows failover cluster connects to the virtual IP. When the resources drop, our VMs crash.

    Any hints?

    Also, since the load has gone up we have seen quite an increase in "Cannot ping ping target" messages from the active SAN, even though the load is well below the NIC's traffic limit.

  2. #2
    Join Date
    Oct 2010
    Location
    GA
    Posts
    935

    A few suggestions:

    1- Make sure replication is on a dedicated link, not on the same IPs as the iSCSI traffic.
    2- Tune the targets for performance.

    These settings work well for most VM hosts:
    1. From the console, press CTRL+ALT+W.
    2. Select Tuning options -> iSCSI daemon options -> Target options.
    3. Select the target in question.
    4. Change the MaxRecvDataSegmentLength and MaxXmitDataSegmentLength values to the maximum required data size (check with the initiator so the values match).

    MaxRecvDataSegmentLength=262144
    MaxBurstLength=16776192
    MaxXmitDataSegmentLength=262144
    FirstBurstLength=65536
    DataDigest=None
    MaxOutstandingR2T=8
    InitialR2T=No
    ImmediateData=Yes
    HeaderDigest=None
    Wthreads=8

    3- Increase the time values in the failover configuration, and make sure ping times to the ping nodes are fast.
    4- Use jumbo frames.
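
A quick sanity check of the values above can be sketched in Python. This is only an illustration: the helper names are hypothetical, the parameter names use their standard iSCSI spellings, and the limits are the ones the iSCSI specification gives for the negotiated keys. MaxXmitDataSegmentLength and Wthreads are target-side options rather than negotiated keys, so they are not range-checked here.

```python
# Sketch: sanity-check a set of iSCSI target settings against the
# limits in the iSCSI specification. Helper names are hypothetical.

SETTINGS_TEXT = """
MaxRecvDataSegmentLength=262144
MaxBurstLength=16776192
MaxXmitDataSegmentLength=262144
FirstBurstLength=65536
MaxOutstandingR2T=8
"""

# Allowed ranges for the negotiated keys (inclusive).
RANGES = {
    "MaxRecvDataSegmentLength": (512, 2**24 - 1),
    "MaxBurstLength": (512, 2**24 - 1),
    "FirstBurstLength": (512, 2**24 - 1),
    "MaxOutstandingR2T": (1, 65535),
}


def parse_settings(text):
    """Turn key=value lines into a dict, skipping blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for key, (low, high) in RANGES.items():
        if key in settings:
            value = int(settings[key])
            if not low <= value <= high:
                problems.append(f"{key}={value} outside {low}..{high}")
    # FirstBurstLength must not exceed MaxBurstLength.
    if "FirstBurstLength" in settings and "MaxBurstLength" in settings:
        if int(settings["FirstBurstLength"]) > int(settings["MaxBurstLength"]):
            problems.append("FirstBurstLength exceeds MaxBurstLength")
    return problems


if __name__ == "__main__":
    print(check_settings(parse_settings(SETTINGS_TEXT)) or "no problems found")
```

The same check can be run against the values reported by the initiator, since mismatched segment lengths are the thing step 4 asks you to rule out.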

  3. #3

    Thank you for the reply.

    Also, having this issue:

    2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth1 is down. Second node is down or there are connection problems.

    2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth2 is down. Second node is down or there are connection problems.

    2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.

    2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.

    Replication runs on a dedicated NIC, and another NIC is used for the iSCSI targets. It seems that when the primary node loses contact with the secondary, it drops the virtual IP for a few seconds and then picks it back up again. This crashes my VMs. Am I reading this correctly?
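
To see how often link-down events coincide with a takeover, the failover log can be grouped by timestamp. The following Python sketch assumes every line follows the format of the four entries quoted above; the function name and the regular expression are illustrative only.

```python
# Sketch: group iSCSI failover log lines by timestamp and note which
# links went down and whether a takeover happened in the same second.
import re
from collections import defaultdict

LOG_LINES = [
    "2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth1 is down. Second node is down or there are connection problems.",
    "2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: Link eth2 is down. Second node is down or there are connection problems.",
    "2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.",
    "2011/11/02 12:28:03 Failover:iSCSI Failover: Primary node: This node took over all resources - failover completed. Status is now active.",
]

# Assumed line layout: "<date> <time> Failover:iSCSI Failover: <role> node: <message>"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Failover:iSCSI Failover: (Primary|Secondary) node: (.*)$"
)


def summarize(lines):
    """Map each timestamp to the links reported down and a takeover flag."""
    events = defaultdict(lambda: {"links_down": set(), "takeover": False})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, _role, message = match.groups()
        link = re.match(r"Link (\S+) is down", message)
        if link:
            events[timestamp]["links_down"].add(link.group(1))
        elif "took over all resources" in message:
            events[timestamp]["takeover"] = True
    return dict(events)


if __name__ == "__main__":
    for timestamp, info in summarize(LOG_LINES).items():
        print(timestamp, sorted(info["links_down"]), "takeover:", info["takeover"])
```

If both links always drop in the same second as the takeover, that points at a shared cause (switch, cabling, or the other node) rather than a single bad NIC.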

  4. #4
    Join Date
    Jan 2011
    Posts
    54

    I'm afraid you may have some connection problems. It would be good to check whether there are any dropped packets on your NICs.
    Please download the logs from your system, uncompress them, and check the test.log file. It contains the output of the "ifconfig -a" command, which shows how many errors and dropped packets each of your NICs has recorded.
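
If test.log is long, the error and drop counters can be pulled out with a short script. This Python sketch assumes the classic net-tools "ifconfig -a" layout ("RX packets:... errors:... dropped:..."); the sample text below is made up for illustration and is not taken from a real system, and the function name is hypothetical.

```python
# Sketch: list interfaces whose RX/TX error or dropped counters are non-zero,
# given text in the classic net-tools "ifconfig -a" format.
import re

# Made-up sample data for illustration only.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          RX packets:1000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:900 errors:0 dropped:0 overruns:0 carrier:0

eth2      Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          RX packets:5000 errors:12 dropped:340 overruns:0 frame:12
          TX packets:4800 errors:0 dropped:0 overruns:0 carrier:0
"""

COUNTER_RE = re.compile(r"^\s+(RX|TX) packets:\d+ errors:(\d+) dropped:(\d+)")


def nic_problems(text):
    """Return {interface: {"RX": (errors, dropped), "TX": (...)}} for
    interfaces where any error or dropped counter is non-zero."""
    counters = {}
    current = None
    for line in text.splitlines():
        # An interface block starts with a name in the first column.
        if line and not line[0].isspace():
            current = line.split()[0]
            counters[current] = {}
            continue
        match = COUNTER_RE.match(line)
        if match and current:
            direction, errors, dropped = match.groups()
            counters[current][direction] = (int(errors), int(dropped))
    return {
        name: c
        for name, c in counters.items()
        if any(errors or dropped for errors, dropped in c.values())
    }


if __name__ == "__main__":
    print(nic_problems(SAMPLE))
```

To use it on a real log, paste the "ifconfig -a" section of test.log into a string (or read the file) and pass it to nic_problems.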
