
Thread: vSphere 4 - Support on the Way?

  1. #1
    Join Date
    Apr 2009
    Posts
    62

    Default vSphere 4 - Support on the Way?

    Hi there,

    Does anyone have an idea of the timeline for when we might expect to see compatibility with VMware vSphere 4?

    I recently upgraded ESX 3.5 to ESX 4 to test its compatibility with DSS, and it seems VMware rewrote their iSCSI stack; there is some funky behavior going on.

    I get dropped connections to LUNs, along with many path failures and reservation conflicts, on the 4.0 server. When that happens and the LUN mapping is dropped, the ESX 3.5 server loses its connection as well.

    I can help the cause by providing Open-E with access to my production ESX 4 and DSS servers; just let me know.

    1parkplace

  2. #2

    Default

    Thanks for offering the use of your systems; we may need to come back to that later. I did look at your logs and would like to verify a few things about your bond. It looks like eth2 and eth3 have different speeds set, which can cause issues. Are these NICs the same in terms of chipset? We noted in the release notes that bonding NICs with different chipsets can cause problems under a stress test, and a speed mismatch can do the same. Check the switch, or force the speed in the Console from the Console Tools menu under Modify driver settings, then test again. You may also want to change the iSCSI daemon settings as shown below, since I see PDU issues in the dmesg logs. More information about PDUs can be found via Google.


    This happens for LUNs 1, 2, and 3:
    iscsi_trgt: data_out_start(1037) unable to find scsi task 4f1b11f 8a93
    iscsi_trgt: cmnd_skip_pdu(454) 4f1b11f 1e 0 4096

    Try setting the following in the console:
    Ctrl-Alt-W
    Select Tuning Options
    Select iSCSI Daemon options
    Select Target Options
    Select a target
    Set MaxRecvDataSegmentLength and MaxXmitDataSegmentLength to 65536.

    or try these settings:

    MaxRecvDataSegmentLength=262144
    MaxBurstLength=16776192
    MaxXmitDataSegmentLength=262144
    MaxOutstandingR2T=8
    InitialR2T=No
    ImmediateData=Yes
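
    If you want to confirm whether the retuning actually makes those errors go away, here is a minimal sketch (in Python, purely illustrative and not an Open-E tool) that counts the iscsi_trgt messages quoted above in a saved log capture; the file name dmesg.txt is an assumption for a dmesg dump taken before and after the change.

    # Count the iscsi_trgt PDU-related errors quoted above in a saved dmesg capture.
    # Illustrative sketch only; "dmesg.txt" is an assumed file name.
    import re
    from collections import Counter

    PATTERNS = {
        "unable to find scsi task": re.compile(r"iscsi_trgt: data_out_start\(\d+\) unable to find scsi task"),
        "cmnd_skip_pdu": re.compile(r"iscsi_trgt: cmnd_skip_pdu\(\d+\)"),
    }

    counts = Counter()
    with open("dmesg.txt") as log:
        for line in log:
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    counts[name] += 1

    for name, count in counts.items():
        print(name + ":", count, "occurrence(s)")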
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  3. #3
    Join Date
    Apr 2009
    Posts
    62

    Default

    hey todd,

    The bond you're seeing is strictly for the DSS WebGUI.

    The iSCSI connections are only occurring on eth1, which is the HP NC510C 10GbE NIC.

    Should I still apply those settings, or is this a compound issue?

    Thanks!

    Drew

  4. #4

    Default

    If you don't mind, let's kill the bond for now. Also go ahead and make the changes to the iSCSI daemon settings.

    I want to make sure there is nothing in our way as we research this issue. If there is anything else you can tell us, please do; it will help.
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  5. #5
    Join Date
    Apr 2009
    Posts
    62

    Default

    Response from someone in the VMware community regarding my uploaded vmkernel logs:
    __________________________________________________
    There seems to be an issue with storage:


    Jun 3 01:01:18 vmhost-1 vmkernel: 3:15:42:27.023 cpu4:4239)WARNING: iscsi_vmk: iscsivmk_TaskMgmtAbortCommands: vmhba33:CH:0 T:1 L:2 : Abort task response indicates task with itt=0x1107006 has been completed on the target but the task response has not arrived ...
    Jun 3 01:01:18 vmhost-1 vmkernel: 3:15:42:27.272 cpu4:4239)WARNING: iscsi_vmk: iscsivmk_ConnSetupScsiResp: vmhba33:CH:0 T:1 CN:0: Task not found: itt 17854470


    17854470 (dec) = 0x1107006 (hex)
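
    A quick sanity check of that conversion (just the arithmetic from the two log lines above, shown in Python):

    # The abort warning reports the task tag in hex (itt=0x1107006), while the
    # "Task not found" message reports it in decimal (itt 17854470).
    # Both refer to the same outstanding task.
    itt_from_abort_warning = 0x1107006
    itt_from_task_not_found = 17854470
    assert itt_from_abort_warning == itt_from_task_not_found
    print(hex(itt_from_task_not_found))  # -> 0x1107006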


    1- There is an IO timeout (i.e., the storage is not responding to the IO on time), which causes the ESX iSCSI initiator to send an abort for that IO.


    2- It appears that the storage responds to the abort with "task does not exist", but later the storage sends a response for that IO task anyway. That is a violation of the iSCSI protocol, so the ESX initiator drops the connection. This seems to keep happening very often.


    The ESX 3.5 software iSCSI initiator would just ignore that case, but the ESX 4 initiator is very strict about protocol violations.
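
    To make that difference concrete, here is a toy Python sketch (not VMware or Open-E code, purely an illustration of the behavior described above) of how a lenient initiator versus a strict one might treat a late response arriving for a task the target already claimed did not exist:

    # Toy model of the behavior described above; not how either initiator is implemented.
    class IscsiSession:
        def __init__(self, strict):
            self.strict = strict        # ESX 4-like if True, ESX 3.5-like if False
            self.outstanding = set()    # task tags (itt) still awaiting a response
            self.connected = True

        def abort_task(self, itt, target_says_task_missing):
            # Initiator timed out on an IO and sent an abort for it.
            if target_says_task_missing:
                self.outstanding.discard(itt)   # target claims the task is gone

        def receive_response(self, itt):
            # A SCSI response PDU arrives from the target.
            if itt in self.outstanding:
                self.outstanding.discard(itt)   # normal completion
            elif self.strict:
                self.connected = False          # protocol violation: drop the connection
            # A lenient initiator simply ignores the stray response.

    # The scenario from the log: timeout -> abort -> "task does not exist" -> late response.
    for strict in (False, True):
        session = IscsiSession(strict)
        session.outstanding.add(0x1107006)
        session.abort_task(0x1107006, target_says_task_missing=True)
        session.receive_response(0x1107006)
        print("strict" if strict else "lenient", "-> connected:", session.connected)
    # lenient -> connected: True
    # strict -> connected: False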


    It appears you are using Open-E DSS; I do not think it is certified with ESX 4 yet. Could you post the version of DSS you are using?

    __________________________________________________

  6. #6

    Default

    Thanks for looking into this; I have forwarded it to our engineers to verify. We are watching this thread. Also, please send me new logs once the changes have been made from the tuning options.
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  7. #7
    Join Date
    Apr 2009
    Posts
    62

    Default

    Tuning Configuration is complete.

    New log packages sent to your email.

    Thanks!

    Drew

  8. #8

    Default vSphere compatibility needed ASAP

    Drew,

    We too are facing a similar scenario; however, we have not upgraded our ESX environment to vSphere 4.0, since I noticed Open-E was not on the vSphere 4.0 storage HCL. I really want to be able to upgrade to vSphere 4.0, so I am hoping they can resolve your issue and get recertified. That said, I was not too optimistic about how quickly that would happen: support told me they may look at testing and recertifying sometime later this year, but as of yet no testing had been done with vSphere 4.0. I am going to continue monitoring your thread for progress on resolving this issue. Hopefully, since you are already on vSphere 4 and need DSS to be compatible, it doesn't take until the end of the year to get there.

  9. #9

    Default

    Drew,

    I am also curious: what version of DSS are you running?

    Were you experiencing any iSCSI timeout errors when using ESX 3.5?

    What RAID level are you running on your DSS SAN?

    We had fought iSCSI timeout issues with DSS for many months. I tried configuring the DSS SAN with all of the recommended performance settings as specified by Open-E support. Things got a little better, but when the SAN was under moderate to heavy load we would still get CMD Abort and Task Not Found errors. What turned out to be our ultimate solution was to rebuild our RAID set as RAID 10. Previously I was using RAID 5 and had my VMs split across 2 DSS SANs, so each had 50% of the ESX load. Now, with the SANs at RAID 10, I am able to run ALL of my VMs on 1 SAN with better performance and NO iSCSI timeouts or errors, under any load.
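
    For anyone weighing the same change, here is a rough back-of-the-envelope sketch (in Python, with made-up example numbers rather than this SAN's actual disk count or per-disk IOPS) of why RAID 10 copes with random-write load so much better than RAID 5:

    # Classic RAID write-penalty estimate: RAID 5 turns one host write into ~4
    # backend IOs (read data, read parity, write data, write parity), while
    # RAID 10 needs ~2 (two mirrored writes). Disk count and per-disk IOPS
    # below are illustrative assumptions only.
    DISKS = 8
    IOPS_PER_DISK = 150
    WRITE_PENALTY = {"RAID 5": 4, "RAID 10": 2}

    raw_iops = DISKS * IOPS_PER_DISK
    for level, penalty in WRITE_PENALTY.items():
        # Assume a 100% random-write workload for the worst case.
        host_write_iops = raw_iops / penalty
        print(f"{level}: ~{host_write_iops:.0f} sustainable random-write IOPS from {DISKS} disks")
    # RAID 5: ~300 sustainable random-write IOPS from 8 disks
    # RAID 10: ~600 sustainable random-write IOPS from 8 disks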

  10. #10
    Join Date
    Apr 2009
    Posts
    62

    Default

    I'm running the 3513 build of DSS. I have two servers running DSS; each server has 3 RAID configurations, and I map their LUNs to a single target. The RAID setups on each server vary across RAID 1, RAID 5, and RAID 10.

    I also run these targets over a 10GbE network, so maybe that is why I never ran into timeouts. Are you using a similar setup?

    On ESX 3.5 U3 I was not getting this same issue; however, I was experiencing finicky connection drops due to the NetXen 10GbE NICs in my systems.

    This weekend I will be installing Intel 2xCX4 10GbE NICs in my machines to alleviate that problem.
