
Thread: vSphere 4 - Support on the Way?

  1. #51

    Default

    (Accidentally posted this in the wrong thread...)


    Todd -
    Thanks again for taking the time last night to help me out. I have done what you suggested: killed my bond, changed the IP of the iSCSI interface to a different subnet, and then set up the virtual IP again.

    Now, what I noticed was that the speed tests were a little more consistent with each other after about 20 minutes (there was a lot of traffic initially because the VMs had been down for about an hour). Of course, if I ran more than one test at a time, the single Gb iSCSI interface on the SAN couldn't keep up. I was seeing an average of ~105 MB/s sequential read and ~55 MB/s sequential write on both the RAID6 external LUN and the RAID10 internal LUN.
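
    For reference, ~105 MB/s sequential read is right at the practical ceiling of a single Gb link, so the bottleneck there is the wire rather than the arrays. A rough back-of-the-envelope check (my own sketch; the ~8% protocol overhead figure is an assumption):

        # Approximate usable iSCSI throughput on one 1 Gb/s Ethernet link (sketch, not a measurement)
        line_rate_bps = 1_000_000_000        # 1 GbE raw line rate
        payload_efficiency = 0.92            # assumed Ethernet + IP + TCP + iSCSI header overhead of ~8%
        ceiling_mb_s = line_rate_bps * payload_efficiency / 8 / 1_000_000
        print(f"~{ceiling_mb_s:.0f} MB/s usable")   # ~115 MB/s, so ~105 MB/s reads mean the link is saturated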

    I then created a balance-rr bond with all four of the Intel Gb iSCSI interfaces. The tests were a little better than with 802.3ad, and not as "bursty." Once again, the speeds vary from LUN to LUN and depending on when the tests are run. I have seen the low 90s on reads and even the 70s on writes, so I think we're getting closer. I will let it run more during the day before sending some logs. I have uploaded the logs from both servers from when they were running on a single, unbonded Gb interface.

    I'll keep you posted and possibly get some performance numbers once I can find some consistency.
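
    In case it helps anyone doing the same kind of checking, this is roughly how I am measuring consistency: repeated timed sequential reads of a large file on the iSCSI-backed datastore (a minimal Python sketch; the file path is a placeholder, and the file needs to be much larger than RAM or caching will inflate the numbers):

        import time

        TEST_FILE = "/path/to/iscsi-datastore/testfile.bin"   # placeholder; a large file on the iSCSI LUN
        CHUNK = 1024 * 1024                                   # read in 1 MiB chunks
        RUNS = 5

        rates = []
        for run in range(RUNS):
            start = time.time()
            total = 0
            with open(TEST_FILE, "rb") as f:
                while True:
                    data = f.read(CHUNK)
                    if not data:
                        break
                    total += len(data)
            mb_s = total / (time.time() - start) / 1_000_000
            rates.append(mb_s)
            print(f"run {run + 1}: {mb_s:.1f} MB/s")

        print(f"spread across {RUNS} runs: {max(rates) - min(rates):.1f} MB/s")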

    http://upload.vmind.com/web-pub/cera...6-18_00-43.zip

    Thanks again!

    Jason

  2. #52

    Default

    Jason - Thanks for the update; it seems we are getting closer. The bond will start kicking in when there are many systems or requests coming in, and then it will do better. You can also try the ALB bond; I have heard that it is doing well. It looks like we will work on the 802.3ad issue now.
    All the best,

    Todd Maxwell



  3. #53
    Join Date
    Jan 2008
    Posts
    82

    Default

    Hello,

    I had a similar problem, and it was a hardware issue. I tried changing the IETD iSCSI target and it got better.

    Quote Originally Posted by 1parkplace
    Response from someone in the VMware community regarding my uploaded VMkernel logs:
    __________________________________________________
    There seems to be an issue with storage:


    Jun 3 01:01:18 vmhost-1 vmkernel: 3:15:42:27.023 cpu4:4239)WARNING: iscsi_vmk: iscsivmk_TaskMgmtAbortCommands: vmhba33:CH:0 T:1 L:2 : Abort task response indicates task with itt=0x1107006 has been completed on the target but the task response has not arrived ...
    Jun 3 01:01:18 vmhost-1 vmkernel: 3:15:42:27.272 cpu4:4239)WARNING: iscsi_vmk: iscsivmk_ConnSetupScsiResp: vmhba33:CH:0 T:1 CN:0: Task not found: itt 17854470


    17854470 (dec) = 0x1107006 (hex)


    1- There is an IO timeout (i.e. the storage is not responding to IO in time), which causes the ESX iSCSI initiator to send an abort for that IO.


    2- It appears that the storage responds to that with "task does not exist", but later the storage sends a response for the IO task. That is a violation of the iSCSI protocol, and the ESX initiator drops the connection. This seems to keep happening very often.


    The ESX 3.5 software iSCSI initiator would just ignore that case, but the ESX 4 initiator is very strict about protocol violations.


    It appears you are using Open-E DSS; I do not think it is certified with ESX4 yet. Could you post the version of DSS you are using?

    __________________________________________________
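
    For what it's worth, the two itt values in those log lines do refer to the same task; here is a quick check of the hex/decimal mapping:

        # The abort warning reports the task tag in hex, the "Task not found" line in decimal
        print(hex(17854470))        # 0x1107006, matching itt=0x1107006 in the abort warning
        print(int("1107006", 16))   # 17854470, matching "Task not found: itt 17854470"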

  4. #54

    Default

    Quote Originally Posted by masim
    Hello,

    I had a similar problem, and it was a hardware issue. I tried changing the IETD iSCSI target and it got better.
    Do you mean you changed from SCST to IETd, or from IETd to SCST? And what got better? I am using SCST and I am not having the timeout issues, just slow performance.

    Todd -
    I can try changing to balance-alb, but I think in the past I have experienced "port flapping" on our Cisco switches, where the switch gets confused about which interface to use. I can certainly give it a shot, though. Do we maybe want to try some of those target settings you suggested initially? I will also forward the logs from today running with balance-rr.

    Thanks again,
    Jason

  5. #55

    Default

    Jason - You are correct; let's go with the other settings and abort the bond change. Thanks for the update.
    All the best,

    Todd Maxwell



  6. #56

    Default

    Quote Originally Posted by To-M
    maxRecvDataSegmentLen=262144
    MaxBurstLength=16776192
    Maxxmitdatasegment=262144
    maxoutstandingr2t=8
    InitialR2T=No
    ImmediateData=Yes

    Todd -
    I tried these settings again on my V6 beta SANs, and once again all of the ESX servers (3.5 and 4) couldn't reconnect. I then tried setting them one at a time to see which one caused the problem, and it's the MaxBurstLength setting; every other setting allows the initiators to reconnect. Is there any other value I should try? Also, I have the logs from the balance-rr run, and you can download them below.
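
    For what it's worth, the values themselves are at least within the declared ranges in RFC 3720, so it doesn't look like a simple out-of-range value. Here is the quick sanity check I used (a sketch; it assumes the standard iSCSI key names, which may not match the labels DSS shows exactly):

        # Declared ranges from RFC 3720 for the negotiated keys in question (standard key names assumed)
        RANGES = {
            "MaxRecvDataSegmentLength": (512, 2**24 - 1),
            "MaxBurstLength":           (512, 2**24 - 1),
            "MaxOutstandingR2T":        (1, 65535),
        }
        TRIED = {
            "MaxRecvDataSegmentLength": 262144,
            "MaxBurstLength":           16776192,   # 16 MiB minus 1 KiB, near the top of the range
            "MaxOutstandingR2T":        8,
        }

        for key, value in TRIED.items():
            lo, hi = RANGES[key]
            status = "within RFC range" if lo <= value <= hi else "OUT OF RANGE"
            print(f"{key} = {value}: {status} ({lo}..{hi})")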

    Thanks,
    Jason

    http://upload.vmind.com/web-pub/cera...6-18_21_16.zip

  7. #57

    Default

    Engineers are investigating the bond issue, as this is where they think the problem is, and they are looking into the MaxBurstLength issue as well. I would leave the defaults for now. They will have to put out an update for V6 to solve this issue with SCST.
    All the best,

    Todd Maxwell



  8. #58

    Default

    Todd -
    Sounds good. I will wait for your advice then. It seems to be running OK now and not dropping machines, but the performance still leaves a little to be desired.

    If you need anything else from me let me know.

    Thanks again for your help.

    Jason

  9. #59

    Default

    Todd -
    I have been running the v6 beta (update 1) in balance-rr mode since we last spoke, and the virtual machines are no longer dropping in vSphere 4. The performance is still middling, but at least the stability problems are resolved. I am getting on average 50-60 MB/s read and 40-50 MB/s write on both the RAID10 and RAID6 LUNs within VMware.

    Is there anything else you would like me to try or change? I can also upload logs from both systems if you would like. I have roughly 38 days left on the beta demo. Any word on the final release?

    Thanks again,
    Jason

  10. #60
    Join Date
    Apr 2009
    Posts
    62

    Default

    Hey Guys,

    I think I have my issues resolved as well. Our issue turned out to be the NetXen-based 10GbE NICs. They were HP NC510C cards (single-port CX4).

    Once those cards were replaced with SuperMicro cards based on the Intel 82598EB, the LUN dropping disappeared, and all of the VMkernel and DSS logs look clean and clear.

    I will keep you guys posted with test results once I get balance-rr set up on the 2-port NICs.
