We too are facing a similar scenario; however, we have not upgraded our ESX environment to vSphere 4.0, because I noticed DSS was not on the vSphere 4.0 storage HCL. I really want to be able to upgrade to vSphere 4.0, so I am hoping they can resolve your issue and get recertified. That said, I was not too optimistic about how quickly this would happen: support told me they may look at testing and recertifying sometime later this year, but as of yet no testing had been done with vSphere 4.0. I am going to continue to monitor your thread for progress on resolving this issue. Hopefully, since you are already on vSphere 4 and need DSS to be compatible, it doesn't take until the end of the year to get that.
I am also curious which version of DSS you are running.
Were you experiencing any iSCSI timeout errors when using ESX 3.5?
What RAID level are you running on your DSS SAN?
We fought iSCSI timeout issues with DSS for many months. I tried configuring the DSS SAN with all of the recommended performance settings as specified by Open-E support. Things got a little better, but when the SAN was under moderate to heavy load we would still get CMD Abort and Task Not Found errors. What proved to be our ultimate solution was rebuilding our RAID set as RAID 10. Previously I was using RAID 5 and had my VMs split across 2 DSS SANs, so each carried 50% of the ESX load. Now, with the SANs at RAID 10, I am able to run ALL of my VMs on 1 SAN with better performance and NO iSCSI timeouts or errors, under any load.
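For what it's worth, the improvement lines up with the usual RAID write-penalty arithmetic. Here is a back-of-the-envelope sketch; the drive count (14) and per-drive random IOPS (~175 for 15k SAS) are assumptions for illustration, not numbers from this thread:

```python
def effective_write_iops(drives, iops_per_drive, write_penalty):
    """Rough random-write ceiling for a RAID set.

    write_penalty = back-end I/Os generated per front-end write:
      RAID 5  -> 4 (read data, read parity, write data, write parity)
      RAID 10 -> 2 (one write to each mirror half)
    """
    return drives * iops_per_drive / write_penalty

# Assumed: 14 x 15k SAS drives at ~175 random IOPS each
raid5_writes = effective_write_iops(14, 175, 4)    # -> 612.5
raid10_writes = effective_write_iops(14, 175, 2)   # -> 1225.0
print(raid5_writes, raid10_writes)
```

Under these assumptions the same spindles sustain roughly twice the random-write load at RAID 10, which is consistent with one RAID 10 SAN absorbing what two RAID 5 SANs were handling.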
I'm running the 3513 build of DSS. I have two servers running DSS; each server has 3 RAID configs, and I map their LUNs to a single target. On each server I have different RAID setups: RAID 1, RAID 5, and RAID 10.
I also run these targets over a 10GbE network, so maybe that is why I never ran into timeouts. Are you using a similar setup?
On ESX 3.5 U3 I was not getting this same issue; however, I was experiencing finicky connection drops due to the NetXen 10GbE NICs in my systems.
This weekend I will be installing Intel 2xCX4 10GbE NICs in my machines to alleviate that problem.
I am willing to test out the new release. In fact, I have just started testing with two DSS servers, iSCSI failover, and Volume Sync with ESX 3.5. I'm glad I saw this post, as we are planning on upgrading our VMware environment to vSphere 4 in the next week. Please let me know if I can participate in the beta.
Thanks,
Jason
Info: 2x Dell PE 2950 with Perc 6/i and MD1000 split on 2x Perc 6/e
(4TB total with 15k SAS drives)
We are running DSS Build 3511 with ESX 3.5 U4. I have 2 DSS SANs, each configured the same with 1 RAID 10 volume. Each SAN has 6 x 1 Gbps NIC ports: 2 are bonded using round-robin for the GUI, and 4 are bonded using round-robin for iSCSI. These SANs each have only 1 iSCSI target with multiple LUNs, exposed only to a 3-host ESX cluster.
I would read this carefully, even though things seem OK and the errors have stopped. I ran into this issue when we first got our DSS SAN, before they had received VMware certification for 3.5 either: complete LUNs would disappear, or just the data within the LUNs would disappear. Be very cautious if there is any critical data on these SANs. I know I got pretty freaked out when a quarter of my VMs disappeared into thin air.
Yea, I had actually read over that post early yesterday.
I had the same problem where my LUNs would disappear, in the sense that when I browsed the datastore on ESX, all data was gone.
I had to reboot the ESX hosts and the DSS about 2 times before anything showed back up.
I later found that logging into the DSS, managing the target, and unmapping and remapping the LUN quickly fixes the problem; all of the VMs show back up and continue operating properly.
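In case it helps anyone hitting the same thing: after the unmap/remap on the DSS side, a rescan from the ESX service console should pick the datastore back up without a host reboot. A rough sketch for ESX 3.5; the adapter name vmhba32 is a placeholder, so substitute your software iSCSI adapter (check the VI Client storage adapters view or `esxcfg-vmhbadevs`):

```shell
# After remapping the LUN in the DSS target management GUI:

# Rescan the software iSCSI adapter for new/changed LUNs
# (vmhba32 is a placeholder -- use your adapter's name)
esxcfg-rescan vmhba32

# Refresh the VMFS volume list so the datastore reappears
vmkfstools -V
```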