
ESX 3.02 strangeness with iSCSI-R3 2.30


CraigB
08-31-2007, 04:22 AM
I just upgraded to 2.30.DB00000000.2820 and am seeing a few "weird" things with ESX 3.02...

Configuration:

1 VG (vg00), 4 logical volumes (lv0000 created before the 2.30 upgrade, lv0001-lv0003 created after it)
3 targets (vm01-vm03)
lv0000 = LUN 0 in vm01 (250GB)
lv0001 = LUN 1 in vm01 (500GB)
lv0002 = LUN 3 in vm02 (450GB)
lv0003 = LUN 0 in vm03 (50GB)


Problem #1
-------------

Looking at this setup from VMware (Configuration, Storage Adapters, Details)

Target 0 - vm01 - 2 LUNS, 250GB (LUN 0) and 500 GB (LUN 1) - As expected
Target 1 - vm02 - 2 LUNS, 250GB (LUN 0) and 450 GB (LUN 3) - Didn't expect a "LUN 0"
Target 2 - vm03 - 1 LUN, 250GB (LUN 0) - Didn't expect it to be 250GB

So there are two issues:

1. A "LUN 0" shows up in every target, and it always maps to the 250GB LV (lv0000).
2. The first LUN number -> LV mapping "wins" if that LUN number is reused in another target (e.g. LUN 0 is used in vm01 and set to lv0000; when LUN 0 is also used in vm03 and set to lv0003, it is "overridden" and lv0000 appears instead).

Looking a bit deeper, "Path" and "Canonical Path" differ on the ESX storage adapter details page:


           Path           Canonical Path
Target 0   vmhba40:0:0    vmhba40:0:0
           vmhba40:0:1    vmhba40:0:1

Target 1   vmhba40:1:0    vmhba40:0:0    Huh? No LUN 0 in this target
           vmhba40:1:3    vmhba40:1:3

Target 2   vmhba40:2:0    vmhba40:0:0    Huh? Using target 0's LUN 0 instead of vm03's own LUN 0 (lv0003)
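One hypothesis reproduces the whole Canonical Path column. It fits the SCSI serial number discussion later in this thread but is not confirmed. ESX keys each disk by the identifier the target reports. If that identifier depends only on the LUN number, every "LUN 0" collapses onto the first path seen with it. The Python sketch below models that. `device_id` is a hypothetical stand-in for whatever identifier the target really reports, and the model does not explain why vm02 reports a LUN 0 at all.

```python
# Hypothesis only: assume each LUN's device identifier depends on the LUN
# number alone, so every "LUN 0" on this storage box looks like the same disk.
def device_id(target: int, lun: int) -> str:
    return f"lun-{lun}"  # hypothetical: the target number is ignored


def canonical_paths(paths):
    """Map each vmhbaA:T:L path to the first path seen with the same device ID."""
    first_seen = {}
    canonical = {}
    for adapter, target, lun in paths:
        path = f"{adapter}:{target}:{lun}"
        key = device_id(target, lun)
        first_seen.setdefault(key, path)  # the first path for an ID "wins"
        canonical[path] = first_seen[key]
    return canonical


# Paths as listed in the "Path" column above, in target order.
OBSERVED = [
    ("vmhba40", 0, 0), ("vmhba40", 0, 1),
    ("vmhba40", 1, 0), ("vmhba40", 1, 3),
    ("vmhba40", 2, 0),
]

if __name__ == "__main__":
    for path, canon in canonical_paths(OBSERVED).items():
        print(f"{path:<14} -> {canon}")
```

Under this model, giving each target/LUN a distinct identifier makes every path canonical to itself. Keeping LUN numbers globally unique has the same effect.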

So the basic result is that under ESX 3.02, LUN numbers must be globally unique across targets, which is not the case with the Windows Vista iSCSI initiator. Keeping the LUNs globally unique is not a big deal, just unexpected...
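Until the cause is fixed, the workaround is to never reuse a LUN number across targets. The Python sketch below flags reused LUN numbers in a target-to-LUN mapping. The dictionary layout is my own (hypothetical), filled in with the configuration from the top of this post.

```python
from collections import defaultdict

# Hypothetical layout: target name -> {LUN number: logical volume},
# filled in with the configuration from the top of this post.
CONFIG = {
    "vm01": {0: "lv0000", 1: "lv0001"},
    "vm02": {3: "lv0002"},
    "vm03": {0: "lv0003"},
}


def reused_lun_numbers(config: dict) -> dict:
    """Return {LUN number: [targets]} for every LUN number used by more than one target."""
    owners = defaultdict(list)
    for target in sorted(config):
        for lun in config[target]:
            owners[lun].append(target)
    return {lun: targets for lun, targets in owners.items() if len(targets) > 1}


if __name__ == "__main__":
    for lun, targets in sorted(reused_lun_numbers(CONFIG).items()):
        print(f"LUN {lun} is reused by: {', '.join(targets)}")  # LUN 0 is reused by: vm01, vm03
```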


Problem #2
-------------

A "new" LV causes an error when attempting to add storage. At the "Current Disk Layout" step, ESX returns the error "Unable to read partition information from this disk". You must then create a partition on the disk, either from Windows or with fdisk from the ESX console session. Using

fdisk /vmfs/devices/disks/vmhba40\:0\:1\:0

to create the partition will show a warning saying

"Invalid flag 0x0000 of partition table 4 will be corrected by w(rite)"


This seems to be a new "extra step required" since upgrading from 2.21 to 2.30. I'm not sure whether 2.21 actually wrote some kind of partition there, or just didn't write the "invalid" 0 there...
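An MBR partition table is only treated as valid if the first sector ends with the bytes 0x55 0xAA (offsets 510-511). A brand-new LV presumably reads back as zeros there, though that is an assumption: I haven't checked what 2.30 actually writes. The "0x0000" in fdisk's warning is consistent with that signature being missing. The read-only Python sketch below performs the check.

```python
import struct

SECTOR_SIZE = 512


def boot_signature(sector: bytes) -> int:
    """Return the 16-bit little-endian value at offsets 510-511 of the first sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least one full 512-byte sector")
    (value,) = struct.unpack_from("<H", sector, 510)
    return value


def has_mbr(sector: bytes) -> bool:
    """True if the sector ends with the MBR signature bytes 0x55 0xAA."""
    return boot_signature(sector) == 0xAA55


if __name__ == "__main__":
    blank = bytes(SECTOR_SIZE)  # stands in for the first sector of an all-zero LV
    print(f"flag=0x{boot_signature(blank):04x} valid={has_mbr(blank)}")  # flag=0x0000 valid=False
```

To check a real disk, read its first 512 bytes in binary mode (for example `open(path, "rb").read(512)` on the /vmfs/devices/disks path shown above) and pass them in. Nothing is written.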


Anyway, just my observations so far...

Craig

To-M
08-31-2007, 04:35 AM
Thanks for the update on this; we wish we had more information or test results for ESX 3.02. The only thing I can suggest: before updating, did you delete the existing logical volumes and then recreate them after the update? If not, could you test with that procedure? Also, please send me the logs at todd.maxwell@opne-e.com so I can review them.

paul2kl
09-05-2007, 01:00 PM
Hi,

I have the same issue (CraigB's Problem #2) using a DSS server.

Are we going to see any other issues using BlockIO with VMware? I do not want to upgrade the production kit and move all the VMs onto it yet.

Have you found any data corruption issues, or is it just the need to fdisk the volume before using it? I would hate to spend days migrating the VMs only to hit a problem that forces me to restore them from backup and leave customers offline.

I still have major issues using the FC cards with ESX 3.02 (already reported). I was told to try DSS 1.30, but the same issues are still there, so Fibre Channel is still not a viable option with Open-E and VMware. :(

To-M
09-05-2007, 09:29 PM
We are aware of this issue and are working as fast as possible to correct it; please be patient. In the meantime, when creating a target, could you test setting the LUNs to 1, 2, 3, ... but not to LUN 0? We are working on the FC issue as well. We are racing to have both issues resolved in the next release. It may come earlier if we complete testing fast enough and get confirmation from some of our dedicated customers.

CraigB
09-07-2007, 07:53 AM
There is a thread on the VMware forums where someone else has run into having to keep LUN numbers globally unique with Open-E. A few responses there point back to the iSCSI Enterprise Target mailing list:

http://www.vmware.com/community/thread.jspa?threadID=100438

and

http://www.nabble.com/ietd---VMware-ESX-best-practices--tf4037119.html#a11471503

and

http://www.vmware.com/community/message.jspa?messageID=496519

These point towards a possible configuration issue in iSCSI Enterprise Target (which I believe is the open source package Open-E is based on): either the SCSI_SN needs to be unique per target/volume, or the LUN number itself has to be kept globally unique. So my question is: in Open-E iSCSI, are all the SCSI_SNs set to the same value (or not set at all), or are they unique per target/volume (i.e. the "equivalent" of a SCSI disk once presented to the OS)?
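For a plain IET setup, my understanding is that per-LUN identifiers are set on the `Lun` line of ietd.conf (`Lun <n> Path=<device>,Type=fileio|blockio,ScsiId=<id>,ScsiSN=<sn>`). Check the ietd.conf man page for your version. Open-E generates its own configuration, so this only illustrates the idea. The Python sketch below renders such lines with IDs unique per target and LUN. Deriving them from a SHA-1 of the target name plus LUN number is my own choice. The IQNs, the /dev/vg00/... paths and the 16-character length are hypothetical.

```python
import hashlib


def lun_line(target_iqn: str, lun: int, path: str, iotype: str = "fileio") -> str:
    """Build one ietd.conf-style 'Lun' line whose ScsiId/ScsiSN are unique per target+LUN."""
    # Hash the target name and LUN number together so that LUN 0 in one target
    # never gets the same identifiers as LUN 0 in another target.
    digest = hashlib.sha1(f"{target_iqn}/{lun}".encode()).hexdigest()
    scsi_id, scsi_sn = digest[:16], digest[16:32]  # 16 chars each (assumed safe length)
    return f"Lun {lun} Path={path},Type={iotype},ScsiId={scsi_id},ScsiSN={scsi_sn}"


def render(targets: dict) -> str:
    """targets: {iqn: {lun: device path}} -> ietd.conf-style text."""
    lines = []
    for iqn, luns in targets.items():
        lines.append(f"Target {iqn}")
        for lun, path in sorted(luns.items()):
            lines.append("\t" + lun_line(iqn, lun, path))
    return "\n".join(lines)


# Hypothetical IQNs and LV device paths mirroring the layout in the first post.
TARGETS = {
    "iqn.2007-07:dss01.vm01": {0: "/dev/vg00/lv0000", 1: "/dev/vg00/lv0001"},
    "iqn.2007-07:dss01.vm02": {3: "/dev/vg00/lv0002"},
    "iqn.2007-07:dss01.vm03": {0: "/dev/vg00/lv0003"},
}

if __name__ == "__main__":
    print(render(TARGETS))
```

Because the IDs are derived from the names rather than generated randomly, re-rendering the file gives each LUN the same identifiers. ESX presumably relies on that to recognize a disk it has already seen.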

Craig

To-M
09-13-2007, 12:59 AM
They are unique per target and volume. You can see this in test.log; the example below shows an LV UUID and the per-LUN signatures.

LV UUID dssYVw-rjOQ-cyt2-iKuN-oXu1-tetB-Epv9ha


cat /proc/net/iet/volume
*-----------------------------------------------------------------------------*

tid:2 name:iqn.2007-07:dss01.vm
tid:1 name:iqn.2007-07:dss01.boot
    lun:0 state:0 iotype:fileio iomode:wb path:/RAMDISK/iSCSI_targets/dssm9F-JQn2-SVp0-FUPI-0q6g-2plO-1ZgmZF/lun
    lun:1 state:0 iotype:fileio iomode:wb path:/RAMDISK/iSCSI_targets/dsssTt-yIJU-0jvE-i4JV-EzTn-cetb-nWFFT1/lun
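Parsing the /proc/net/iet/volume output above shows which LUN numbers each target actually exposes, which makes a LUN number reused across targets easy to spot. The Python sketch below assumes only the layout visible in this post: a `tid:` line per target, followed by that target's `lun:` lines, all as whitespace-separated key:value fields. In the sample, tid 2 has no LUN lines.

```python
def parse_iet_volumes(text: str) -> dict:
    """Parse /proc/net/iet/volume-style output into {tid: {"name": str, "luns": {lun: fields}}}."""
    targets = {}
    current = None
    for raw in text.splitlines():
        # Every interesting token is key:value; split on the first colon only,
        # because IQNs and paths may themselves contain colons.
        fields = dict(tok.split(":", 1) for tok in raw.split() if ":" in tok)
        if "tid" in fields:
            current = targets.setdefault(int(fields["tid"]), {"name": fields.get("name"), "luns": {}})
        elif "lun" in fields and current is not None:
            current["luns"][int(fields["lun"])] = fields
    return targets


# The output quoted above (the separator line is skipped because it has no key:value tokens).
SAMPLE = """\
*-----------------------------------------------------------------------------*
tid:2 name:iqn.2007-07:dss01.vm
tid:1 name:iqn.2007-07:dss01.boot
\tlun:0 state:0 iotype:fileio iomode:wb path:/RAMDISK/iSCSI_targets/dssm9F-JQn2-SVp0-FUPI-0q6g-2plO-1ZgmZF/lun
\tlun:1 state:0 iotype:fileio iomode:wb path:/RAMDISK/iSCSI_targets/dsssTt-yIJU-0jvE-i4JV-EzTn-cetb-nWFFT1/lun
"""

if __name__ == "__main__":
    for tid, target in sorted(parse_iet_volumes(SAMPLE).items()):
        print(tid, target["name"], sorted(target["luns"]))
```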

roxer
02-29-2008, 04:29 PM
I fixed this by numbering the LUNs after the VG/LV numbers:

VG01:
LV0001 = LUN 10
LV0002 = LUN 20
LV0003 = LUN 30, etc.

The targets use the same IDs as the LUN series.
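roxer's scheme derives each LUN number from the LV number. LUN numbers therefore never repeat across targets, and none of them is 0. The Python sketch below implements that mapping. The function name and the `step` parameter are my own framing of the list above. Rejecting lv0000 is an added choice that follows To-M's suggestion to avoid LUN 0.

```python
import re


def lun_for_lv(lv_name: str, step: int = 10) -> int:
    """Map an LV name such as 'LV0002' to LUN 20, following roxer's numbering."""
    match = re.fullmatch(r"lv(\d+)", lv_name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unexpected LV name: {lv_name!r}")
    number = int(match.group(1))
    if number == 0:
        # lv0000 would land on LUN 0, which To-M suggested avoiding.
        raise ValueError("lv0000 would map to LUN 0")
    return number * step


if __name__ == "__main__":
    for lv in ("LV0001", "LV0002", "LV0003"):
        print(lv, "-> LUN", lun_for_lv(lv))
```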

mhilbrand@vrz.net
03-06-2008, 03:11 PM
Hello Craig

I had the same problem and tried the LUN ID workaround, but with no success. Then I mapped the target with the newly created volumes to a Windows machine (instead of ESX) and opened Disk Management to write a disk signature.

After the disk signature was written, ESX was able to map the drives. Don't ask me why ESX cannot read new volumes, but it works...

Best regards
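When Windows initializes a disk as MBR, the first sector gets a 4-byte disk signature at offset 440 and the 0x55 0xAA boot signature at offset 510. My guess is that the boot signature is what ESX was missing, but this thread does not confirm it. The Python sketch below builds such a sector in memory with no boot code and an empty partition table. It is not a byte-for-byte copy of what Windows writes, and it deliberately writes to no device.

```python
import os
import struct

SECTOR_SIZE = 512
DISK_SIGNATURE_OFFSET = 440  # 4-byte disk signature (used by Windows)
PARTITION_TABLE_OFFSET = 446  # four 16-byte partition entries, left empty here
BOOT_SIGNATURE_OFFSET = 510  # bytes 0x55 0xAA mark a valid MBR


def blank_mbr(disk_signature=None) -> bytes:
    """Build an empty MBR sector that carries only the two signatures."""
    if disk_signature is None:
        # Random 32-bit signature, similar in spirit to what an initializing tool picks.
        disk_signature = struct.unpack("<I", os.urandom(4))[0]
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into("<I", sector, DISK_SIGNATURE_OFFSET, disk_signature)
    sector[BOOT_SIGNATURE_OFFSET:BOOT_SIGNATURE_OFFSET + 2] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    mbr = blank_mbr(0x12345678)
    print(mbr[DISK_SIGNATURE_OFFSET:DISK_SIGNATURE_OFFSET + 4].hex(), mbr[-2:].hex())
```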

Maxx