Hi folks,

another weekend full of trouble:

During server maintenance work I experienced severe difficulties accessing our FC targets... at first glance it looked like our FC initiators were going crazy, but in the end I believe there is a severe problem on the DSS side:

  • one DSS server (5.0.DB49000000.3278), 32 bit mode, 2 quad-core Xeons, one QLE2462 (dual-port, 4Gbps) FC adapter in target mode. 13 +/-1 FC groups with a single volume (LUN 0), two groups with two volumes (LUN 0 and 1). Access to each group is restricted to a single WWPN.
  • FC switch with single connection to the DSS server
  • two identical Xen servers (SLES10SP2, current patch level, dual Intel Xeon quad-core, 22GB memory, one QLE2462 with single connection to FC switch)
  • various Xen VMs accessing FC groups on the DSS via NPIV (virtual FC adapters). Each VM uses a unique WWPN, giving it access to one of the FC groups of the DSS from any of the Xen servers. The VMs are set up to "live migrate" across the Xen servers.


Symptoms:
Even after rebooting both Xen servers (full power-off, then power-on), access to the FC targets gave access to the wrong volumes. As a result, a VM that was started turned out to be a different VM (different software configuration).

Before the power-cycle, many of the VMs were active and were then migrated across the Xen servers. This led to multiple Xen VMs accessing the same disk CONCURRENTLY. It looked like existing connections remained the way they were (correct mapping), while newly established connections to the FC target got the new, mangled mapping.

The mangling went as far as having former two-disk groups containing only a single disk - looks like the security list got corrupted.
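
For anyone who wants to catch this kind of mix-up before a VM is started on the wrong disk, here is a minimal sketch of the sanity check I have in mind. It only compares an expected "WWPN -> LUN -> volume" table with what an initiator actually reports. The WWPNs, the volume IDs and the function name find_mapping_errors are made up for illustration, and how the observed IDs are collected (for example a SCSI serial number or a filesystem UUID read on the Xen host) is left out and would have to be filled in.

```python
# Hypothetical sketch: detect a mangled WWPN -> LUN mapping before starting VMs.
# All WWPNs and volume IDs below are invented for illustration only.

def find_mapping_errors(expected, observed):
    """Compare two {wwpn: {lun: volume_id}} tables.

    Returns a list of human-readable problems; an empty list means
    every WWPN sees exactly the volumes it is supposed to see.
    """
    problems = []
    for wwpn, exp_luns in expected.items():
        obs_luns = observed.get(wwpn, {})
        # Check every LUN we expect to be present for this WWPN.
        for lun, exp_id in sorted(exp_luns.items()):
            obs_id = obs_luns.get(lun)
            if obs_id is None:
                problems.append(f"{wwpn}: LUN {lun} missing (expected {exp_id})")
            elif obs_id != exp_id:
                problems.append(f"{wwpn}: LUN {lun} is {obs_id}, expected {exp_id}")
        # Report LUNs that show up although they should not be visible.
        for lun in sorted(set(obs_luns) - set(exp_luns)):
            problems.append(f"{wwpn}: unexpected LUN {lun} ({obs_luns[lun]})")
    return problems


# Expected layout: one single-volume group and one two-volume group.
expected = {
    "21:00:00:e0:8b:00:00:01": {0: "vol-web"},
    "21:00:00:e0:8b:00:00:02": {0: "vol-db-a", 1: "vol-db-b"},
}

# What a mangled target might present: swapped volumes, second disk gone.
observed = {
    "21:00:00:e0:8b:00:00:01": {0: "vol-db-a"},
    "21:00:00:e0:8b:00:00:02": {0: "vol-web"},
}

if __name__ == "__main__":
    for line in find_mapping_errors(expected, observed):
        print(line)
```

If the returned list is not empty, the VM using that WWPN should not be started. This would not have prevented the mapping corruption itself, only the follow-up damage from booting VMs on the wrong volumes.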

After a reboot of the DSS, things went back to normal - at least concerning the mapping. Many VMs were hosed, some beyond repair.

I took a log file snapshot after the DSS reboot, but that seems to cover FC activities only after the reboot.

Has anyone else had such experiences? Does the new DSS level have a newer level of FC target code, and is it recommended to upgrade asap?

Regards,
Jens