Thread: Fiber target messed up

  1. #1 (Join Date: May 2008, Location: Hamburg, Germany, Posts: 108)

    Fiber target messed up

    Hi folks,

    another weekend full of trouble:

    During server maintenance work I experienced severe difficulties accessing our FC targets... at first glance it looked like our FC initiators were going crazy, but in the end I believe there is a severe problem on the DSS side:

    • one DSS server (5.0.DB49000000.3278), 32-bit mode, two quad-core Xeons, one QLE2462 (dual-port, 4 Gbps) FC adapter in target mode. 13 +/- 1 FC groups with a single volume (LUN 0); two have two volumes (LUN 0 and 1). Access to each group is restricted to a single WWPN.
    • FC switch with single connection to the DSS server
    • two identical Xen servers (SLES10 SP2, current patch level, dual quad-core Intel Xeon, 22 GB memory, one QLE2462 with a single connection to the FC switch)
    • various Xen VMs accessing FC groups on the DSS via NPIV (virtual FC adapters). Each VM uses a unique WWPN, giving it access to one of the FC groups of the DSS from any of the Xen servers. The VMs are set up to "live migrate" across the Xen servers. (A quick sysfs check for the WWPNs a host is presenting is sketched right after this list.)
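
    For completeness: to double-check which WWPNs a Xen host is actually presenting (the physical port plus the NPIV vports), something along the lines of the sketch below can read them straight out of sysfs. This is only a rough sketch; it assumes the standard Linux /sys/class/fc_host layout, and attribute names may differ between kernel versions.

    #!/usr/bin/env python3
    # Sketch: list the FC hosts (physical and NPIV) the kernel knows about,
    # with their WWPN/WWNN and port state. Assumes the standard sysfs layout
    # under /sys/class/fc_host; paths may vary by kernel version.
    import glob
    import os

    def read_attr(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return "n/a"

    for host in sorted(glob.glob("/sys/class/fc_host/host*")):
        name = os.path.basename(host)
        wwpn = read_attr(os.path.join(host, "port_name"))
        wwnn = read_attr(os.path.join(host, "node_name"))
        state = read_attr(os.path.join(host, "port_state"))
        print(f"{name}: WWPN={wwpn} WWNN={wwnn} state={state}")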


    Symptoms:
    Even after rebooting (full power-off, then power-on), access to the FC targets on both Xen servers yielded the wrong volumes. As a result, a VM that was started turned out to be a different VM (different software configuration).

    Before the power cycle, many of the VMs were active and then migrated across the Xen servers; this led to multiple Xen VMs accessing the same disk CONCURRENTLY. It looked like existing connections kept their original (correct) mapping, while newly established connections to the FC target got the new, mangled mapping.

    The mangling went as far as former two-disk groups containing only a single disk; it looks like the security list got corrupted.

    After a reboot of the DSS, things went back to normal, at least concerning the mapping. Many VMs were hosed, some beyond repair.
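
    For the next time: a check I would consider running on the Xen servers before starting any VM is to compare the SCSI disk IDs currently visible under /dev/disk/by-id against a list saved while the mapping was known to be good. Just a sketch; the expected-IDs file below is a placeholder of mine, not something the DSS provides:

    #!/usr/bin/env python3
    # Sketch: compare the SCSI disk IDs currently visible on this host against
    # a known-good list saved earlier, to spot a swapped FC mapping before a
    # VM is started. The expected-IDs file is a placeholder, not a DSS file.
    import glob
    import os
    import sys

    EXPECTED_FILE = "/root/expected-disk-ids.txt"  # hypothetical, one ID per line

    def current_ids():
        ids = set()
        for link in glob.glob("/dev/disk/by-id/scsi-*"):
            if "-part" in link:          # skip partition links, keep whole disks
                continue
            ids.add(os.path.basename(link))
        return ids

    def main():
        with open(EXPECTED_FILE) as f:
            expected = {line.strip() for line in f if line.strip()}
        seen = current_ids()
        missing = sorted(expected - seen)
        unexpected = sorted(seen - expected)
        if missing or unexpected:
            print("Mapping looks WRONG:")
            for d in missing:
                print(f"  missing:    {d}")
            for d in unexpected:
                print(f"  unexpected: {d}")
            sys.exit(1)
        print("Disk IDs match the saved list.")

    if __name__ == "__main__":
        main()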

    I took a log file snapshot after the DSS reboot, but it seems to cover FC activity only from after the reboot.

    Has anyone else had such experiences? Does the new DSS release include newer FC target code, and is it recommended to upgrade ASAP?

    Regards,
    Jens

  2. #2

    Hi,

    I had a similar problem a year ago, but with a Promise iSCSI M610i.
    After a power failure, the time to connect to the LUN went up to 3-5 minutes.
    Then we noticed that the mapping was wrong on the initiator; on the Promise it looked fine.
    All the data was corrupted.
    We swapped the management unit with no effect. After swapping the RAID controller, everything was as before and all the data was OK again.

    So if you can swap the RAID controller, give it a chance.
    Which RAID controller do you use for your DSS?

    Greetings,
    Roger

  3. #3 (Join Date: May 2008, Location: Hamburg, Germany, Posts: 108)

    Quote Originally Posted by rogerk
    Hi,

    I had a similar problem a year ago, but with a Promise iSCSI M610i.
    After a power failure, the time to connect to the LUN went up to 3-5 minutes.
    Then we noticed that the mapping was wrong on the initiator; on the Promise it looked fine.
    All the data was corrupted.
    We swapped the management unit with no effect. After swapping the RAID controller, everything was as before and all the data was OK again.

    So if you can swap the RAID controller, give it a chance.
    Which RAID controller do you use for your DSS?

    Greetings,
    Roger

    Roger,

    thank you for your feedback. We're using an Areca controller (16-port; I don't have the exact model number handy right now, as I'm off-site until the end of the week).

    I tend to believe we have a different situation here, but maybe that's just whistling in the dark... The LUNs themselves weren't messed up; even those FC groups where we had configured two LUNs were right. It was only that the wrong FC groups were "connected" to the FC initiators. That things had gone worse than expected became visible when two different Xen VMs (with different WWPNs) were booting the same virtual disks, and that was the reason data got corrupted. (To be more precise: one VM was running, and a second one was started fresh on the Xen server and came up with the same disks.)

    After a simple DSS reboot everything went back to "normal" (in terms of group/initiator mapping).
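
    In case it helps anyone hitting the same thing: a rough way to spot, on a single Xen host, that the same backing volume has become reachable through more than one FC host (i.e. one disk behind two virtual HBAs). This is just an illustration, nothing DSS-specific; it relies on the 'wwid' sysfs attribute, which only newer kernels expose, and it would of course also trigger on an intentional multipath setup, which we don't have here.

    #!/usr/bin/env python3
    # Sketch: warn when one SCSI WWID is reachable through more than one FC
    # host on this machine (the same volume behind two virtual HBAs).
    # Uses the 'wwid' sysfs attribute, which only newer kernels provide.
    import collections
    import glob
    import os
    import re

    wwid_to_hosts = collections.defaultdict(set)

    for dev in glob.glob("/sys/block/sd*"):
        devpath = os.path.realpath(os.path.join(dev, "device"))
        host = re.search(r"/host(\d+)/", devpath)
        try:
            with open(os.path.join(dev, "device", "wwid")) as f:
                wwid = f.read().strip()
        except OSError:
            continue                    # attribute missing on older kernels
        if host and wwid:
            wwid_to_hosts[wwid].add("host" + host.group(1))

    for wwid, hosts in sorted(wwid_to_hosts.items()):
        if len(hosts) > 1:
            print(f"WARNING: {wwid} visible via {', '.join(sorted(hosts))}")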

    With regards,
    Jens
