Thread: Exomium S200 and DSS V6 and LSI 9265i and Cachecade -> Data corruption

  1. #1

    Hi,

    I have a SuperMicro/LSI-based, OPEN-E certified Exomium S200 box here with an LSI 9265i (with battery backup and CacheCade 2.0 Pro).
    The CacheCade SSDs are 4*Samsung 843T Pro in a RAID 1.
    After 14 days of usage, I can confirm that latency improved; unfortunately, data consistency went the opposite way.
    I had an uncorrectable disk directory structure error in Win7 x86 (no variant of chkdsk was able to repair it), a correctable error on Ext3 in SLES 11 SP2, and an uncorrectable error on our main Mac OS X 10.8.4 server. After I disabled CacheCade, the last one disappeared (which saved a lot of time and effort), so I am quite sure that the problem was simply an inconsistent CacheCade cache.

    Of course I am using the latest DSS V6 and LSI firmware.

    After that I opened a ticket with LSI. After a few days they suggested testing the previous firmware release to see whether the data corruption occurs there too.
    I refused, because this is our main production data box with 20TB; LSI then closed the ticket.
    Imagine running something like this and finding inconsistent data after 14 days: do you have a backup strategy that can fix that?

    I am currently in discussion with the hardware vendor about the product and need to make sure that LSI CacheCade is supported by OPEN-E and that the driver in OPEN-E fulfills LSI's requirements for CacheCade. The OPEN-E firmware is 6.0up98.8101.7337 64bit; the 9265i firmware is 23.16.0-0018.

    Can you please confirm this?

    Do you have any success stories with CacheCade and OPEN-E without corrupt data? Or is it usual for data to become corrupt when using LSI CacheCade?

    Thanks a lot in advance

    Henri

  2. #2

    We have been working with LSI CacheCade for a long time and have not heard of such an incident. The drivers are included in DSS V6 and V7, but the drivers do not corrupt data; this would be on the hardware side. The best approach would be to use the Snapshot function of the DSS: you can create tasks so that a Snapshot remains available for 3 weeks, so if your hardware has issues you can retrieve the data.
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  3. #3
    Join Date
    Aug 2010
    Posts
    404

    We have many customers who run LSI with CacheCade, and everything is working normally. The issue could be caused by the firmware build on your current controller, or there may be a problem with one of the disks.

    If the data gets corrupted on the Primary box and the replication service keeps running, the mirror will get corrupted too. Having Snapshots and Backups can help in some of these cases.

    You may also find the following forum thread and KB article useful:

    http://forum.open-e.com/showthread.p...achecade-2-Pro
    http://kb.open-e.com/How-to-dramatic...ance_1293.html

  4. #4

    Hi,

    thanks for the replies.

    In the meantime I found 3 more corrupted filesystems (1*HFS+, 2*SLES 11 SP2). Unfortunately, I only discover this corruption after rebooting the VMs.
    We have our production data on this box, so taking a snapshot and reverting to it for invoicing data, mail server data etc. is probably not the best idea.
    I added 8 SSD drives to the box and moved the performance-critical vmdks to this SSD datastore; performance is now okay.
    Currently I see no way to perform more alpha tests with LSI CacheCade: it took too much time and effort to repair the data, and the risk of losing data is too high. Reverting to a full backup after 3 weeks or more is usually not an option; restoring the data file by file and verifying what happened in between may be possible, but takes a lot of time.
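One way to shorten the file-by-file verification mentioned above is to keep a checksum manifest of the data and compare it against a later one. The following is only a minimal sketch under stated assumptions: the function names (`file_sha256`, `build_manifest`, `compare_manifests`) are hypothetical, it is not an OPEN-E or LSI tool, and it works on files in a mounted filesystem. It reports files whose content changed, disappeared, or appeared; it cannot tell a legitimate modification from corruption, and it does not replace a filesystem check for the directory-structure errors described in this thread.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_manifests(old, new):
    """Return (changed, missing, added) relative paths between two manifests."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

A manifest built from a known-good state (for example a restored backup) could be compared with one built from the live data; files listed as changed that nobody should have touched are the ones worth inspecting first.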

    Maybe somebody else can work with LSI to bring the software to a beta stage.

    We will see who complains in the coming months.

    Thanks
    Henri

  5. #5

    Hi,

    I just got the message that the issue was caused by an outdated MegaRAID driver, revision 6.12, in the latest version of OPEN-E V6, which does not support CacheCade.

    Can you confirm this?

    Thanks
    Henri

  6. #6

    I'm sure that if there is a new driver we can provide it in a small update (for Premium Support only), but are there release notes from LSI stating that this can happen if you don't have the latest driver?
    All the best,

    Todd Maxwell



  7. #7

    Hi To-M,

    my intention was only to benefit from the performance improvement of CacheCade, "just to use" it.

    The hardware vendor wrote to me that the data corruption was caused by an outdated driver in OPEN-E (confirmed by OPEN-E). Driver 6.12 was released by LSI in mid 2011, LSI CacheCade in Mar 2012. Therefore CacheCade is not supported by that driver version. LSI told him that if disabling CacheCade stopped the data corruption, this is an indication of this kind of problem; both OPEN-E and LSI have confirmed the problem.

    I just want to know whether you can confirm this too (from OPEN-E's point of view), and when a CacheCade-compatible version will be released by OPEN-E.

    Currently I am stuck at V6 because I have enabled the NFS HA feature. Is this available in V7 in the meantime (then I can upgrade)?
    Sorry, I have had more than enough trouble with this stuff.

    Thanks
    Henri

  8. #8

    To be precise, CacheCade Pro 2.0 was released in Mar 2012.

    Henri

  9. #9

    I don't believe this is true.

    We have thousands of V6 and V7 systems in production, and we would have updated the LSI MegaRAID SAS drivers based on the LSI release notes if this were true.

    I have checked our support DB and have not found any support tickets recording this issue. In which ticket did Open-E state the same as LSI?

    We don't make the driver, and CacheCade is an add-on feature for certain LSI controllers. Normally the firmware controls CacheCade, along with some parts of the driver, to perform the algorithms it uses for hot data.

    I would also like to know the LSI case in which they stated this; you can submit a support ticket to Open-E with the contact information.

    We have held several webinars about LSI CacheCade; I did one with the LSI engineer on March 28th, 2012, and nothing was mentioned about an outdated driver that can cause corruption.

    Again, we don't make the drivers; the vendors do, so DSS would not be involved here. If there was corruption, it could be that the SSD cache was a RAID 0 and not a RAID 1: if an SSD in a RAID 0 is lost, then yes, you can have corruption for the writes at the time the SSD went bad, but not for the reads.
    Last edited by To-M; 08-14-2013 at 10:34 PM.
    All the best,

    Todd Maxwell



  10. #10

    Hi Todd,

    I wrote a mail to Janusz Bak, with a copy to support@open-e.com, containing the statement of my hardware vendor.

    It's in German (can you read it too?); maybe Janusz can clarify this.

    Thanks again
    Henri
