
Thread: Millions of small files?

  1. #1

    Millions of small files?

    We are going to do a demo for a customer in the morning about using the DSS mainly as a NAS system, with manual failover (when will NAS autofailover be available, even in beta? In a month? a year? Maybe for some protocols but not others?). The customer has backup software that stores data deduplicated (i.e. it chops a file up into small chunks, like 16kB, and checksums each chunk, storing the checksums in a database and each chunk as a file on a fileserver/NAS somewhere) into millions of little files. These files might even be all in the same directory. How well does the DSS handle this situation? Is there anything I should keep in mind while I'm talking to the customer?
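
    (Roughly, I imagine the dedup engine doing something like the sketch below. The 16kB fixed chunks match what I described, but the SHA-256 naming, the paths, and the index are my own guesses for illustration, not the vendor's actual scheme.)

        import hashlib
        from pathlib import Path

        CHUNK_SIZE = 16 * 1024           # 16kB chunks, as described above
        STORE = Path("/mnt/nas/chunks")  # hypothetical chunk share on the DSS

        def dedup_store(source: Path, index: dict) -> list[str]:
            """Chop a file into fixed-size chunks and store each previously
            unseen chunk as its own small file, named by its checksum."""
            recipe = []
            with source.open("rb") as f:
                while chunk := f.read(CHUNK_SIZE):
                    digest = hashlib.sha256(chunk).hexdigest()
                    if digest not in index:                # new chunk: one more small file
                        (STORE / digest).write_bytes(chunk)
                        index[digest] = len(chunk)
                    recipe.append(digest)                  # ordered recipe to rebuild the file
            return recipe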

    I assume that DSS uses XFS for the NAS volumes. Is there any way to tune the block size, or anything else, to optimize for small files? Obviously there are particular hardware RAID configurations that should be used, but what else?

    Also, a separate question: is there a list of the advantages and disadvantages of using the 64-bit kernel vs. the 32-bit one? We might want to use more than 4GB of memory, and I don't think the default 32-bit kernel can really take advantage of more than 4GB. Anything else?

  2. #2

    Also, how well does DSS do on solid-state media? Is it possible to use it without swap space, or to put the swap space on a separate, spinning disk?

  3. #3

    It looks like XFS uses B+ trees, so it should do well for millions of files in a single directory. Anything else I should know?
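
    If anyone wants to sanity-check listing speed on their own hardware, here is a rough timing sketch (the path is made up; point it at a directory pre-filled with many small files):

        import os
        import time

        TARGET = "/mnt/test_share/chunks"  # hypothetical directory full of small files

        start = time.perf_counter()
        count = sum(1 for _ in os.scandir(TARGET))  # streams entries instead of building a full list
        print(f"listed {count} entries in {time.perf_counter() - start:.2f}s")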

  4. #4

    Yes, we use the XFS file system, and there is almost no limit on how many files can be stored on a logical volume; in practice the XFS limit is the free space.

    Just think about how much memory you will need: with millions of files you should do some calculations, but 64-bit mode with 8-12GB of memory should be a good start, especially if you ever have to run a file system repair.

    I would break the files up into directories; millions of small files in a single folder could take a long time to list, even with a B-tree. That applies when using NAS with the SMB and/or NFS protocols; the iSCSI protocol, being block-based, will handle this better.
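
    One common fan-out scheme, purely as a sketch (the two-level layout keyed on checksum prefixes is my example, not something DSS does for you), looks like this:

        from pathlib import Path

        STORE = Path("/mnt/nas/chunks")  # hypothetical share root

        def shard_path(digest: str) -> Path:
            """Fan chunk files out over 256*256 subdirectories keyed by
            the first hex characters of the checksum, so no single
            directory ever holds millions of entries."""
            parent = STORE / digest[:2] / digest[2:4]
            parent.mkdir(parents=True, exist_ok=True)
            return parent / digest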

    Of course, we do have the "Handling large directories" feature for the shares.

    This option allows you to significantly speed up file listing. The prerequisite is to convert all file and directory names exclusively to lower or upper case.
    Note: You will need to convert your existing file and directory names to lower or upper case before selecting this option; otherwise they will become inaccessible.
    Note: Due to case-sensitivity issues, the operations above may have a negative impact on Unix-like systems, so please prepare accordingly beforehand. Windows is not affected.

    Location:
    CONFIGURATION -> NAS resources -> Shares -> [share_name] -> Function: SMB settings.
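
    If you need to do that conversion first, something along these lines would work from a case-sensitive client (e.g. Linux over NFS). This is only a rough sketch: the mount point is hypothetical, it simply skips names that would collide once lowercased, and you should test it on a copy of the data first.

        import os

        ROOT = "/mnt/share"  # hypothetical client-side mount of the DSS share

        # Walk bottom-up so a directory is renamed only after everything
        # inside it has already been handled.
        for dirpath, dirnames, filenames in os.walk(ROOT, topdown=False):
            for name in dirnames + filenames:
                lower = name.lower()
                if lower == name:
                    continue
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, lower)
                if os.path.exists(dst):   # e.g. "Readme" and "README" collide
                    print("collision, skipped:", src)
                else:
                    os.rename(src, dst)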

    We also have "read ahead" and "NFS daemons count" under "Tuning options" in "Hardware Options" on the console (ALT+CTRL+W). You will have to test some of these options with this environment.

    Read ahead disk tuning - with this option you can increase the read-ahead cache size for better performance; in some cases it needs to be decreased for better compatibility. Try 1024 or higher.

    NFS daemon tuning - you can set how many NFS daemons run in the system. On some systems NFS causes timeouts, and changing this value helps; it can also improve NFS performance. Try 64-128.

    Concerning swap, don't worry about it; we only really use a small part of the 2GB, and this will change in future releases. In the end, just add more memory; as cheap as it is today, it is worth it.
    All the best,

    Todd Maxwell



  5. #5

    Thanks, Todd! We did a demo with our customer, and it looks like they will buy. I found out that the backup deduplication product (aka thin backups?) they plan to use has a folder hierarchy of some sort, so they won't have millions of files in a single folder. I think it's really a good idea that you guys are using XFS (although ZFS, with its deduplication hooks, would rock).

    They are a Windows shop, so they will use SMB. Does the NFS daemon count affect SMB?

  6. #6

    No, SMB is not associated with the NFS daemon count.

    Well, as far as I know, most resellers and system builders don't really like dedup; it doesn't help sell more of... you know what.

    Great that the demo went well.

    See ya next time!
    All the best,

    Todd Maxwell



  7. #7

    True

    Well, primary-storage dedup might require very low latency access, à la solid-state drives, in order to piece together all the chunks of a file on the fly. So although it uses less storage, the storage it does use would be much more expensive. How well does DSS work with solid-state drives?
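
    To make the latency point concrete, the restore path is basically one small random read per chunk, continuing my earlier hypothetical sketch:

        from pathlib import Path

        STORE = Path("/mnt/nas/chunks")  # hypothetical chunk store from the earlier sketch

        def reassemble(recipe: list[str], dest: Path) -> None:
            """Rebuild a file from its ordered chunk list: one small read
            per 16kB chunk, so per-read latency (seek time) dominates."""
            with dest.open("wb") as out:
                for digest in recipe:
                    out.write((STORE / digest).read_bytes())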

  8. #8

    In that case I would agree. I've never tested SSDs, but I'm sure they will work very well.

    I just saw that the price went down today from SanDisk.
    All the best,

    Todd Maxwell


