
Thread: Another couple of orders of magnitude in performance: RDMA with InfiniBand or 10GbE

  1. #1

    Question Another couple of orders of magnitude in performance: RDMA with InfiniBand or 10GbE

    Okay. So, is there a plan to implement SCSI over RDMA? Coupled with solid-state disks (either SATA/SAS-based with RAM, the new Intel SSDs, or even PCIe-based ones), or even just a server loaded with a huge amount of RAM (soon possible with MetaRAM), this might bring IOPS up above 100,000. That would be completely insane. Right now, with a couple of GbE ports teamed via MPIO (I know, not the best), 4GB of system RAM, and a 12-disk RAID 5 array (with 2GB of cache on the RAID controller), we get about 2,400 IOPS with the SQLIO benchmark.

    I've read that InfiniBand (with RDMA) can, in some cases, provide three times the IOPS of Fibre Channel. Has anyone used IP over InfiniBand? What sort of IOPS are you getting?

    Is Open-E looking at doing any sort of RDMA/iSER work? (Or even FCoE?)

    I got to thinking about this because if we're going to SSDs in the next couple of years, it'd be nice to have a protocol that can take advantage of the low latencies they make possible. And x4 InfiniBand PCIe cards are about the same price (and bandwidth) as 10GbE cards (you can find them pretty easily for under $1000, and way less than that refurbished, obviously).

    Well, whether you use InfiniBand or 10GbE, RDMA (vs. TCP/IP) should get you a lot more IOPS just by stripping out protocol overhead that's unnecessary on a small SAN (especially point-to-point, with no switch) where you aren't likely to drop any data.
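
    As a rough illustration of why lower per-IO latency turns into more IOPS: by Little's law, sustained IOPS is roughly the number of outstanding I/Os divided by the average per-IO latency. The latency numbers in this little sketch are made-up assumptions to show the shape of the effect, not measurements from any real setup:

    Code:
    # Back-of-the-envelope IOPS from queue depth and per-IO latency (Little's law).
    # The latencies below are illustrative assumptions, not benchmark results.
    def iops(outstanding_ios, latency_seconds):
        return outstanding_ios / latency_seconds

    queue_depth = 32
    print(iops(queue_depth, 400e-6))   # ~80,000 IOPS at 400 us per IO (TCP/IP-ish guess)
    print(iops(queue_depth, 100e-6))   # ~320,000 IOPS at 100 us per IO (RDMA-ish guess)

    Same queue depth, a quarter of the latency, four times the IOPS - that's the whole argument for cutting protocol overhead.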

    Also, what are the highest IOPS that you (i.e. Open-E and any of your customers out there) are getting with Fibre Channel or iSCSI?

  2. #2

    Default

    Good point, Robotbeat! We have plans to implement RDMA in the future; there's just no ETA yet as to when.
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  3. #3

    Lightbulb

    Also, one feature that'd be AWESOME would be support for making a RAM disk out of a chunk of system memory.

    I mean, Sun and HP both have (eight-way, quad-core Opteron) boxes with up to 64 DDR2 slots. MetaRAM makes 8GB DDR2 sticks and is coming out with 16GB ones (which they claim are compatible with those Opteron chipsets), so 64 slots x 16GB is potentially 1TB of system RAM!!!

    This should enable ridiculous (ludicrous?) SAN/NAS performance. Some customers would like to be able to carve out even a dozen GB of RAM to use as a scratch disk or for some really crazy database application.
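
    In the meantime, anyone who wants to play with the idea by hand can carve a RAM-backed block device out of system memory and export it over iSCSI on a stock Linux box. Here's a minimal sketch using the brd kernel module and tgtadm (from scsi-target-utils); the 4GB size and the IQN are placeholders, and this is obviously not how DSS would do it internally:

    Code:
    #!/usr/bin/env python3
    # Sketch: create a RAM-backed block device and export it as an iSCSI LUN.
    # Requires root, the brd kernel module, and a running tgtd daemon.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Create a 4GB RAM disk (/dev/ram0); rd_size is given in kilobytes.
    run(["modprobe", "brd", "rd_nr=1", "rd_size=4194304"])

    # 2. Define an iSCSI target (placeholder IQN) and attach the RAM disk as LUN 1.
    iqn = "iqn.2008-09.local.example:ramdisk0"
    run(["tgtadm", "--lld", "iscsi", "--op", "new", "--mode", "target",
         "--tid", "1", "-T", iqn])
    run(["tgtadm", "--lld", "iscsi", "--op", "new", "--mode", "logicalunit",
         "--tid", "1", "--lun", "1", "-b", "/dev/ram0"])

    # 3. Let initiators connect (wide open here; restrict this in real use).
    run(["tgtadm", "--lld", "iscsi", "--op", "bind", "--mode", "target",
         "--tid", "1", "-I", "ALL"])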

    This RAM disk would help us tweak system performance to the max and prepare for SSDs.

    And, it'd enable us to "legitimately" claim equally ridiculous performance specs in our marketing.

    Eventually, it'd be cool to have the RAM disks replicated via iSCSI failover or something like that to make it partially sane to use even in production environments.

    So... Ram disk support?

  4. #4

    Default

    I just wanted to add my voice to all who are asking for RDMA with InfiniBand support.
    It would be greatly appreciated if updates on the ETA could be posted as they become available.

    More generally, is there a roadmap or planned-features list available?

    Thanks

  5. #5

    Default

    We are working on this for our partners, but we may not announce it publicly for competitive reasons, since even some of our competitors don't do so. We have been thinking about sharing this information, but we need more time to consider the implications of that kind of announcement.
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  6. #6

    Lightbulb

    Have been thinking about this (and experimenting with Debian Linux on our little network), and the perfect use for this sort of thing would be storing the swap partition (or file) for systems that already boot from the SAN. If the main server goes down, you're going to shut down the clients that booted from it anyway, and you don't have to worry about preserving the data in the swap file, since it's only acting as an extension of the client's RAM. Also, you could fairly easily enable automatic failover for this situation, since the ramdisk is just a block device. Granted, you'd have to recreate the ramdisk every time you rebooted, but you could make that part of the startup procedure. Plus, in Linux it's easy to set up multiple swap partitions!
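
    On the client side, once the remote RAM-backed device shows up after the iSCSI (or SRP) login, the setup is basically just mkswap and swapon. A minimal sketch (the /dev/sdx path is a placeholder for whatever device actually appears on your initiator):

    Code:
    #!/usr/bin/env python3
    # Sketch: use a remote RAM-backed block device as high-priority swap.
    # /dev/sdx is a placeholder; requires root. Repeat per device for multiple swap areas.
    import subprocess

    remote_ramdisk = "/dev/sdx"

    # Initialize the device as swap and enable it at the highest priority, so the
    # kernel prefers it over any slower, disk-backed swap that is also configured.
    subprocess.run(["mkswap", remote_ramdisk], check=True)
    subprocess.run(["swapon", "-p", "32767", remote_ramdisk], check=True)

    # Show the active swap areas and their priorities.
    subprocess.run(["swapon", "-s"], check=True)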

    Here's an interesting paper on using an RDMA-connected (InfiniBand) network block device to store the swap partition on a ramdisk on a remote system:

    http://www.cse.ohio-state.edu/~liang...-cluster05.pdf

    Apparently, their performance was good enough that some of the tasks they tested (like quicksort) took only about 40% longer than running entirely in local memory, whereas swapping to a disk took about 20 times longer. Here's another presentation from the same group:

    http://nowlab.cse.ohio-state.edu/pub...-cluster05.pdf

    I might try setting this up myself in Debian (and may try to get it working with DRBD failover, just to see if it works...). I'll let you guys at Open-E know how it goes.

    Also, going through the TCP/IP software stack instead of RDMA means (I think) roughly 3x the memory bandwidth per byte transferred, since the data gets copied between buffers rather than DMA'd straight into place. That's fine at 100 MB/s, but you start running into problems at 1-2 GB/s.

  7. #7

    Lightbulb

    Have you guys heard of "Managed Flash Technology"? It's software (originally developed for Linux) that sits between a hardware flash device (or an array of SSDs) and any filesystem. Basically, it turns random writes into sequential writes. It costs money, so Open-E would have to license it, but it could let Open-E compete with the big boys on random-write IOPS using just regular SSDs.
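
    Conceptually it's a log-structured translation layer. Here's a toy sketch of the idea (my own illustration of the concept, not how the actual product is implemented):

    Code:
    # Toy log-structured translation layer: random logical writes become
    # sequential appends, and a mapping table remembers where each block lives.
    # Purely illustrative - not how Managed Flash Technology actually works.
    class LogStructuredDevice:
        def __init__(self, block_size=4096):
            self.block_size = block_size
            self.log = bytearray()   # stands in for the sequentially written flash
            self.mapping = {}        # logical block address -> offset in the log

        def write(self, lba, data):
            assert len(data) == self.block_size
            self.mapping[lba] = len(self.log)   # newest copy of this block
            self.log.extend(data)               # append-only: flash sees sequential I/O

        def read(self, lba):
            off = self.mapping[lba]
            return bytes(self.log[off:off + self.block_size])

    # Scattered writes to blocks 7, 3, and 1000 still land in the log back-to-back.
    dev = LogStructuredDevice()
    for lba in (7, 3, 1000):
        dev.write(lba, bytes([lba % 256]) * dev.block_size)
    assert dev.read(3) == b"\x03" * 4096

    A real implementation also needs garbage collection to reclaim the superseded copies of blocks, which is where most of the hard work is.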

    Open-E DSS ssd-edition.

    http://managedflash.com/home/index.htm

  8. #8

    Default

    Thanks!! We are looking into some of these, but we found there is a lot of development work involved - that's part of why you pay NetApp prices. We are working on it, though; then we will list it on our HCL. Give us some time - it will come!
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube

  9. #9

    Default

    The managed flash software seems very interesting. I think SSDs are the next evolution in storage; from a physics perspective, I'm not sure how much more can be done to improve mechanical disks. But right now it's hardly affordable, and because of the limited write endurance of the more affordable MLC variety, I have questions about reliability and MTBF.

    I think RDMA would be awesome, but RDMA is really an InfiniBand thing, and right now there are some innovative things that can be done to improve iSCSI performance over plain IP/Ethernet.

    http://www.ele.uri.edu/Research/hpcl/STICS/iCache.pdf

    If you have access to this white paper, it describes a different caching strategy that improves iSCSI performance by roughly 58-70 percent. I'd like to see some innovation along these lines...

  10. #10

    Default

    Thanks for this info - I passed it on to our development and management team as we are looking for ideas.
    All the best,

    Todd Maxwell


    Follow the red "E"
    Facebook | Twitter | YouTube
