Well, unfortunately I didn't run any more tests. I wish I had had a whole day to run tests, but I didn't.

I'm thinking, though, that if I lowered the block size to 512B and used a full 4Gb link between two machines, over 100,000 random I/Os per second is not out of the question if you can fit everything in the cache. We have some other motherboard options for our standard box, too, with 16 DIMM slots, which would let us cheaply add 64GB of ECC RAM. There are also relatively inexpensive boxes out there (1Us, even) with 32 slots, allowing a cheap 128GB of ECC RAM.

We could market these as disk-backed memory appliances or database-acceleration boxes if there were a way to keep everything in cache (well, once the database starts up, everything important ends up in cache anyway, so that's not really a problem). If we had two of them with autofailover and cache-coherent DRBD, you'd effectively have a highly-available memory appliance for much less money than proprietary memory appliances, which are also less reliable (no autofailover).
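
For what it's worth, here's the back-of-the-envelope arithmetic behind that guess (a rough sketch: the link speed and block size are from above, while the 4GB-DIMM figure and the "link is the only bottleneck" simplification are my own assumptions):

    # Back-of-the-envelope numbers for the memory-appliance idea.
    # Assumptions (mine): the 4Gb link is the bottleneck, every random I/O
    # is a 512-byte block crossing the wire, and protocol overhead is ignored.

    LINK_BITS_PER_SEC = 4e9      # "a full 4Gb between two machines"
    BLOCK_BYTES = 512            # lowered block size

    # Upper bound on replicated 512B writes per second over the link.
    iops_ceiling = (LINK_BITS_PER_SEC / 8) / BLOCK_BYTES
    print(f"link-limited ceiling: {iops_ceiling:,.0f} IOPS")  # ~976,000, so 100,000+ is plausible

    # RAM per box for the two motherboard options (both work out to 4GB DIMMs).
    for slots, total_gb in ((16, 64), (32, 128)):
        print(f"{slots} slots -> {total_gb}GB ECC ({total_gb // slots}GB DIMMs)")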

Except for the fibre-channel autofailover, we could basically do this right now. Granted, latency would be much lower if DRBD worked over InfiniBand natively (RDMA, not IPoIB). But for customers who don't care too much about high availability or about keeping track of every single write operation, this would work fine.
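
To make that last point concrete: the "don't need to keep track of every single write" case maps to DRBD's asynchronous protocol A, where a write counts as complete once it's on the local disk and in the local TCP send buffer, so the application never waits on the replication link. Roughly like this (hostnames, devices, and addresses below are placeholders, not a tested config):

    resource r0 {
      protocol A;              # async: ack once the write is on the local disk
                               # and in the local TCP send buffer
      on alpha {
        device    /dev/drbd0;
        disk      /dev/sda7;   # backing device (placeholder)
        address   10.1.1.1:7788;
        meta-disk internal;
      }
      on bravo {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.1.1.2:7788;
        meta-disk internal;
      }
    }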