Alright, so I've been doing some testing on a smaller system using FC. Unfortunately, the initiator side still has only 1Gb FC on a 32-bit/33MHz PCI bus, but at least that gives me a way to compare performance against the DSS FC system we've already installed at our customer's site.

Basically, random reads come in at about 14,000 IOPS when served from the cache. The DSS target has only 1GB of system memory, so I used a smaller test file of only 200MB. This DSS target is using a software RAID 5 with 3 drives, so there's not much performance coming from the drives themselves.

Target side:
DSS b3278 32-bit
Dual-core Pentium 4 (Netburst) 3GHz (no Hyperthreading). 1GB of system RAM. 3x500GB 7200 rpm Barracuda SATA drives (I think they have 32MB of cache each, otherwise it's 16MB each). Dual-port 4Gb QLogic Fibre Channel card on PCIe x4. 10GB FC volume with 4KB block size, default options (i.e. write-back enabled, not write-through).

Initiator side:
Windows 2003 Server.
Pentium 4 2GHz (Netburst crap). 1GB system RAM. Some really old single-port 1Gb QLogic Fibre Channel card plugged into a 32-bit/33MHz PCI slot.

sqlio -kR -s60 -frandom -o16 -b4 -LS d:\testfile.dat 1 0x0 200

or something like that. I have to run it for a lot longer beforehand in order to make sure that the whole test file is in the DSS's system cache.

Anyways, I consistently get about 13,980 IOPS or so, which is within 5% of the much higher spec'd system I used before. The difference there was that it had 4 times as many cores (and they were Core-based Xeons, which probably had a much faster FSB and DDR2 instead of DDR).

So, it seems that ~14,000 random read IOPS is to be expected when using 1Gb FC with 4KB blocks and a test file that fits entirely in the system cache. When I use a larger test file which doesn't fit in the cache, random reads on this system drop to around 150 to 300 IOPS.
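
For anyone who wants to sanity-check those numbers, here's a quick back-of-the-envelope sketch in Python. It converts the cached IOPS figure into throughput and uses Little's law (outstanding I/Os divided by IOPS) to estimate the average latency per request at the -o16 queue depth from the sqlio line. The 100 MB/s link rate and the 133 MB/s PCI rate are the usual nominal figures and are my assumptions, not something I measured, and the function names are just made up for illustration.

```python
KIB = 1024

def throughput_mb_s(iops, block_bytes):
    """Payload throughput in MB/s (10^6 bytes) for a given IOPS rate."""
    return iops * block_bytes / 1e6

def mean_latency_ms(iops, outstanding):
    """Little's law: average time per request with `outstanding` I/Os in flight."""
    return outstanding / iops * 1000.0

# Figures from the test: ~14,000 IOPS, 4KB blocks, 16 outstanding I/Os.
iops = 14000
block = 4 * KIB
queue_depth = 16

# Assumed nominal ceilings (not measured): 1Gb FC and 32-bit/33MHz PCI.
FC_1GB_MB_S = 100.0
PCI_32_33_MB_S = 133.0

mb_s = throughput_mb_s(iops, block)
print(f"throughput: {mb_s:.1f} MB/s")
print(f"share of 1Gb FC:  {mb_s / FC_1GB_MB_S:.0%}")
print(f"share of PCI bus: {mb_s / PCI_32_33_MB_S:.0%}")
print(f"mean latency at QD{queue_depth}: {mean_latency_ms(iops, queue_depth):.2f} ms")
```

That works out to about 57 MB/s and roughly 1.1 ms per request, so under those assumptions the link isn't saturated with 4KB blocks, and the ceiling is more likely per-I/O overhead somewhere in the path than raw bandwidth. That part is a guess on my side, though.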

So, if I made a system with, say, 128GB of system memory in 1U, then I could get a usable high-performance SAN without having to fill a rack up with hundreds of 15K SAS drives and then short-stroking them. The only bad part so far is that I'd have to wait until the cache was filled before I could expect high performance.
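
To put a rough number on that warm-up wait, here's a small Python sketch. It assumes the cache only fills through random 4KB reads that miss, that each miss pulls in exactly one new 4KB block, and that misses run at the 150 to 300 IOPS I saw on this 3-drive box. Those are simplifying assumptions, and the helper name is made up, so treat the result as an order-of-magnitude estimate only.

```python
GIB = 1024 ** 3

def warmup_hours(cache_bytes, block_bytes, miss_iops):
    """Hours needed to fault in every block once at a disk-bound miss rate.

    Assumes one new block per miss and ignores time spent on cache hits.
    """
    blocks = cache_bytes / block_bytes
    return blocks / miss_iops / 3600.0

cache = 128 * GIB   # hypothetical 1U box with 128GB of RAM
block = 4 * 1024    # 4KB blocks, as in the test volume

for miss_iops in (150, 300):
    hours = warmup_hours(cache, block, miss_iops)
    print(f"{miss_iops} IOPS -> about {hours:.0f} hours to fill the cache")
```

With these inputs it comes out somewhere between about 31 and 62 hours. A sequential pre-read of the volume would presumably fill the cache far faster than random 4KB misses, and a box like that would have more spindles behind it anyway, so the real wait could be a lot shorter.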

PS, we have another system coming in that's identical to the DSS target system (i.e. dual-core Pentium 4, etc.), so I should be able to test how well this works with volume replication.

Stay tuned for write performance and performance using iSCSI!