Well, our distributor had already patched in the newer (fixed) firmware, so we didn't have that problem at all. I had been prepared to flash it myself, but it was already done. Really, we haven't suffered any drive failures from the firmware problem. Stuff like this happens every once in a while to all the manufacturers, and it isn't as severe as, say, problems that we've had with the 400GB Hitachi "Deathstar" drives. We still like Seagate a lot.
Alright, so I've been doing some testing on a smaller system using FC. Unfortunately, the initiator side still has only 1Gb FC on a 32-bit/33MHz PCI bus, but at least that gives me a way to compare performance against the DSS FC system we've already installed at our customer's site.
Basically, random read IOPS are about 14,000 from the cache. The DSS target has only 1GB of system memory, so I used a smaller test file of only 200MB. This DSS target is using a software RAID 5 with 3 drives, so there's not much performance coming from the drives themselves.
Target side:
DSS b3278 32-bit
Dual-core Pentium 4 (NetBurst) at 3GHz (no Hyper-Threading). 1GB of system RAM. 3x 500GB 7200rpm Barracuda SATA drives (I think they have 32MB of cache each, otherwise it's 16MB). Dual-port 4Gb QLogic Fibre Channel card on PCIe x4. 10GB FC volume with 4KB block size, default options (i.e. write-back enabled, not write-through).
Initiator side:
Windows Server 2003.
Pentium 4 2GHz (NetBurst crap). 1GB of system RAM. Some really old single-port 1Gb QLogic Fibre Channel card plugged into a 32-bit/33MHz PCI slot.
Or something like that. I have to run the benchmark for a lot longer beforehand in order to make sure that the whole test file is in the DSS's system cache.
Anyways, I consistently get about 13,980 or so, which is within 5% of the much higher-spec'd system I used before. The difference there was that it had four times as many cores (and those were the Core-based Xeons, which probably had a much higher FSB and DDR2 instead of DDR).
So, it seems that ~14,000 random read IOPS is to be expected when using 1Gb FC with 4KB blocks and a test file that fits entirely in the system cache. When I use a larger test file which doesn't fit in the cache, the random read IOPS on this system drop to around 150 to 300.
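Just to sanity-check that number: 14,000 IOPS x 4KB per I/O works out to only about 56MB/s, and a 1Gb FC link can carry roughly 100MB/s, so (if my math is right) the ceiling here looks like per-I/O latency on my old initiator hardware rather than raw link bandwidth.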
So, if I made a system with, say, 128GB of system memory in 1U, then I could get a usable high-performance SAN without having to fill a rack with hundreds of 15K SAS drives and then short-stroking them. The only bad part so far is that I'd have to wait until the cache was filled before I could expect high performance.
PS, we have another system coming in that's identical to the DSS target system (i.e. dual-core Pentium 4, etc.), so I should be able to test how well this works with volume replication.
Stay tuned for write performance and performance using iSCSI!
I just ran sqlio against an iSCSI volume (using only a single 1Gb Ethernet connection, jumbo frames not enabled). It was pretty slow for random read I/O from cache. In fact, it was only about 470 IOPS when I ran it this way (with a 200MB test file):
sqlio -kR -s60 -frandom -o128 -b4 -LS -Fparam.txt
The param.txt file has only this in it: "H:\testfile.dat 1 0x0 200". That line specifies where the test file is located (i.e. we are testing the H: volume right now, which happens to be the iSCSI one), how many threads to run (one), the bitmask that sets processor affinity (which doesn't apply, since there's only one core), and how big the test file is in megabytes. For both the FC volume and the iSCSI volume I used NTFS with a 4096-byte block size. The initiator is the Microsoft software iSCSI initiator 2.08. The iSCSI target is block I/O mounted with write-back cache enabled. The FC volume also uses a 4096-byte block size.
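For anyone who hasn't used sqlio before, the fields in that line break down like this (as I understand the param file format):

H:\testfile.dat 1 0x0 200
= <path to test file> <number of threads> <CPU affinity bitmask> <test file size in MB>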
Whereas when I use an FC volume over a 1Gb FC interface, it's about 14,000 random read IOPS.
I've done this a few times now, and it seems quite obvious that FC is far superior for random read I/O from cache. Granted, I am only able to test one thread at a time right now, since my initiator side is only single-core. Oh well.
I tried running the same thing on the FC volume right after this test (but first running a sequential sqlio pass to put the benchmark file into the DSS system cache, as in the iSCSI volume test). I get over 14,000 random read IOPS!
BTW, the easiest way to make sure the whole benchmark test file is in the cache when using sqlio is to run sqlio in sequential mode beforehand once or twice, long enough for it to read the entire test file. If this were not a benchmark but a production environment with a cache large enough to hold, say, an entire MySQL database, one could just run "dd if=/dev/sda1 of=/dev/null bs=4k" to make sure everything is in cache before you start using it. This could be done on the DSS side or the client side. The DSS side would be a little faster, but it doesn't really matter that much, since you'd only rarely have to do it.
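For example, a warm-up pass followed by the actual random read test could look something like this (the 120-second duration and the 64KB sequential block size are just values I'd pick to make sure the whole 200MB file gets read at least once; same param.txt as above):

sqlio -kR -s120 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kR -s60 -frandom -o128 -b4 -LS -Fparam.txt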
PS, I'll try to see what difference jumbo frames make, if I can enable them.
These are the results with an iSCSI volume (1Gb eth with no jumbo frames):
The random write IOPS never got higher than 117. This is without jumbo frames. My on-board Ethernet card doesn't support jumbo frames, but I have a card that does, so I'll use that next. Even so, this is ridiculously low performance. There's probably something I did wrong, and I'll try to find it. 117 IOPS sounds like it isn't using any of the DSS's system RAM for caching and is just going straight to the disks.
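For reference, the random write run is basically the same sqlio invocation as the read test, just with -kW (write) in place of -kR; something along the lines of:

sqlio -kW -s60 -frandom -o128 -b4 -LS -Fparam.txt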
THIS IS OUTSTANDING WORK!!! Just to let you know, the engineers are all looking into your work.
They wanted to know if you can test with the Atlanta version.
I think I will propose posting your results on our website and see if we can create a section for the best performance tests from our customers and partners using FC, iSCSI and NAS.
Check this posting on speeds reported with Open-E DSS Benchmark with PIC!
Would be good to check with him. I will let him know of your results as well.
Server
DELL 860 with 6/i
RAID 0: 2x WD GP 750GB
Open-E DSS Version: 5.0up60.7101.3511 32-bit
FC HBA: QLogic QLA2344 2Gbps PCI-X in target mode