Continuing my series of in-depth looks at flash appliances, SANs, and drives, I spent a few weeks test-driving the Violin Memory flash (and DDR-based) solutions. Just from the specs the Violin Memory 1010 is impressive. According to the site, the v1010 does 300K random reads per second and 200K random writes, with latency of less than 300 microseconds! That is pretty impressive! But as I have stated before, it's difficult to test these limits with our current set of benchmarks. For my tests I ran this through the sysbench fileio tests and DBT2 to get a feel for performance, but I was really eager to try the new Juice DB benchmark to really drive IO. For the test, Violin generously made available a 4-core (3.4GHz) server with 8GB of memory, with access to a 360GB flash-based v1010 and then a 320GB DDR-based v1010. Unlike the RamSan I tested a few weeks ago, the v1010 does not have a DDR-based cache sitting on top of its flash (they may add this later), so these numbers should be raw flash.
A quick note: while I did test the DDR-based solution, I spent a lot more time and energy testing the flash solution. Why? Because the DDR-based solution does not persist data (i.e. if the unit loses power, all the data is wiped; it is truly a memory-based solution). While this has its place, it means you need plans and steps in place to minimize power issues (UPSes and generators) and to repopulate your database in case of an outage (I cannot tell you how many times over the years I have seen someone trip over, or pull, the wrong cord in the datacenter). Plus, I am all about the flash. :) Keep in mind the flash device I was working with was a beta unit; they are still looking at adding features and enhancements to further boost performance.
Enough words! Nobody really cares about my fluffy babble; they want to see pretty eye candy! They want to see how it performs. So on to the benchmarks!
Let's start with the generic sysbench fileio benchmarks. I ran 16 and 32 thread tests (16K IO) on both the DDR and the flash-based systems. The DDR-based system seems to get a slight boost from the extra threads, while the flash system stayed almost the same (I skipped the 32-thread flash chart because there was no difference). You can see here that the DDR-based device is pushing 60K IOPS regularly:
See the slight boost from 32 threads:
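For the curious, the fileio runs above follow this general shape. Here is a small helper that builds the sysbench command lines; this is a sketch using sysbench 0.4-style flags, and the file size and run time shown are illustrative, not my exact settings:

```python
# Sketch: build sysbench fileio command lines of the kind used for these
# runs. Flags follow sysbench 0.4.x syntax; the total file size and the
# max run time below are illustrative placeholders, not my exact settings.

def fileio_cmd(stage, total_size="100G", threads=16, block_size="16K"):
    """Return a sysbench fileio command list (stage: prepare, run, cleanup)."""
    cmd = ["sysbench", "--test=fileio", f"--file-total-size={total_size}"]
    if stage == "run":
        cmd += [
            "--file-test-mode=rndrw",           # random read/write mix
            f"--file-block-size={block_size}",  # 16K IOs, as in the charts
            f"--num-threads={threads}",
            "--max-time=300",                   # run for time,
            "--max-requests=0",                 # not for a request count
        ]
    return cmd + [stage]

for stage in ("prepare", "run", "cleanup"):
    print(" ".join(fileio_cmd(stage, threads=32)))
```

You prepare the test files once, then repeat the run stage at whatever thread counts you want before cleaning up.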
Still, these numbers are pretty impressive. It is totally unfair, but to give some perspective: a single Intel MLC drive hits around 1,800 IOPS in the 50-83% tests, so the DDR-based system is roughly 33x faster in this test… and it should be!
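That ratio is easy to sanity-check with the round numbers above:

```python
# Back-of-envelope speedup: ~60K IOPS on the DDR-based v1010 vs
# ~1,800 IOPS on a single Intel MLC drive in the same style of test.
ddr_iops = 60_000
intel_mlc_iops = 1_800
speedup = ddr_iops / intel_mlc_iops
print(f"{speedup:.1f}x")
```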
But I am all about the flash, and I think people want to know what this looks like on the flash-based solution. So let's look: how does it perform?
Still fast, but at roughly half the IOPS of the DDR solution. We do peak into the 40K IOPS range with a smaller datafile size, but once we have over 100GB of data we end up in the low-to-mid 30K IOPS range. The spike during the 20GB test must have had something to do with the filesystem cache, as this device does not have a DDR-based cache like the TMS unit I tested recently. The thing to look for here is the performance when we get close to filling the drive. Some flash devices tank as you approach 100% capacity. No worries with the V1010: the tests with a 275GB dataset were very consistent with the tests with a 120GB dataset.
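To put those IOPS in bandwidth terms, a quick conversion, assuming the 16K block size these tests use (the 33K figure is just an illustrative midpoint of the low-to-mid 30K range):

```python
# Rough bandwidth implied by the flash numbers, assuming 16K (16384-byte)
# IOs. 40K is the peak; 33K is an illustrative steady-state midpoint.
block_bytes = 16 * 1024
for iops in (40_000, 33_000):
    mb_s = iops * block_bytes / 1_000_000
    print(f"{iops} IOPS @ 16K ≈ {mb_s:.0f} MB/s")
```

So even the steady-state flash numbers work out to well over 500 MB/s of 16K random IO.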
My final sysbench check is to try to get a sense of performance at different thread concurrencies. So I tried 8/16/32 threads to see what the V1010 could deliver. On the flash side this yielded something a little different than what I expected:
8 threads with a 260GB file size delivered 5K-6K IOPS, while 16/32 threads delivered performance in the low 30K range. This in itself is not an issue; what it could mean is that the system was designed more for massive parallelism and less for low-concurrency work, or it could be an anomaly. I only got one good 8-thread run in on the flash device, so a rerun may have invalidated this. It could also be a firmware/beta issue. Nonetheless, I was consistently getting 30K+ IOPS for every other test I ran with 16+ threads, so hopefully I will continue to see the same level of performance at higher thread counts in my other tests.
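A quick per-thread calculation, using ~5.5K IOPS at 8 threads and ~31K IOPS at 16/32 threads as rough midpoints from these runs, shows why I lean toward the parallelism explanation: throughput per thread actually improves going from 8 to 16 threads, which is what you would expect from a device built to be fed many concurrent requests.

```python
# Per-thread IOPS at each concurrency level, using rough midpoints
# read off the charts (~5.5K IOPS at 8 threads, ~31K at 16 and 32).
runs = {8: 5_500, 16: 31_000, 32: 31_000}
for threads in sorted(runs):
    per_thread = runs[threads] / threads
    print(f"{threads:2d} threads: {per_thread:6.0f} IOPS/thread")
```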
Friday I will post my DBT2 numbers, followed by the Juice benchmarks on Sunday, and I may have a fourth post on this next week as well.