Continuing my series on the Violin Memory 1010, I am turning my attention to the DBT2 benchmark, which simulates an OLTP workload. I started with my typical “waffle” workload, a 20 warehouse setup (about 2.5 GB) with a 768M buffer pool, and compared it to the same setup with a 5G buffer pool. The nirvana state for any system is storage that performs as if everything were already in memory; the closer we can get, the better off we are. The sad thing is that even the fastest flash solutions deliver response times in the 70-300 microsecond range, which is a long way from the nanosecond response times of memory. That being said, let’s see how close we can get to a fully cached database:
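For reference, the cached versus uncached runs above differ only in the InnoDB buffer pool size. A minimal my.cnf sketch (the section header and comments are generic, not my actual config file):

```ini
[mysqld]
# ~2.5 GB of data with a small pool: most reads must hit the storage
innodb_buffer_pool_size = 768M

# For the fully cached comparison, size the pool past the data set:
# innodb_buffer_pool_size = 5G
```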
I am including the Intel numbers here for perspective, and to show just how close we can get to full in-memory speeds. The fact is I am comparing a potentially $50,000 piece of hardware to a $400 one… it should be faster. The key takeaway is this:
With the V1010 I end up hitting about 80% of the fully cached solution. That is impressive. The single Intel drive test ended up, best case, at around 48% of the fully cached solution with the IO scheduler set to noop, while it was around 30% (~6600 TPM) with the IO scheduler set to cfq. Unfortunately I did not run this same test with the DDR-based solution.
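For clarity, the percentages above are just throughput relative to the fully cached run. A quick sketch of the arithmetic (only the ~6600 TPM cfq figure is quoted above; the fully cached baseline here is back-derived from the stated 30%, so treat it as an approximation):

```python
def pct_of_cached(tpm, cached_tpm):
    """Return throughput as a percentage of the fully cached run."""
    return 100.0 * tpm / cached_tpm

# Implied fully cached baseline: ~6600 TPM was 30% of it, so ~22,000 TPM.
cached_tpm = 6600 / 0.30

print(round(pct_of_cached(6600, cached_tpm)))  # -> 30 (single drive, cfq)
```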
Interesting note here: the CPU was not maxed out during the tests:
And for the other:
These CPU numbers are a bit suspicious… 35K TPM is close to what I would expect from a fully loaded system, not one with 20% idle CPU. This may be due to the fact that the V1010 does not present or collect all the standard block device stats back to the OS, or it may be caused by delays in the network traffic between mysql and the dbt2 client.
The IO during these tests was busy, but the disks were not overloaded:
As you can see, we are only doing about half the TPS that sysbench was able to do… so we have a long way to go. And from the transfer side of things:
You can see we are not even pushing 70MB/s. This is why I suggested a new benchmark like the Juice DB Benchmark… something to really stress the disk.
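A back-of-the-envelope way to relate the two iostat views: divide throughput by tps to get the average transfer size per I/O. The 70MB/s ceiling is from the graph above; the tps value below is a hypothetical placeholder for illustration, not a measured number:

```python
def avg_io_kb(mb_per_sec, tps):
    """Average KB moved per I/O operation (iostat MB/s divided by tps)."""
    return (mb_per_sec * 1024.0) / tps

# e.g. 70 MB/s spread over a hypothetical 4,500 tps:
print(round(avg_io_kb(70, 4500), 1))  # -> 15.9, i.e. roughly one 16K InnoDB page
```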
Interesting note on the V1010: the device does not behave like other block devices. You cannot set things like the IO scheduler, it does not collect wait and service times, it does not appear to merge IO from the OS as often, etc. Kind of strange to see a 0 wait time.
Now here is where I got a little fancy with my tests. I decided to see if I could push my IO more.
You see, I wanted more IO! More disk hammering! So I decided to try a couple more DBT2 tests. I figure the Google/Percona patches are filled with IO goodness, so why not try to push more through the system with them… I figured I would try a 100 warehouse (~11GB) test with only a 768M buffer pool; that should really make the disk sit up and take notice!
So I compared the standard InnoDB code against a Google IO patched build with 8 read/write threads and an io_capacity of 500. So, on to the results:
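For anyone wanting to reproduce the patched run, the settings above map to config variables along these lines (a sketch; these are the variable names used by the Google/Percona patched builds, so double-check them against the build you are running):

```ini
[mysqld]
innodb_buffer_pool_size  = 768M
# Google IO patch settings used for the patched run:
innodb_read_io_threads   = 8
innodb_write_io_threads  = 8
innodb_io_capacity       = 500
```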
Wow, almost no difference between the standard mysqld and the patched version when running DBT2. I need to put this through more tests locally to see what the impact of different read/write thread counts is with flash, but in this case I think the V1010 is fast enough to keep the standard InnoDB package happy. I will rerun some of these tests with Juice in my next article. I wonder, could this have to do with the relatively small buffer pool being used as well? Smaller buffer pool, less to flush, minimizing the impact of these enhancements? Needless to say, more investigation is required.
Let’s take another peek at the TPS (iostat) from these tests:
~26K tps is pretty impressive, but as I said before, there is no real difference whether or not I use the Google/Percona patches. Has anyone tried DBT2 with various io_capacity settings and different read/write thread counts? Did they see a huge boost?
Next up is the Juice benchmark results! Stay tuned for that!