Testing MySQL on the Violin Memory Flash 1010 Part III:

So we have already looked at the sysbench & dbt2 tests… now we have to look at the new Juice DB benchmark. Juice runs a series of queries to generate its load, and these queries are combined into a workload. I tested the v1010 with a mixed workload (a mix of short & long updates and selects), a mixed simple workload (a mix of short-running updates and selects), and a read-only workload (selects which are designed to hit the disk). Because this is still an evolving benchmark, I am including results from an Intel MLC drive (note these boxes are vastly different). Keep in mind this is not a completely fair comparison. The Intel drive is not the enterprise-class drive, but even with the SLC drive I don’t think it’s a fair comparison. The price difference between these two solutions is ~$50/GB -vs- ~$12.5/GB.

The setup for this test created about a 20GB database, with each of the 3 large tables coming in at around 6GB. I tested primarily with a 768M buffer pool in order to keep this as IO-bound as possible. Let’s jump into the numbers, shall we:

Mixed Workload

Let’s start with the mixed workload test. This test runs a series of updates & selects against a large dataset. I ran this several times; this particular run lasted 900 seconds with 8 threads. Here are the time differences between the two drives… you can see some of the tests on the Intel drive took as much as 10x longer.

I can provide the queries if anyone is interested.
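For anyone who wants a feel for how a run like this is driven, here is a minimal sketch of a fixed-duration, multi-threaded workload loop. This is not Juice's actual code: the statement names, the weights, and the `execute` callable (a stand-in for a real database call) are all assumptions made up for illustration.

```python
import random
import threading
import time
from collections import defaultdict


def run_workload(statements, execute, threads=8, duration=900.0, seed=0):
    """Run a weighted mix of statements from several threads for a fixed time.

    statements: list of (name, weight) pairs; the names are hypothetical.
    execute:    callable taking a statement name; stands in for a DB call.
    Returns {name: [elapsed_seconds, ...]}.
    """
    names = [name for name, _ in statements]
    weights = [weight for _, weight in statements]
    timings = defaultdict(list)
    lock = threading.Lock()
    deadline = time.monotonic() + duration

    def worker(worker_id):
        rng = random.Random(seed + worker_id)  # separate RNG per thread
        while time.monotonic() < deadline:
            name = rng.choices(names, weights=weights)[0]
            start = time.perf_counter()
            execute(name)
            elapsed = time.perf_counter() - start
            with lock:  # protect the shared timing lists
                timings[name].append(elapsed)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return dict(timings)


def summarize(timings):
    """Return {name: (count, mean_seconds)} for each statement that ran."""
    return {name: (len(v), sum(v) / len(v)) for name, v in timings.items()}


if __name__ == "__main__":
    # Fake executor: sleeps instead of talking to a database (assumption).
    fake_cost = {"short_select": 0.001, "long_update": 0.01}
    result = run_workload(
        [("short_select", 9), ("long_update", 1)],
        lambda name: time.sleep(fake_cost[name]),
        threads=2,
        duration=0.5,
    )
    for name, (count, mean) in summarize(result).items():
        print(f"{name}: {count} runs, {mean * 1000:.2f} ms avg")
```

Swapping the fake executor for a function that runs the named statement over a real connection would give per-statement average times comparable across two drives.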

Let’s take a look at the CPU during the test:

What’s interesting here is the amount of idle CPU. I upped the number of threads, but we appear to have maxed out the IO throughput. Take a look at the transactions per second:

We are hitting 40K+ TPS here, which is what we were getting earlier in sysbench, so it’s possible this is the max throughput we can expect from the flash-based v1010. Before we move on, let’s take a quick look at the data transferred from disk during the mixed workload test:

We do peak out around 200MB/s, which is actually pretty good, but well short of the specs on the v1010.
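As a rough sanity check, the two peaks quoted so far can be combined into an average request size. This is back-of-the-envelope only: it assumes the ~40K TPS and ~200MB/s peaks happened at the same moment (the charts do not guarantee that), and it treats 1MB as 10^6 bytes. The function name is made up for this sketch.

```python
def avg_request_kb(mb_per_sec, requests_per_sec):
    """Average bytes moved per request, in KB (1 KB = 1000 bytes here)."""
    bytes_per_sec = mb_per_sec * 1_000_000
    return bytes_per_sec / requests_per_sec / 1000


if __name__ == "__main__":
    # Peak figures quoted in the text; assumed to coincide (see caveat above).
    print(avg_request_kb(200, 40_000))  # 5.0
```

If the peaks do line up, that works out to about 5KB per request on average.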

Simplified Mixed Workload

One of the things that was bothering me about the original mixed load test was the frequency and length of the longer-running queries. These queries would get backed up, and I just did not get enough of the smaller, more realistic statements being executed. So I changed the test to eliminate some of the longer statements and focus on the quicker, more nimble queries. I left the original test in Juice because I had already used it extensively. So let’s take a look at my “simple mixed” test.

Here are the statements:

Yes, Q4 in this test is the same as Q4 in the previous test. Q10, however, has changed, so I need to update it with a new query number. Once again, a 10x+ performance difference is common here.

From the CPU side I am still a bit baffled: the IO wait is not that high, but certainly higher than in the DBT2 tests. These are IO tests, however… the Intel tests yielded about 25% IO wait with an average CPU of 5%, so maybe I should feel lucky I was getting 30% CPU with the v1010.

Even with the “simple” test I am getting close to 40K TPS:

So I still look to be coming close to what I think is the max.

The amount of data transferred is very close to that of the old mixed test as well:

For comparison’s sake here is the same test with my Intel drive:

Big difference right?

Read-Only Workload

Now on the read-only side, things are a bit different. Here I am purposely building temp tables, I am purposely doing full scans, and I am purposely trying to hammer the IO. The problem with these is that they are long-running, and even doing 15 or 30 minute runs only yields a handful of queries:

Surprisingly, the Intel was as fast or faster on a couple of the “read-only” queries. Of course, there were not enough queries run to call this conclusive.

I wanted higher CPU, and I got it:

But, I am doing a lot less TPS…

This may be due to IO merging (TPS as reported by iostat counts requests, each of which can be made up of some number of actual IOs), read-ahead, or something else. Personally, I am not 100% sure. Iostat is not reporting any rrqm’s, but this may be inconclusive. The lack of merged requests may simply be caused by the v1010’s driver not reporting them back to the OS. Not reporting them back would make sense because I am still transferring a lot of data:
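One way to look at the merge question directly is to take two samples of the device's line in Linux's `/proc/diskstats` and work out reads per second, merged reads per second, and the average read size yourself. The sketch below assumes the standard field layout (major, minor, device name, reads completed, reads merged, sectors read, and so on) and 512-byte sectors; the function names are made up for illustration, and whether the v1010 driver populates the merge counter at all is exactly the open question above.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats_line(line):
    """Pick the read/write counters out of one /proc/diskstats line."""
    fields = line.split()
    return {
        "device": fields[2],
        "reads": int(fields[3]),           # reads completed
        "reads_merged": int(fields[4]),    # reads merged before queuing
        "sectors_read": int(fields[5]),
        "writes": int(fields[7]),          # writes completed
        "writes_merged": int(fields[8]),
        "sectors_written": int(fields[9]),
    }


def read_rates(before, after, interval_sec):
    """Turn two counter samples taken interval_sec apart into per-second rates."""
    reads = after["reads"] - before["reads"]
    merged = after["reads_merged"] - before["reads_merged"]
    nbytes = (after["sectors_read"] - before["sectors_read"]) * SECTOR_BYTES
    return {
        "r_per_sec": reads / interval_sec,
        "rrqm_per_sec": merged / interval_sec,
        "read_mb_per_sec": nbytes / interval_sec / 1_000_000,
        "avg_read_kb": (nbytes / reads / 1000) if reads else 0.0,
    }
```

A large `avg_read_kb` together with a zero `rrqm_per_sec` would suggest that requests are big when they reach the device even though no merges are being counted.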

Google/Percona IO Patches Workload

Let’s go back briefly to the discussion on the Google/Percona IO patches. I wanted to test these out again to see if I could push more IO through the system. The short answer was no.

While I did see a drop in avg run times, the associated stats really did not show much of a difference:

I guess the lack of a performance boost makes 100% sense: if I was maxing out the v1010, no extra knobs or parameters would be able to drive more traffic to an already maxed-out device. Still, I need to do more testing using these patches to verify.

Quick Compare:

Because Juice is still evolving and because of vast differences in hardware (3.4GHz -vs- 2.4GHz, 16GB -vs- 32GB of memory), I was not able to draw many comparisons between the TMS RamSan 500 and the V1010, but I felt data transfer rates in one of my mixed mode tests were similar enough to compare. Even if the CPU is slower, I figure the amount of data needed to complete one of my tests should be similar. While this metric is slightly flawed, it can give you some ideas on overall performance.

Both maxed out around the same transfer rate, which shows both are capable of delivering lots of data to keep your database humming along. The question remains: would the TMS box show better performance if the server was better (better CPU, more memory)?

Final Thoughts

The flash-based V1010 is a powerful device. I was impressed by the performance and reliability of the unit. It is very fast in its current beta state and promises to get faster as the product matures, putting it in exclusive company at the high end of flash devices. It is going to be facing a lot of competition in the coming years, not only from high-end storage vendors but also from the lower-end commodity market. While the performance of both the RamSan 500 and the V1010 easily trounces a single Intel SSD, what will the performance boost be compared to a flash+RAID solution? Personally, I have not had access to several drives to test with, nor have I seen other people publishing any benchmarks on RAID performance from 4+ Intel drives with regard to MySQL.

While I can speculate that the performance of multiple flash drives will suffer due to the SATA interface… we don’t know if we are truly capable of driving the database to require more than what SATA can deliver. Look at the throughput numbers above. I think we maxed out the raw IOPS we could push, but we did not come close to transferring more than even a SATA II interface could handle (3Gb/s, or roughly 300MB/s of usable bandwidth). In this case, is the PCIe, Fiber, or InfiniBand connection making a difference? My thought is yes. As I have harped on in the past with network performance, throughput is different than latency: a 1GbE card delivers packets faster than a 100Mb card. The same should hold true for these high-speed connections… but we just do not know yet. These questions are sure to be answered as flash grows in popularity.
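To put a number on the headroom, here is a small sketch of the interface arithmetic. It assumes SATA II's 3Gb/s line rate with 8b/10b encoding (8 payload bits for every 10 bits on the wire) and compares it against the ~200MB/s peak seen in the mixed tests. The function names are invented for this example, and the result ignores protocol overhead beyond the line encoding.

```python
def usable_mb_per_sec(line_rate_gbps, data_bits=8, line_bits=10):
    """Payload bandwidth in MB/s for a serial link with N-of-M line encoding."""
    line_mbit = line_rate_gbps * 1000                  # Gb/s -> Mb/s on the wire
    payload_mbit = line_mbit * data_bits / line_bits   # strip encoding overhead
    return payload_mbit / 8                            # bits -> bytes


def utilization(observed_mb_per_sec, line_rate_gbps):
    """Fraction of the link's usable bandwidth that a workload consumed."""
    return observed_mb_per_sec / usable_mb_per_sec(line_rate_gbps)


if __name__ == "__main__":
    print(usable_mb_per_sec(3))           # 300.0
    print(f"{utilization(200, 3):.0%}")   # 67%
```

So even at its peak, the mixed workload would have filled only about two thirds of a single SATA II link, which is why any advantage from the faster interconnects here is more likely to come from latency than from raw bandwidth.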

That being said, there is no doubt that if you’re looking for high-end performance with large amounts of data, the V1010 is a solution you will want to look at. It’s going to deliver a lot of performance, and the price is not that outrageous considering the capacity and the speed.
