Well, it's been a while since I started this, so I thought it was about time to put some of it to rest. I have been beating up the Mtron disks for several weeks. During that time I had a couple of really odd outages, a lot of head scratching, and my fair share of number crunching. I have been talking with Kevin Burton a lot over the last few weeks, and we have compared and shared information on the optimal configuration for the current generation of SSD drives. I have held back a large number of my benchmarks because I wanted to get a complete picture: I wanted to evaluate the drives and the MFT software while minimizing any self-induced mishaps. I have run and rerun tests over and over again; at last count I had run over 280 DBT2 tests using my little server. I have tried wacky and wild things, some worked and some did not. In the end I think I found out enough to make some recommendations and share my thoughts.
I focused on the bare Mtron SSD drive in my previous benchmarks. This time I will not only expound a little more on the bare drives, but I am also adding in a drive that uses Easy Computing Company's MFT technology. MFT is a proprietary device mapper technology that rewrites random I/Os to make them more efficient. You use a standard off-the-shelf SSD drive, and MFT enhances it.
Let's do a quick recap for those just joining us. As noted before, bare SSD drives are blazingly fast at random reads but very sensitive to random writes.
In the random read tests, a single 10K RPM Raptor drive could only handle 161 IOPS, while the Mtron drive was 32X faster in the same test. The opposite happens when testing writes: the SSD drive is 2X slower than the Raptor on random writes. As the workload moves from write heavy to read heavy you start to see a shift in the number of IOPS that can be performed… see below:
As you can see, the more read heavy the workload, the more of the speed benefit you see (in terms of IOPS). It is a trade-off: slower writes for much faster random reads. OK, everyone still with me? Because I am going to blow your doors off. If only the random write performance of SSDs were up to snuff with their read performance… imagine the possibilities. MFT tries to make this dream a reality.
MFT Sysbench tests:
Let's look at the same numbers from sysbench with the MFT enabled drive thrown in:
In the generic sysbench tests the MFT enabled Mtron drive was 30x faster when performing the random write tests. What this adds up to in a mixed load test is just scary:
In the synthetic benchmarks the MFT/Mtron combo just blows away the competition; the Mtron drive alone is put to shame. The best result for the standalone Mtron came from a test with 83% reads and 17% writes, where it sustained an average of 518 IOPS. The same test with MFT running on the same drive produced 4,495 IOPS! That ends up being almost a 9X increase.
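For anyone who wants to reproduce that mixed-load number, the read/write mix can be approximated with sysbench's fileio test. This is just a sketch assuming the 0.4-era sysbench command line I was using; the file size and run time are illustrative, and a --file-rw-ratio of 5 works out to roughly 83% reads / 17% writes:

```shell
# Lay down the test files, run a combined random read/write test, clean up.
# --file-rw-ratio is reads:writes, so 5 ~= 83% reads / 17% writes.
sysbench --test=fileio --file-total-size=8G prepare
sysbench --test=fileio --file-total-size=8G --file-test-mode=rndrw \
         --file-rw-ratio=5 --max-time=300 --max-requests=0 run
sysbench --test=fileio --file-total-size=8G cleanup
```

The run phase reports the IOPS numbers you can compare across drives.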
Let's look at the real world tests. Over the past several weeks I have gotten widely varied results from DBT2 on very similar tests. One of the puzzling items I discovered was that after performing a test and deleting the datafiles, the following test seemed to take longer to set up when using MFT. For instance, my initial script for testing the bare Mtron drives had a 2 minute wait between tests and another 2 minute wait between the startup of MySQL and the start of the dbt2 load. This wait was sufficient to allow all the datafiles to be rebuilt after I removed them from the previous test (NOTE: I used safe_mysqld, I should have used the init.d script). When I started the MFT tests, the first test was successful, but the next failed on the load. The build time for the InnoDB files increased dramatically, and as a result the load failed because MySQL was not available yet. I increased the time between runs and this was fixed. I believe this is due to how the MFT software handles writes.
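One way to avoid that race without padding the scripts with ever-longer sleeps is to poll until MySQL actually answers before kicking off the load. A minimal sketch, assuming mysqladmin is on the PATH and the server uses the default socket (the helper name and timeout are mine, not from dbt2):

```shell
#!/bin/sh
# Poll a probe command once per second until it succeeds, or give up
# after a maximum number of attempts.
wait_for() {
    probe=$1; max=${2:-60}; n=0
    until $probe >/dev/null 2>&1; do
        n=$((n + 1))
        if [ "$n" -ge "$max" ]; then
            return 1            # timed out
        fi
        sleep 1
    done
}

# Usage before starting the dbt2 load, e.g. up to 10 minutes:
#   wait_for "mysqladmin ping" 600 && start_dbt2_load
```

This way the gap between runs adapts to however long the InnoDB file build takes, instead of guessing at a fixed sleep.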
According to the folks over at EasyCo, MFT relies on free space in order to help speed up the random write process. Basically, from what I gather, it writes to the free blocks and then comes back later on to clear up the old blocks. During my testing I was able to verify this: I saw an immediate improvement in relative performance when I reduced the space used on the drives. In order to still flex the drive, I started reducing the memory footprint on MySQL to closely match the reduction in size. If you have not done this before, it works great. Basically, say you have 20GB of data and 3GB allocated to the InnoDB buffer pool; by reducing the data from 20GB to 10GB and the memory from 3GB to 1.5GB you should get similar performance numbers. Take a look here:
This is a graph of DBT2 transactions per minute. I was running all of my tests against a 22GB database (~80% full on MFT, ~73% on the bare Mtron) with 2.75GB allocated to the InnoDB buffer pool. Because I was confined to a 32GB drive, I cut the data size in half and reduced the memory footprint by a little more than half. If I had to guess how many TPM I should get, I would have said about 5600; instead the reduced size netted me 5900… about a 6% difference from what I expected. That was a small enough difference that I decided to concentrate on using 100 warehouses (11GB) instead of 200 warehouses (22GB).
Since I already mentioned how vital free space is to MFT performance, let's look at tests comparing a 22GB database vs an 11GB database. The parameters for this test are identical, with an InnoDB buffer pool size of 2.75GB and 100 active warehouses.
As you see above, running 100 active warehouses in the smaller database (11GB vs 22GB) nets almost a 2X increase in performance with MFT, while the Mtron numbers stay the same. I expected the Mtron numbers to remain constant: we are testing the same amount of active data in both tests, we are just changing the total amount of data to match the actively used size. Based on this, we have our first recommendation: if you're thinking about deploying MFT with SSD, plan on filling your drive to only about 50%-60%. If you go higher than that, you risk seeing the performance benefit start to shrink.
Shrinking the Buffer Pool:
Partially due to the physical storage space constraint, and partially just out of curiosity, I decided to continue to shrink the buffer pools to show performance based on the % of data in memory, as well as to get a better idea of actual drive performance. What I hope to determine is how much impact the memory is having versus the physical SSD drive. Additionally, we should be able to determine what sort of memory you would need to duplicate the TPM numbers I achieved once you introduce larger sets of data. For example, using MFT I was able to achieve 9600 TPM with an 11GB database and a 2.75GB buffer pool. This test had a buffer pool large enough that 25% (2.75/11) of the data being used could be in memory. Using that, I can guess that if I had a 100GB database I would need to scale the buffer pool up to 25GB (25% of 100GB is 25GB) to achieve the same 9600 TPM.
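That back-of-the-envelope scaling is just keeping the buffer-pool-to-data ratio constant, and it can be written down as a one-liner. A sketch of the ratio math (the function name and sizes are mine, purely for illustration; everything is in MB so the shell's integer arithmetic stays exact):

```shell
#!/bin/sh
# Keep the buffer-pool:data ratio constant when the dataset grows or shrinks.
# Arguments: current data size (MB), current pool size (MB), new data size (MB).
scaled_pool_mb() {
    data_mb=$1; pool_mb=$2; new_data_mb=$3
    echo $(( pool_mb * new_data_mb / data_mb ))
}

# 11GB of data with a 2.75GB pool (25% in memory), scaled to a 100GB database:
scaled_pool_mb 11264 2816 102400    # prints 25600, i.e. a 25GB buffer pool
```

The same helper covers the earlier 20GB-to-10GB example: scaled_pool_mb 20480 3072 10240 gives 1536MB, i.e. the 1.5GB pool.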
Let's take a look at my tests with a 2.75GB, 1.5GB, 1GB, and 500MB buffer pool:
What is interesting in these numbers is that the MFT device is able to maintain its high throughput even as the memory shrinks. Before I explain in more detail, let's look at these numbers in a slightly different way. Here is the X-times increase over the other disks:
Looking at the case when 25% of the total data size could be in memory (the 2.75GB buffer pool), the Mtron drive blows the Raptor drive out of the water, netting over an 8X increase in transactions per minute (5202 vs 625). If you think that's good, the MFT enabled Mtron doubles that, hitting 15X over the meager Raptor (9628 vs 625). Looking at the graphs you can definitely see a trend: as the % of memory allocated to the buffer pool shrinks relative to the overall data, the bare Mtron's poor write performance gets more and more pronounced. More than likely this is because there is less of a chance of having the needed data in memory as the ratio of buffer to data size decreases. Going to a 500MB buffer pool, the Mtron is just shy of 2.5X faster than the SATA Raptor, a significant drop from the 8X increase the Mtron first showed. The MFT drive shows an entirely different pattern; in fact, as the memory shrinks, the separation between the Mtron's performance boost and the MFT's gets larger and larger, peaking at just over 6 times faster than the Mtron when the buffer pool is set to 500MB (about 4.5% of the dataset).
For those of you running MyISAM, I have some more numbers to share. I took the same dbt2 setup and ran the 100 warehouse test with a 1GB key buffer. Note that dbt2 locks like mad when running this test, but the results are still staggering:
The bare Mtron drive showed a meager boost over the Raptor (about 1.5X), but the MFT enabled drive smoked both of the other drives, netting a 7.6X performance boost. I guess one way to reduce locking is just to have the locks complete fast.
Let's look at a few other options as they relate to SSD & MFT.
While emailing with Burton, we discussed using the noatime option when mounting the drives… from a generic benchmark standpoint this looked like an awesome win. I decided to put it to the DBT2 test. So what is the TPM difference between atime and noatime?
Looking at the numbers, the difference in new order transactions for MFT is a mixed bag. With noatime, the 50 warehouse test, which uses less data, saw about a 7% improvement, but the 100 warehouse test saw a decrease of about 3% (these numbers bear out over multiple tests). On the bare Mtron drive we see a small performance improvement in both tests (11% & 4%). The interesting trend here is the drop I experienced when the active dataset was larger. Is it possible that the noatime option helps boost performance for smaller datasets, but as the data gets larger there is less of an impact… hmmmmm, that is something to try later on. For most environments it looks like noatime will give a little boost.
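For reference, noatime can be tried on a live filesystem without a reboot; the device and mount point below are illustrative, not my actual setup:

```shell
# Remount an already-mounted filesystem with noatime (mount point illustrative)
mount -o remount,noatime /mft

# To make it stick across reboots, add noatime to the fstab entry, e.g.:
# /dev/sdb1  /mft  ext3  defaults,noatime  0 2
```

Remounting in place makes it easy to run back-to-back atime vs noatime tests on the same data.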
Next I looked at the deadline and noop schedulers, specifically when using MFT & SSD.
There was not a huge boost either way when I ran my tests with the noop or the deadline scheduler.
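If you want to repeat the scheduler comparison yourself, the elevator can be switched per device at runtime; the device name here is illustrative and this requires root:

```shell
# Show the available schedulers; the active one appears in brackets
cat /sys/block/sdb/queue/scheduler

# Switch to deadline (or noop) without a reboot
echo deadline > /sys/block/sdb/queue/scheduler
```

Since the change takes effect immediately, you can flip schedulers between DBT2 runs without rebuilding anything.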
Let me talk about a few problems I had during my testing; it was not all high speed performance. I had 5 rather odd issues occur during my testing:
The initial MFT driver that was sent to me was a new driver that had a rather nasty bug. When I ran a shell script and redirected its output to the MFT enabled drive (i.e. /mft/run.sh > /mft/run.out), after a certain amount of IO the run.out file would overwrite itself: the first several lines in the file would be garbage, and what should have been the 1000th line would be the first legible thing. The folks at EasyCo sent me an older release which fixed the issue. They were right on the ball with this and very helpful in tracking down the problem.
A kernel panic happened on my system. The panics started when writing a lot of data to the MFT drive; at a certain point (after about 2GB in) everything would stop. Then the symptoms spread to other drives. I tried the bare Mtron in Ubuntu and captured the kernel info, which is below. In fairness, after dismantling my machine and reseating the memory and controller cards… the panic has not come back.
A quick Google search shows similar panics with various kernels. But this “crash” occurred in CentOS & Ubuntu and with several different kernels. It started one day with just the MFT drive, two days later the other Mtron without MFT, then about 3 days after that my internal Raptor drives. During that time frame I tried repeatedly to fix this by switching controllers, cables, etc. A simple dd would break on the MFT while the Mtron would work. Then dd broke on the Mtron… then dd broke on the Raptor. The timing was all really, really strange. I can chalk this up to some strange hardware problem, and not MFT or SSD, especially since MFT was not active in Ubuntu.
The MFT drive went wonky in the middle of one (out of 280) DBT2 tests. Halfway through, the drive went read only. Below are the errors that were raised. After an fsck + a remount all was fine.
From the message log:
After a reboot the MFT Drive could not mount, fsck failed on ext3. Several blocks had to be fixed.
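The recovery in both cases was the standard ext3 routine, roughly as follows; the device and mount point are illustrative:

```shell
# Unmount, let e2fsck repair the journal and any bad blocks, then remount
umount /mft
fsck.ext3 -f -y /dev/sdb1
mount /mft
```

The -y flag auto-answers the repair prompts, which is fine for a test box but worth thinking twice about on production data.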
In the middle of another test run, dbt2 started throwing errors, and the MySQL error log showed disk corruption. The test was a 100 warehouse dbt2 test with separate InnoDB datafiles for every table. I had run this test 4 or 5 times prior without error:
I talked to the folks over at EasyCo (the makers of MFT) about the last 3 issues. There are some known issues with the version of MFT I was using (remember, I had to use an older version due to problem #1). These may be fixed in the new release they are working on now; they said a maintenance release should be ready in a few weeks. Please note the above issues are not necessarily MFT related, though the last 3 did occur on the MFT device.
Final Thoughts and Conclusions:
While this generation of SSD disks shows a lot of promise, there are some limitations to the hardware. Deployed in the proper environment, it has the potential to significantly improve the performance of applications. Make sure you look at your application's characteristics, how much memory you have, etc. before taking the plunge. But with the relatively low cost of the technology, you could net a 10X+ performance increase on your database servers for under $2000.
The MFT software that EasyCo is working on holds a great deal of promise. Its list price adds a small premium to the overall price of a regular SSD drive, but the payoff in terms of performance makes it worth investigating and potentially owning. As with all newer technologies there are some issues, so testing the hardware/software with your workload & data is key. EasyCo has been very easy to deal with and seems eager to address the issues that I found.
In the future, SSD makers and other vendors will streamline SSD performance, and it's about time. Disk technologies have been slow to produce the kind of groundbreaking changes that have come in CPU and video card technologies.