WaffleGrid: Cream Benchmarks, stable and delivering a 3x boost

Let’s get down to how the latest version of Waffle Grid performs.

To start simple, let’s look at the difference between the Waffle Grid modes. As mentioned before, the LRU mode is the “classic” Waffle Grid setup: a page is put into memcached when it is removed from the buffer pool by the LRU process, and when a page is retrieved from memcached it is expired there, so it’s no longer valid. In the new “Non-LRU” mode, a page is placed in memcached when it is read from disk, and when a dirty page is flushed to disk, its copy in memcached is overwritten. So how do the different modes perform?
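To make the difference between the two modes concrete, here is a toy Python model of the page flow described above. It is a sketch under stated assumptions, not Waffle Grid’s code: `WaffleModel`, its methods, and the plain dicts standing in for the buffer pool, memcached, and disk are all hypothetical. Memcached is treated as unbounded, and locking, page sizes, and network calls are ignored.

```python
from collections import OrderedDict


class WaffleModel:
    """Toy model of a buffer pool (BP) in front of a memcached tier (hypothetical).

    mode="lru":     an evicted BP page is pushed to memcached; a page read
                    back from memcached is deleted there (expired).
    mode="non_lru": a page read from disk is copied to memcached; a dirty
                    page written to disk also overwrites its memcached copy.
    """

    def __init__(self, mode, bp_pages, disk):
        if mode not in ("lru", "non_lru"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.bp_pages = bp_pages   # BP capacity in pages (must be >= 1)
        self.bp = OrderedDict()    # page_id -> (data, dirty); oldest first
        self.memcached = {}        # unbounded stand-in for memcached
        self.disk = disk           # page_id -> data
        self.disk_reads = 0

    def read(self, page_id):
        if page_id in self.bp:                   # BP hit
            self.bp.move_to_end(page_id)
            return self.bp[page_id][0]
        if page_id in self.memcached:            # memcached hit
            data = self.memcached[page_id]
            if self.mode == "lru":
                del self.memcached[page_id]      # expire on retrieval
        else:                                    # miss: go to disk
            data = self.disk[page_id]
            self.disk_reads += 1
            if self.mode == "non_lru":
                self.memcached[page_id] = data   # populate on disk read
        self._install(page_id, data)
        return data

    def write(self, page_id, data):
        self.read(page_id)                       # make sure the page is in the BP
        self.bp[page_id] = (data, True)          # mark it dirty

    def flush(self, page_id):
        data, dirty = self.bp[page_id]
        if dirty:
            self._write_back(page_id, data)
            self.bp[page_id] = (data, False)

    def _write_back(self, page_id, data):
        self.disk[page_id] = data
        if self.mode == "non_lru":
            self.memcached[page_id] = data       # overwrite on flush

    def _install(self, page_id, data):
        while len(self.bp) >= self.bp_pages:     # evict least recently used
            victim, (vdata, vdirty) = self.bp.popitem(last=False)
            if vdirty:
                self._write_back(victim, vdata)
            if self.mode == "lru":
                self.memcached[victim] = vdata   # push evicted page
        self.bp[page_id] = (data, False)


disk = {i: f"v{i}" for i in range(3)}
for mode in ("lru", "non_lru"):
    model = WaffleModel(mode, bp_pages=2, disk=dict(disk))
    for page_id in (0, 1, 2, 0):
        model.read(page_id)
    print(mode, model.disk_reads, sorted(model.memcached))
```

Running the reads 0, 1, 2, 0 through a two-page buffer pool costs three disk reads in both modes, but the LRU model ends with only page 1 in memcached, while the Non-LRU model keeps copies of pages 0, 1, and 2, including page 0, which is also sitting in the buffer pool.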

4GB memcached, read-ahead enabled
Configuration    TPM        % of Baseline
No Waffle        3245.79    Baseline
Waffle LRU       10731.34   330.62%
Waffle NoLRU     10847.52   334.20%

You can see here that with 100% of the data fitting in memcached, we get about a 3.3x boost in performance over a non-Waffle setup. Note that these tests have both read-ahead and the doublewrite buffer enabled. With all the data pages able to live in memcached, the Non-LRU mode comes out just a touch faster than the classic LRU mode.
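The percentage column is each configuration’s TPM divided by the no-Waffle baseline TPM, so 330.62% means roughly 3.3x the baseline throughput, which is an increase of about 230%. A small Python check of that arithmetic, using the numbers from the table (the function and constant names are just for illustration):

```python
def percent_of_baseline(tpm, baseline_tpm):
    """TPM expressed as a percentage of the baseline TPM (100% = no change)."""
    return round(100.0 * tpm / baseline_tpm, 2)


BASELINE_TPM = 3245.79  # No Waffle, read-ahead enabled
RESULTS = {"Waffle LRU": 10731.34, "Waffle NoLRU": 10847.52}

for name, tpm in RESULTS.items():
    pct = percent_of_baseline(tpm, BASELINE_TPM)
    print(f"{name}: {pct:.2f}% of baseline, a {pct - 100:.2f}% increase")
```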

The classic LRU mode really shines when you have less memcached memory than data in MySQL.

1GB memcached, read-ahead enabled
Configuration    TPM        % of Baseline
No Waffle        3245.79    Baseline
Waffle LRU       6306.1     194.29%
Waffle NoLRU     10745.62   331.06%

That’s a nice jump. The reason for the jump is that with the Non-LRU code you can have the same pages in both memcached and the buffer pool (BP). So if you have, say, a 2.5GB data set, a 768MB BP, and 1GB of memcached, it’s possible to end up with the same 768MB worth of pages in both MySQL and memcached.
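The duplication argument can be put into rough numbers. This is a minimal Python sketch under idealized assumptions: both tiers are completely full, only distinct pages are counted, and in LRU mode the BP and memcached hold disjoint pages (pages are expired from memcached when read back), while the Non-LRU worst case has every BP page also cached in memcached. The function name is hypothetical, and 768MB is written as 0.75GB.

```python
def unique_cached_gb(bp_gb, memcached_gb, mode):
    """Distinct data (GB) that the BP and memcached can hold together.

    LRU mode: the tiers are disjoint, so their sizes add up.
    Non-LRU mode (worst case): the smaller tier is entirely duplicated in
    the larger one, so only the larger tier's size counts.
    """
    if mode == "lru":
        return bp_gb + memcached_gb
    if mode == "non_lru":
        return max(bp_gb, memcached_gb)
    raise ValueError(f"unknown mode: {mode}")


data_gb, bp_gb, mc_gb = 2.5, 0.75, 1.0
for mode in ("lru", "non_lru"):
    cached = min(unique_cached_gb(bp_gb, mc_gb, mode), data_gb)
    print(f"{mode}: {cached:.2f} GB of {data_gb} GB cached ({cached / data_gb:.0%})")
```

With these example sizes, the LRU layout can hold about 1.75GB of the 2.5GB data set (70%), while the worst-case Non-LRU layout holds only 1GB of distinct pages (40%). Real runs fall somewhere in between, since memcached may already have dropped some of the pages that are in the BP.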

As noted by myself and others, having read-ahead enabled is typically a waste. I suspect this is due to excessive locking when a page is added to the AIO queue while another thread is reading that page from disk, but I still want to confirm that. Regardless, I next tested performance with read-ahead disabled.

4GB memcached, read-ahead disabled
Configuration    TPM        % of Baseline
No Waffle        4983.9     Baseline
Waffle LRU       12836.64   257.56%
Waffle NoLRU     13351.68   267.90%

Another nice bump in overall performance; however, the relative impact of Waffle is reduced somewhat, with the speedup over the baseline dropping from about 3.3x to about 2.6x.
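Why does the relative gain shrink even though absolute TPM goes up? Turning read-ahead off helps the no-Waffle baseline far more than it helps either Waffle mode. A small Python calculation over the two 4GB-memcached tables shows this; the dictionaries simply restate the published TPM figures, and the helper names are illustrative.

```python
# TPM figures from the 4GB memcached runs, read-ahead enabled vs. disabled.
READ_AHEAD_ON = {"No Waffle": 3245.79, "Waffle LRU": 10731.34, "Waffle NoLRU": 10847.52}
READ_AHEAD_OFF = {"No Waffle": 4983.9, "Waffle LRU": 12836.64, "Waffle NoLRU": 13351.68}


def speedup(results):
    """Each Waffle mode's TPM divided by the no-Waffle TPM from the same run."""
    base = results["No Waffle"]
    return {name: round(tpm / base, 2) for name, tpm in results.items() if name != "No Waffle"}


def pct_gain(before, after):
    """Percent TPM change per configuration between two runs."""
    return {name: round(100.0 * (after[name] / before[name] - 1), 1) for name in before}


print("speedup, read-ahead on: ", speedup(READ_AHEAD_ON))
print("speedup, read-ahead off:", speedup(READ_AHEAD_OFF))
print("gain from disabling read-ahead (%):", pct_gain(READ_AHEAD_ON, READ_AHEAD_OFF))
```

The baseline gains about 53.5% from disabling read-ahead, while the Waffle modes gain roughly 20-23%, so the speedup drops from about 3.3x to about 2.6x even though every Waffle configuration posts a higher TPM.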

You’re probably wondering: why two Waffle modes? The Non-LRU mode is not significantly faster than the LRU mode, and the LRU mode seems more flexible. One easy answer is that you can use your already deployed memcached. The slightly fuzzier answer is that at higher concurrency the Non-LRU mode appears likely to be significantly faster. Consider the architecture: in LRU mode, any operation that needs to evict a page through the LRU has to wait while that page is set in memcached, while the Non-LRU mode only uses the background flush process to write changes. In theory this should lead to better performance in certain workloads.
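The architectural argument is about which thread pays for the memcached round trip. Here is a toy Python sketch of just that point, where everything is hypothetical: `SET_LATENCY` is an invented per-set cost, `fake_memcached_set` sleeps instead of talking to a real memcached, and a single flusher thread stands in for the background flush process. It models only the dirty-page write path, not the set that the Non-LRU mode performs when a page is read from disk.

```python
import queue
import threading
import time

SET_LATENCY = 0.001  # hypothetical cost of one memcached set, in seconds


def fake_memcached_set(store, key, value):
    """Stand-in for a network memcached set: sleep, then store."""
    time.sleep(SET_LATENCY)
    store[key] = value


def lru_mode_evict(store, pages):
    """LRU-style: the thread evicting pages waits for every set.

    Returns the time the foreground thread spent, in seconds.
    """
    start = time.perf_counter()
    for key, value in pages:
        fake_memcached_set(store, key, value)
    return time.perf_counter() - start


def non_lru_mode_flush(store, pages):
    """Non-LRU-style: the foreground thread only queues dirty pages and a
    background flusher thread performs the sets.

    Returns the time the foreground thread spent, in seconds.
    """
    work = queue.Queue()

    def flusher():
        while True:
            item = work.get()
            if item is None:          # sentinel: no more pages
                break
            fake_memcached_set(store, *item)

    worker = threading.Thread(target=flusher)
    worker.start()
    start = time.perf_counter()
    for page in pages:
        work.put(page)
    work.put(None)
    foreground = time.perf_counter() - start
    worker.join()                     # wait so every set has landed
    return foreground


pages = [(page_id, b"page-bytes") for page_id in range(100)]
lru_store, non_lru_store = {}, {}
print(f"LRU foreground time:     {lru_mode_evict(lru_store, pages):.4f}s")
print(f"Non-LRU foreground time: {non_lru_mode_flush(non_lru_store, pages):.4f}s")
```

In the LRU-style function the foreground thread waits at least 100 × `SET_LATENCY`, while the Non-LRU-style one returns right after queueing; the sets still happen, just off the critical path. Whether that pays off depends on concurrency and on whether the flusher keeps up.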

Why wasn’t this reflected in my tests? My test hardware is old and rather out of date, so I am forced to test on a 4-core, 8GB machine. I cannot test this at 16 cores with several different memcached servers. In fact, the memcached server I test with only has 1.5GB of memory, which really limits my testing. For the 4GB test numbers, memcached runs on the same server as MySQL, which is not really the optimal setup.

But there are some things I can do to further boost performance on my local box. One of them is disabling the doublewrite buffer. While this is not entirely safe (the doublewrite buffer protects against partially written pages after a crash), in my testing it gives a 25-30% boost in performance. So let’s look at some numbers with the doublewrite buffer turned off.

The first test hits a 1.3GB memcached instance over 1GbE (note that these benchmarks run over a 60-minute period, while the ones above ran for 15 minutes):

Test                      TPM        % of Baseline
No Waffle, BP 768M        4644.64    Baseline
Butter LRU, BP 768M       11146.25   239.98%
Cream Non-LRU, BP 768M    12328.19   265.43%
No Waffle, BP 2.5G        24097.5    518.82%

The numbers are very similar to the 1GB localhost memcached results shown above. The Non-LRU mode beats the LRU mode by a small margin here, but it should, because memcached is not large enough to hold all of the data from MySQL.

Now let’s run the same test with a 4GB memcached (on localhost).

Test                      TPM        % of Baseline
No Waffle, BP 768M        4644.64    Baseline
Butter LRU, BP 768M       13379.87   288.07%
Cream Non-LRU, BP 768M    17056.16   367.22%
No Waffle, BP 2.5G        24097.5    518.82%

Now we see more separation between the two Waffle modes, with LRU about 2.9x faster and Non-LRU about 3.7x faster than the baseline. The performance difference should be larger, but I hit some flush issues where performance stumbles about 20 minutes into the test (I need to try adaptive flushing here).

I need to continue my testing to get a better feel for other scenarios where each mode makes sense, and I need to find faster hardware to rerun my tests on. But these numbers are looking very good indeed. I know lots of places that would love to get 2-3x more performance out of their MySQL deployment.

I would love to see others give this a try and let us know their individual experiences.

This entry was posted in benchmark, Matt, mysql, performance, Waffle Grid.