A few months ago I was at dinner with Yves Trudeau, discussing what all consultants discuss in the late hours after a long day of hard work… how to improve performance and scalability. I brought up an idea: use memcached as an L2 cache for InnoDB. At first he was skeptical, but as we talked he grew more and more intrigued. The idea was simple: issue a set to memcached when a page falls off the LRU… then issue a get from memcached when you do not find the data in the local buffer pool, but before you read from disk. From that starting point, you can work through the issues that are sure to follow. So Yves kept emailing me questions… then he sent me a note that he had made huge progress with the idea. Huge progress meaning he had written version 0.1 and had it working. That’s when the idea really turned into a project.
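The real patch lives inside InnoDB's buffer pool code in C, but the set-on-evict / get-on-miss flow described above can be sketched in a few lines. This is a toy simulation with hypothetical names, not the actual Waffle Grid code:

```python
# Illustrative sketch of the Waffle Grid read/evict path. All class and
# method names here are hypothetical; the real patch modifies InnoDB's
# buffer pool internals directly.

class FakeMemcached:
    """Stand-in for a remote memcached node (get/set interface only)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

class L2CachedBufferPool:
    def __init__(self, local_capacity, l2_cache, disk):
        self.local = {}               # page_id -> page; stands in for the buffer pool
        self.capacity = local_capacity
        self.lru = []                 # LRU order, oldest first
        self.l2 = l2_cache            # remote L2 cache (memcached-like)
        self.disk = disk              # page_id -> page; stands in for the datafiles

    def read_page(self, page_id):
        # 1. Local buffer pool hit: cheapest case.
        if page_id in self.local:
            self.lru.remove(page_id)
            self.lru.append(page_id)
            return self.local[page_id]
        # 2. Local miss: try the remote L2 cache before touching disk.
        page = self.l2.get(page_id)
        if page is None:
            # 3. L2 miss: fall back to a disk read as usual.
            page = self.disk[page_id]
        self._install(page_id, page)
        return page

    def _install(self, page_id, page):
        if len(self.local) >= self.capacity:
            victim = self.lru.pop(0)
            # "Set on evict": push the LRU victim out to memcached so a
            # later miss can fetch it over the network instead of from disk.
            self.l2.set(victim, self.local.pop(victim))
        self.local[page_id] = page
        self.lru.append(page_id)
```

Reading a page that was evicted earlier now comes back over the (simulated) network instead of from disk, which is the whole point of the patch.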
We called it the Waffle Grid Project. Why? A waffle sort of looks like a grid diagram, doesn’t it? And I like waffles; they happen to be very tasty. With a working patch in hand, we burned the midnight oil the last few weeks testing and fixing the code, building a proof of concept, and benchmarking it. So does it work? Yes, it does. Pretty well, in fact. Take a look at some of the benchmarks below for a better idea.
Basically, this patch lets you have a central node (a standard, run-of-the-mill database server) with several servers acting as a remote L2 cache. An example: one main MySQL server with 128GB of memory plus 4 remote servers with 64GB of memory each gives you an L2 cache of about 256GB. With a fast private network, the L2 cache should return data faster than it can be retrieved off of disk.
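With several remote servers, each cached page has to map to exactly one node, the way a memcached client library hashes keys across a server list. A minimal sketch of that idea, with made-up hostnames and a deterministic CRC32 hash standing in for whatever the client actually uses:

```python
# Hypothetical sketch of spreading the remote L2 cache across several
# memcached nodes. Hostnames and the hash choice are illustrative
# assumptions, not the Waffle Grid implementation.
import zlib

REMOTE_NODES = ["cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211"]

def node_for_page(space_id, page_no, nodes=REMOTE_NODES):
    """Key an InnoDB page by (tablespace, page number) and hash it to one node."""
    key = f"{space_id}:{page_no}".encode()
    return nodes[zlib.crc32(key) % len(nodes)]
```

The important property is that the same page always hashes to the same node, so a get after an evict finds the page where the set put it.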
Here is what this would look like:
Why did we do this? Several reasons, but here are a few:
- The obvious one: we see clients every day who continue to struggle with disk performance. The solution should be to add better disks (SSDs, maybe) or additional memory… but what happens if you are cash-strapped and your server is maxed out in terms of memory?
- We see people struggle with how to scale or boost performance on 32-bit systems (sometimes a company mandate prevents moving to 64-bit).
- To help those who have reached a size where they need to shard, but cannot immediately rip the application apart and rebuild it to take advantage of sharding.
- To help people running DRBD as an active/passive HA solution get some additional benefit out of their passive server.
- Realistically, though, we are the adventurous sort and want to be able to say to all our friends, “I run a 10 node Waffle Grid with 5TB of memory.”
So as mentioned before, it does work. But your question is probably “How well does it work?”.
Let’s look at some benchmarks:
Configurations compared: 1.5GB local buffer pool | 1.5GB local buffer pool + 768MB remote | 2.268GB local buffer pool
Obviously, if you can add local memory you are going to be better off, but if you cannot add memory locally… a remote buffer pool may help you. Here the remote buffer was 38% better than having no remote buffer, but still 16% slower than just adding the memory locally. 38% is better than nothing, last time I checked. Plus we have done only limited optimization, so this could get better.
So will Waffle Grid solve world hunger? No! Will it help every application? No! Because we are reading data over the network, cache misses add time to each page retrieved from disk. If the data is not in cache, you have the overhead of checking local memory, checking remote memory, and then going to disk. That extra step can add up. Additionally, some other magic can happen (i.e. read-ahead, data accessed from disk sequentially should be fast, etc.) that we do not take advantage of yet. The extra latency plus all these other factors means full table scans or massive data retrieval may just suck right now. In fact, tests have shown a 5-6x increase in execution time when querying some large aggregate datasets. This is an area we are working on; specifically, we are looking at implementing some intelligence that will help reduce the number of cache misses in exchange for using some extra memory locally. Let me say this: this is experimental code right now… it will not work flawlessly and we still have a ways to go. We are announcing it now because it does show some promise.
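A back-of-the-envelope model shows why hit rate decides whether the extra hop helps or hurts. The latency numbers below are illustrative assumptions (sub-microsecond memory, a half-millisecond network round trip, a multi-millisecond disk seek), not measurements from our benchmarks:

```python
# Toy cost model for the Waffle Grid read path. Every read that misses
# both caches still pays the memcached round trip on top of the disk
# read. All latency defaults are illustrative assumptions (milliseconds).

def expected_read_cost(p_local, p_remote, t_local=0.001, t_net=0.5, t_disk=8.0):
    """Average per-page read cost in milliseconds.

    p_local  -- fraction of reads served from the local buffer pool
    p_remote -- fraction of the remaining reads served from memcached
    """
    miss_local = 1.0 - p_local
    hit_remote = miss_local * p_remote
    miss_both = miss_local * (1.0 - p_remote)
    return (p_local * t_local                       # local buffer pool hit
            + hit_remote * (t_local + t_net)        # served from memcached
            + miss_both * (t_local + t_net + t_disk))  # miss both, go to disk
```

With a decent remote hit rate the network round trip replaces disk seeks and wins; with a near-zero hit rate (a big scan, for instance) every page pays the round trip for nothing, which is consistent with the slowdowns we saw on large aggregate queries.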
You may also ask: what about SSD? Doesn’t SSD make Waffle Grid obsolete? No! In fact, I ran a great deal of tests using both SSD and a remote buffer pool. While the numbers were not as dramatic as Waffle Grid on machines with standard disks, I still saw a 20%+ improvement.
Want to learn more? Try our wiki.