Waffle Grid: We need a MRU not an LRU!

2008-11-27 at 08:18 am Matthew Yonkovit

I knew something was wrong with my previous numbers, and there is an optimization that should improve it.  So I was trying to test out the performance difference between a Waffle running localhost memcached and one with a remote memcached when I noticed something.  I noticed that the disk IO, specifically the reads per second where higher then when the database was using Waffle then when I was not using a Waffle.  Why this was odd was that both of these tests should have 100% of the data in the buffer pool or in remote memcached, so their should be limited read activity.  Then it hit me, the memcached LRU is going to mess us up, let me explain:

The idea behind Waffle grid is when you hit the internal innodb LRU, put the data that was to be discarded into memcached, so the next time its needed you can retrieve it from memcached instead of disk.

I know there is more too it then my example, but this is a simple example that illustrates what is happening:

Basically 129 & 130 are being read in from disk, the LRU needs to clear room, so 123&124 go off to memcached, memcached has to clear space so it’s LRU kills 110 & 111 to make room.  Make sense?

So now we want to read something out of memcached into the local buffer pool, but before this can be done data needs to be freed in the local buffer pool. Which will get written to memcached causing an LRU all its own.  So above 112 & 113 are removed from memcached, 125/126 move to memcached, then 123/124 are retrieved.

Make sense?  Well what this means is the reads from 123/124 are the newest operations, putting them at the tail end of the memcached LRU.

Take a look:

Realistically we want 123/124 to be removed from memcached before anything else.  The data is completely useless after a get.   If it hits the LRU in the Bufferpool it will simply push it to memcached overwriting the garbage data, so this needs to become the next data to be purged, not the last.  Because the gets are getting placed lower in the purge order my L2 remote buffer pool is saturated with this useless data instead of preserving the data getting sent to memcached by the Innodb LRU.  This is seriously cutting the effectivness of the remote buffer pool.  An example in the extreeme would be a 1.5GB local buffer pool may have all 1.5GB duplicated in the remote buffer.  To avoid this you really need to double the remote buffer.

So some idea’s…

An easy potential fix?  Hack memcached to allow using a specialized MRU (most recently used)  in place of an LRU.  Basically have a get flip a flag when it is returned or change the timestamp.  Then use this data to have the MRU expunge the most recently retrieved data first.  I guess you could also leave the LRU in place and simply add code in memcached to set the timestamp on every get to something really old I.e.  1900, or auto set the expire date on each get.

looking deeper into this it looks like we could modify memcached to make the do_item_update function ( which appears to be only called when doing a Get ) to put this data at head of the LRU to be expired first.  Possibly setting the it->time to current_time-100000000000  or something crazy like that.   Maybe we could add a startup parm making this like a get LRU offset variable or something.
Here is the function:
items.c
#280 void do_item_update(item *it) {
#281     MEMCACHED_ITEM_UPDATE(ITEM_key(it), it->nbytes);
#282     if (it->time < current_time - ITEM_UPDATE_INTERVAL) {
#283         assert((it->it_flags & ITEM_SLABBED) == 0);
#284
#285         if ((it->it_flags & ITEM_LINKED) != 0) {
#286             item_unlink_q(it);
#287             it->time = current_time;
#288             item_link_q(it);
#289         }
#290     }
#291 }

Thought about after issuing a get, issuing a set to set the expire timestamp as well, but that has the disadvantage of making 2 calls into memcached for every retrieval, which would really slow things down.

Any other interesting fresh idea’s out their?

Why is this a big deal?  It does work without it, but the performance improvement from this small of a change can be huge.  In the extreme its entirely possible that the entire current contents of the buffer pool be in memcached as well, duplicating all the data for no reason.

4 Responses to “Waffle Grid: We need a MRU not an LRU!”

  1. Couldnt you get mostly the same by just sending deletes to the memcached server when you pull the blocks off the disk? It’s ok to send deletes for things that are not in the cache.

  2. matt

    I want to avoid a second call into memcached to delete objects. We have talked about adding a delete into the get code in memcached as well, this could work well… however I would like to make sure that it would be possible to turn this on/off. A lot of enterprise clients want a single version of software they can deploy everywhere. Might not be a big deal now, but in a few years who knows.

  3. [...] Waffle Grid: We need a MRU not an LRU! [...]

  4. Using the waffle patch how much of an improvement in performance could reasonably be expected?

Leave a Reply