If you read Yves' blog post about Waffle yesterday, you know we are seeing some weird gremlins in the system, and we could use some Scooby-Doo detective work if you have any ideas. The strange thing is that the problem only shows up under high load. So it really seems like we may have missed some background cleanup process that accesses or removes pages from disk or the buffer pool without going through the functions where we call Waffle (buf_LRU_search_and_free_block & buf_read_page_low).
One of the ideas I had was to narrow the scope of what's being pushed to and read from Memcached. Even though I am using file-per-table, system tablespace pages are still making it in and out of Memcached. I thought that if we missed something, maybe it was here (even though I could not find it in the code). I mean, cleaning up undo or internal data would seem like a logical place to miss something. So I hacked Waffle to only send blocks from space IDs > 1. The intent was that only actual table data should be going to and from Memcached.
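The filter described above amounts to something like the following. This is a minimal sketch, not the actual Waffle patch: `waffle_space_allowed` and its `highest_skipped_space` parameter are hypothetical names made up for illustration, and the real check would sit wherever Waffle decides to set or get a page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the Waffle source.
   Returns true when a page from tablespace `space_id` may be pushed to
   or read from Memcached. Every space ID up to and including
   `highest_skipped_space` is kept out of Memcached. */
static bool waffle_space_allowed(uint32_t space_id,
                                 uint32_t highest_skipped_space)
{
    return space_id > highest_skipped_space;
}
```

Called with `highest_skipped_space` set to 1, this keeps pages from spaces 0 and 1 out of Memcached and lets spaces 2 and up through.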
To my utter amazement, performance dropped by over 50% when I eliminated the reads/writes from Memcached for space ID 1… that is just massively huge! In fact, I counted nearly 2x more sets/gets on space 1 than I did on any other space. Maybe I am just tired right now, but this seems wrong. I mean, sure, space 1 pages may get LRU'd… but at that frequency it's a bit crazy. I need to dig deeper into this.
***** I guess I was just tired :) The space ID for the system tablespace is 0, not 1. Dooooh, "egg and my face are in alignment" *****