Waffle: The Mystery Continues

So I spent the weekend looking at places where we may have missed something in the code for Waffle. You can see some of what I tried in the Launchpad bug about this, but the interesting part is the very last thing I tried. I took a step back, looked at the problem (secondary index corruption) and at our assumption that we “missed” something, and decided to find the place where pages are written to disk and push to memcached from there as well as from the LRU. With the doublewrite buffer enabled, that place should be buf_flush_buffered_writes. Pushing to memcached there should catch any page that falls through the cracks of the LRU, which should help ensure that memcached holds an exact copy of the data that exists on disk. The result? It failed with the same secondary index failure. This means:

a.) Maybe we have a problem in the memcached/libmemcached layer (seems unlikely that this would cause an error at the exact same time every run)

b.) Somehow multiple copies of the same page end up in the BP (maybe left over from a merge process), where the “invalid” or temp page is LRU’d but never makes it to disk… I think I will eliminate the push to memcached via the LRU process and see if that fixes it (that should validate this theory)

c.) I missed something else

d.) Some sequence like this:
    1. A page is set into memcached via some mechanism.
    2. The page is then loaded into the BP by a path that bypasses the normal memcached read process.
    3. The page is changed in the BP.
    4. The page is then either re-read from memcached, overwriting the change, or the change is written to disk without going through the “normal” fil_io or LRU process…
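
To make scenario d.) concrete, here is a minimal sketch of how it would leave a stale page behind. It is a toy model, not Waffle or InnoDB code: the three dicts, the page id 7, and the "side path" in step 2 are all assumptions made up for illustration, and whether such a path really exists is exactly the open question.

```python
disk = {7: "v1"}         # stand-in for the data file
secondary = {}           # stand-in for memcached
buffer_pool = {}         # stand-in for the InnoDB buffer pool

# 1. The page is set into the secondary cache (for example on LRU eviction).
secondary[7] = disk[7]

# 2. Hypothetical side path: the page is loaded into the buffer pool straight
#    from disk, so the secondary copy is neither consulted nor removed.
buffer_pool[7] = disk[7]

# 3. The page is changed in the buffer pool.
buffer_pool[7] = "v2"

# 4. The change reaches disk without updating the secondary cache, and the
#    page leaves the buffer pool without the usual push.
disk[7] = buffer_pool.pop(7)


def read_page(page_id):
    """Normal read path in this toy model: secondary cache first, then disk."""
    if page_id in secondary:
        return secondary.pop(page_id)
    return disk[page_id]


stale = read_page(7)
print(stale, disk[7])    # the cached copy is older than the copy on disk
```

The next normal read returns the copy cached in step 1, which no longer matches the data file. An old index page like that would be missing any secondary index records added in step 3.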

Not sure if I can see many other scenarios here….thoughts?

This entry was posted in innodb internals, Matt, mysql, Waffle Grid.

2 Responses to Waffle: The Mystery Continues

  1. Heikki Tuuri says:

    Yves, I see that you are implementing your own buffering directly under the InnoDB buffer pool.

    The fact that a secondary index record is missing suggests that you have an old version of the index page. That may be due to the InnoDB/memcached interface somehow missing page writes (to the data file).

    Regards,

    Heikki

  2. matt says:

    Thanks for responding.

    Yes we are adding a secondary buffer cache directly under the primary.

    After several tests we came to the exact same conclusion that you did: we must be reading an old page. But the problem is why. Let me walk you through the code.

    We insert into the secondary cache when a page is removed from the primary BP via a call to buf_LRU_search_and_free_block. We read from the secondary cache via buf_read_page_low. When we read a page it is immediately removed from the secondary cache… although even if it was not we should overwrite it via the LRU call.
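
The flow just described can be sketched as a toy model. This is only an illustration under stated assumptions, not the Waffle patch: ToyBufferPool and its methods are hypothetical names, plain dicts stand in for memcached and the data file, and write_page flushes to disk immediately, which real InnoDB does not do.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy model of a primary buffer pool with an exclusive secondary cache."""

    def __init__(self, capacity, disk, secondary):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> contents, least recently used first
        self.disk = disk             # dict standing in for the data file
        self.secondary = secondary   # dict standing in for memcached

    def read_page(self, page_id):
        """Plays the role of buf_read_page_low: secondary cache first, then disk."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        if page_id in self.secondary:
            # Exclusive cache: a hit removes the page from the secondary cache.
            data = self.secondary.pop(page_id)
        else:
            data = self.disk[page_id]
        self._install(page_id, data)
        return data

    def write_page(self, page_id, data):
        """Change a page in the pool and flush it at once (a simplification)."""
        self.read_page(page_id)
        self.pages[page_id] = data
        self.disk[page_id] = data

    def _install(self, page_id, data):
        while len(self.pages) >= self.capacity:
            # Plays the role of buf_LRU_search_and_free_block: evict, then push.
            victim, contents = self.pages.popitem(last=False)
            self.secondary[victim] = contents
        self.pages[page_id] = data


disk = {1: "a0", 2: "b0", 3: "c0"}
secondary = {}
pool = ToyBufferPool(capacity=2, disk=disk, secondary=secondary)

pool.read_page(1)
pool.read_page(2)
pool.write_page(1, "a1")   # page 1 becomes the most recently used page
pool.read_page(3)          # evicts page 2 into the secondary cache
pool.read_page(2)          # served from the secondary cache, evicts page 1
```

In this model every path into and out of the pool goes through read_page or _install, so the secondary cache never holds a page older than the copy on disk. The bug report implies the real code has at least one path that this picture leaves out.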

    Now we have looked through the code several times. We thought maybe fsp_free_page may cause issues, so we issued a precautionary “remove” from the secondary cache when this is called. But we still got the error.

    Then I decided that if I am missing a write to the secondary cache somewhere, I should set the secondary cache whenever the datafile is updated. So I added the push to the secondary cache whenever a block goes through buf_flush_buffered_writes (doublewrite is on). This still produces the error.
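
As a rough sketch of that idea, the following toy code mirrors every flushed page into the secondary cache and then checks whether the cache still agrees with disk. The function names flush_batch and stale_pages are hypothetical, and dicts again stand in for memcached and the data file; the last write models the case we are worried about, a write that reaches disk by some other route.

```python
def flush_batch(batch, disk, secondary):
    """Write a batch of pages to disk and mirror each one into the secondary cache.

    Stands in for the extra push added where buf_flush_buffered_writes runs.
    """
    for page_id, data in batch.items():
        disk[page_id] = data
        secondary[page_id] = data


def stale_pages(disk, secondary):
    """Return ids of cached pages whose contents differ from the copy on disk."""
    return sorted(pid for pid, data in secondary.items() if disk.get(pid) != data)


disk = {1: "a0", 2: "b0"}
secondary = {1: "a0", 2: "b0"}

flush_batch({1: "a1"}, disk, secondary)
ok = stale_pages(disk, secondary)     # nothing stale after a mirrored flush

# Hypothetical write that reaches disk by some other route.
disk[2] = "b1"
bad = stale_pages(disk, secondary)    # page 2 is now stale in the cache
```

A check like stale_pages is far too expensive for production, but something similar run in a debug build could help pin down which page goes stale and when.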

    The question is what are we missing? Is there something that could read from disk without going through buf_read_page_low? And is there something that could write to disk without going through buf_flush_buffered_writes? Is it possible a background read could re-read a page that is in the BP from disk directly?

    Since this happens consistently under load, but only after X minutes or Y transactions, I think it's some infrequently run piece of code, i.e. something in the flush.