What’s up with WaffleGrid?

You probably haven’t noticed, but I have not blogged since the UC. It is not because I am upset by the prospect of working for Oracle; I have simply been busy tracking down an issue we have with WaffleGrid. We discovered that under high load, with DBT2 on a tmpfs, we end up with errors in a secondary index. In the MySQL error log, we have entries like this one:

InnoDB: error in sec index entry update in
InnoDB: index `myidx1` of table `dbt2`.`new_order`
InnoDB: tuple DATA TUPLE: 3 fields;
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000bea; asc     ;;
 2: len 4; hex 80000005; asc     ;;

InnoDB: record PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000bea; asc     ;;
 2: len 4; hex 80000004; asc     ;;

TRANSACTION 14469, ACTIVE 1 sec, process no 7982, OS thread id 2995481488 updating or deleting
mysql tables in use 1, locked 1
26 lock struct(s), heap size 2496, 65 row lock(s), undo log entries 60
MySQL thread id 31, query id 1246503 localhost root updating
DELETE FROM new_order
WHERE no_o_id = 3050
  AND no_w_id = 1
  AND no_d_id = 5

These errors are triggered when transactions are purged. Basically, an entry in the secondary index has to be deleted, and when MySQL accesses the page, the row is missing.
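To make the log entry easier to read, here is a small sketch that decodes the hex fields. It relies on the fact that InnoDB stores signed integers big-endian with the sign bit flipped, so `80000001` is 1. The helper name `decode_innodb_int` is made up for this post, and the reading of the three fields as (no_w_id, no_o_id, no_d_id) is an assumption based on the values matching the DELETE statement.

```python
def decode_innodb_int(hex_str: str) -> int:
    """Decode a signed integer as printed in an InnoDB tuple/record dump.

    InnoDB stores signed integers big-endian with the sign bit flipped,
    so the stored unsigned value equals the real value plus 2**(bits - 1).
    """
    raw = int(hex_str, 16)
    nbits = 4 * len(hex_str)  # each hex digit carries 4 bits
    return raw - (1 << (nbits - 1))


# Fields copied from the log entry above.
searched_tuple = ["80000001", "80000bea", "80000005"]
found_record = ["80000001", "80000bea", "80000004"]

searched = [decode_innodb_int(h) for h in searched_tuple]
found = [decode_innodb_int(h) for h in found_record]

print("searched:", searched)
print("found:   ", found)
```

Decoded this way, the first two fields of the searched tuple and of the record agree, and only the last field differs (5 in the tuple, 4 in the record), which fits the description that the entry to delete is not where it is expected.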

Matt and I have dug into this issue to the limit of our sanity and, although we gained knowledge of the InnoDB code, we are still stuck. Basically, what we are looking for is a way for a file page to go to disk without hitting the LRU list, or a possibility of having 2 pages in the buffer pool with the same space_id:offset pair. If you have any input on these topics, please comment on this post…
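To illustrate why the first case would matter, here is a toy model. It is not InnoDB or WaffleGrid code: the class `ToyPool`, its methods, and the version counters are all hypothetical, and the only assumption taken from the design is that pages leaving the LRU are pushed to a remote cache which is checked before disk on a read. The sketch shows how a page that reaches disk through a path that skips the LRU hook leaves a stale copy in the remote cache.

```python
from collections import OrderedDict


class ToyPool:
    """Toy buffer pool keyed by (space_id, offset); values are [version, dirty]."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, i.e. LRU order
        self.remote = {}            # stands in for the remote cache
        self.disk = {}              # stands in for the data file

    def read(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)
            return self.pages[key][0]
        # Miss: the remote cache is consulted before disk.
        version = self.remote[key] if key in self.remote else self.disk.get(key, 0)
        while len(self.pages) >= self.capacity:
            self.evict_via_lru()
        self.pages[key] = [version, False]
        return version

    def write(self, key, version):
        self.read(key)                     # make sure the page is resident
        self.pages[key] = [version, True]  # modify it and mark it dirty

    def evict_via_lru(self):
        """Normal path: flush if dirty, then push the page to the remote cache."""
        key, (version, dirty) = self.pages.popitem(last=False)
        if dirty:
            self.disk[key] = version
        self.remote[key] = version

    def drop_bypassing_lru(self, key):
        """Hypothetical faulty path: the page is flushed and freed,
        but the remote cache never sees the new version."""
        version, dirty = self.pages.pop(key)
        if dirty:
            self.disk[key] = version


def demo():
    pool = ToyPool(capacity=2)
    key = (0, 42)                 # space_id 0, page offset 42
    pool.write(key, 1)
    pool.evict_via_lru()          # remote cache and disk both hold version 1
    pool.write(key, 2)            # page comes back from the remote cache, then changes
    pool.drop_bypassing_lru(key)  # disk gets version 2, remote cache keeps version 1
    return pool.disk[key], pool.read(key)


on_disk, seen_by_reader = demo()
print("on disk:", on_disk, "seen by reader:", seen_by_reader)
```

In this model the reader gets an older version of the page than the one on disk, which is the kind of inconsistency that could make an index entry appear to be missing. Whether such a path really exists in InnoDB is exactly the open question.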

About Yves Trudeau

I work as a senior consultant in the MySQL professional services team at Sun. My main areas of expertise are DRBD/Heartbeat and NDB Cluster. I am also involved in the WaffleGrid project.

6 Responses to What’s up with WaffleGrid?

  1. Xaprb says:

    I keep wondering if you’re running into a bug in InnoDB that hasn’t been exposed elsewhere because of something like different timing or whatnot. You’re using (abusing?) it in a new way, so it kind of makes sense that you’ll find new bugs.

    • matt says:

      It is very possible. This should be very straightforward: a page is removed and sent to the remote cache; a page is read from the remote cache or from disk. It seems like something is bypassing what should be the normal LRU process or the normal page get function.

  2. Pingback: Waffle: limiting the space ids being pushed to memcached » Big DBA Head!

  3. Yves Trudeau says:

    It is true that we are pushing InnoDB by removing a significant amount of disk IO. The problem appears to be visible only when there is a large proportion of dirty pages in the buffer pool, but all the means InnoDB uses to free pages go through the LRU mechanism. Furthermore, it is a dynamic problem: the actual bug might occur many seconds before the error is thrown, which makes it hard to trace.

  4. Heikki Tuuri says:

    Yves, which InnoDB version are you running?

    Regards,

    Heikki

  5. matt says:

    Heikki,

    We have tested and verified this issue with 5.1.27-5.1.33 + 5.4 + the plugin 1.0.2-1.0.3. So it’s not a new issue. It only occurs under heavy load.