As we have mentioned several times, we are planning to migrate our development over to the InnoDB plugin and follow the storage engine path. That's all well and good, but what are the warts, bugs, and other things we are trying to fix? Ah yes, the seedy underbelly of Waffle Grid: the reason we call it experimental.
Big on my list is ensuring data integrity. Currently you can overwrite a good InnoDB page with an invalid one, either manually or via some weird network corruption. The result? Well, let's check it out:
I set the InnoDB buffer pool to 32MB, small enough that I know I can easily push data out to memcached.
First, turn on detailed logging and run a query that will flush data out to memcached:
|mysql> set @@global.innodb_print_memcached_stats=2;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from test_frag1;
| count(*) |
| 1039766 |
1 row in set (1.71 sec)
Now let's find a page that was flushed:
|tail -10 error.log
Memcached get: Block: 5f9d:0:13b7c : Thread : -1450775664 time:91
Memcached set: Block: 5f9d:0:cb7d : Thread : -1450775664 time:26
Memcached get: Block: 5f9d:0:13b7d : Thread : -1450775664 time:103
Memcached set: Block: 5f9d:0:cb7e : Thread : -1450775664 time:27
Memcached get: Block: 5f9d:0:13b7e : Thread : -1450775664 time:91
Memcached set: Block: 5f9d:0:cb7f : Thread : -1450775664 time:26
Memcached get: Block: 5f9d:0:13b7f : Thread : -1450775664 time:106
Memcached set: Block: 5f9d:0:cc40 : Thread : -1442186352 time:126
Memcached get: Block: 5f9d:0:6 : Thread : -1442186352 time:252
Memcached set: Block: 5f9d:0:cc41 : Thread : -1442186352 time:62
Memcached get: Block: 5f9d:0:14005 : Thread : -1442186352 time:177
Now let's overwrite a page in memcached:
|root@matt-laptop:/usr/local/mysql# telnet localhost 11211
Connected to localhost.
Escape character is '^]'.
set 5f9d:0:cc41 0 3000 2
Now run our query again:
|mysql> select count(*) from test_frag1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
What does the error log show?
|Memcached get: Block: 5f9d:0:cc7f : Thread : -1450775664 time:86
090218 23:16:17 InnoDB: Assertion failure in thread 2844191632 in file btr/btr0pcur.c line 402
InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == buf_frame_get_page_no(page)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: about forcing recovery.
090218 23:16:17 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 182462 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x8d12930
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
stack_bottom = 0xa986e0a0 thread_stack 0x30000
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort…
thd->query at 0x8d35ab8 = select count(*) from test_frag1
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Ouch, that's not good! Yep, we know... "experimental," right? Obviously this is our biggest challenge, and it needs to be resolved before this is a viable enterprise solution. A few ideas? We are looking into using checksums to overcome this issue. It would be easier to implement the checksum on the memcached side when the page goes in, since the database would not bear the brunt of the check process... however, that would only tell us good data went into memcached; we would still need to verify that good data was received by InnoDB prior to putting it to use. We are still thinking about elegant ways to do this without sacrificing speed... ideas, as always, are welcome.
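To make the checksum idea concrete, here is a minimal sketch in Python (illustrative only; WaffleGrid itself is C inside InnoDB, and the choice of CRC32 and the stored layout are my assumptions): the set path prepends a checksum to the page, and the get path verifies it before the page is handed back, treating a mismatch as a cache miss.

```python
import zlib

def cache_set(cache, key, page):
    """Store a page with a CRC32 prepended so the reader can verify it."""
    crc = zlib.crc32(page) & 0xFFFFFFFF
    cache[key] = crc.to_bytes(4, "big") + page

def cache_get(cache, key):
    """Return the page only if its checksum still matches, else None."""
    blob = cache.get(key)
    if blob is None or len(blob) < 4:
        return None
    crc, page = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(page) & 0xFFFFFFFF != crc:
        return None  # corrupted: treat as a miss and fall through to disk
    return page
```

The point is that a corrupted value then degrades into an ordinary disk read instead of an InnoDB assertion failure like the one above.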
The second issue is performance related. Large sequential scans are slower in Waffle than with stock InnoDB. The reason is that sequential scans from disk are very fast and efficient, more so than issuing tens of thousands of individual reads from memcached. See here:
|mysql> set @@global.innodb_memcached_enable=0;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from dbt2.customer;
| count(*) |
| 600000 |
1 row in set (5.42 sec)

mysql> select count(*) from dbt2.stock;
| count(*) |
| 2000000 |
1 row in set (10.17 sec)
mysql> set @@global.innodb_memcached_enable=1;
mysql> select count(*) from dbt2.customer;
mysql> select count(*) from dbt2.stock;
A couple of ideas: we could add a session-level variable that lets users turn off retrieving from memcached when they run a query like this... but that would require an extra step. Instead, I am thinking about adding multi-get memcached logic to the read-ahead functions. That would allow a single call to pull up to 1MB worth of data out of memcached, which may substantially improve performance here. Any other ideas?
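As a sketch of what that batching might look like (illustrative Python, not the real C code; the key format mirrors the `space:0:page` hex keys in the log above, and the 64-page window is just 1MB divided by the 16KB default page size, both assumptions about the real implementation):

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size

def readahead_keys(space_id, start_page, window=64):
    """Keys for one read-ahead window: 64 pages * 16KB = 1MB."""
    return ["%x:0:%x" % (space_id, page)
            for page in range(start_page, start_page + window)]

def multi_get(cache, keys):
    """One batched request instead of len(keys) round trips
    (stands in for libmemcached's memcached_mget/memcached_fetch)."""
    return {k: cache[k] for k in keys if k in cache}
```

The win is entirely in round trips: one request/response cycle for the whole window instead of one per page.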
A third problem we are working on is a scalability issue. Currently we open one connection per memcached server, per thread... this can get rather painful. Let's say you have 5 memcached servers and 100 connections: that's 500 connections to memcached! Not a good thing. Here we need to look at a connection pool. We could try a single global memcached connection, but I think performance would suffer (maybe worth a try, though).
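A minimal pool sketch (Python for illustration; the names and the pool size are made up): each of N pooled client handles holds one socket per memcached server, so server-side connections scale with the pool size rather than with the thread count.

```python
import queue

class MemcachedPool:
    """Fixed-size connection pool: threads borrow a client handle
    instead of every thread owning one socket per memcached server."""
    def __init__(self, make_client, size=8):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_client())

    def acquire(self):
        # Blocks until a handle is free, capping concurrent connections
        return self._idle.get()

    def release(self, client):
        self._idle.put(client)
```

With 5 servers and a pool of 8 clients, that is 40 server-side connections instead of the 500 you get from 100 threads each holding their own.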
While the noreply addition to libmemcached helped our set performance, there is still more I would like to do to speed this up. I would like to write a memcached multi-set command, which would let us mass-send items off the LRU instead of doing them one at a time.
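Standard memcached has no multi-set command, so this would be a protocol addition; the client-side batching could look something like this sketch (Python for illustration; `send_batch` is a hypothetical callable standing in for the new wire command):

```python
class SetBatcher:
    """Buffer pages coming off the LRU and ship them in one batch
    instead of one set() round trip each."""
    def __init__(self, send_batch, max_items=64):
        self._send = send_batch      # hypothetical multi-set call
        self._max = max_items
        self._pending = {}

    def add(self, key, page):
        self._pending[key] = page
        if len(self._pending) >= self._max:
            self.flush()

    def flush(self):
        if self._pending:
            self._send(dict(self._pending))
            self._pending.clear()
```

An explicit `flush()` on shutdown (or on a timer) keeps buffered pages from sitting around when eviction pressure is low.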
Outside of the read-ahead, looking for other opportunities to use multi-get could be very useful. In fact, it may really help in high-latency environments (EC2, 100Mb connections).