I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NoSQL tools vs. more traditional relational tools; the talk is entitled “Choosing the Right Tools for the Job, SQL or NOSQL”. This talk is NOT supposed to be a deep dive into the good, bad, and ugly of these solutions, but rather a way to discuss potential use cases for various solutions and where each may make a lot of sense. Still, being me, I felt the need to do at least some minor benchmarking of these solutions. The series of posts I wrote last year over on mysqlperformanceblog.com comparing Tokyo Tyrant to both MySQL and Memcached was fairly popular. In fact, the initial set of benchmark scripts I used for that series has been put to good use since then, testing things like a pair of Gear6 appliances, MemcacheDB, new memcached versions, and various memcached APIs.
When I started really digging into some of the other popular NoSQL solutions to expand my benchmarks, it became apparent that most of these tools have fairly well-defined APIs for Ruby, while the Perl APIs are in some cases rather immature or do not exist at all. So I decided to rewrite my initial benchmark suite in Perl. With the help of my co-presenter for this talk (Yves), we are writing a tool that will hopefully be able to run the same basic tests against a wide variety of solutions. Currently I have tests written for Tyrant, Memcached, Cassandra, and MySQL. We will be expanding these tests to include Redis and MongoDB for sure (maybe NDB); beyond that I am not 100% sure. The challenge is going to be writing code that tests not only the basic features but also the advanced features of these solutions. After all, a simple PK lookup can be done on all of these solutions, but that's not necessarily the bread and butter of a solution like MongoDB or even Cassandra. It's the extra features that make these more compelling. We will be releasing the code when it's ready.
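The core idea of running the same basic test against a wide variety of solutions can be sketched with a common backend interface. The actual tool is being written in Perl and is not yet released, so the sketch below is a hypothetical illustration in Python: `KVBackend`, `DictBackend`, and `pk_lookup_test` are all invented names, and the in-memory dict backend just stands in for a real driver (memcached, Tyrant, Cassandra, and so on).

```python
import time
from abc import ABC, abstractmethod


class KVBackend(ABC):
    """Common get/set interface so one test can run against any store."""

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class DictBackend(KVBackend):
    """In-memory stand-in for a real driver (memcached, Tyrant, etc.)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def pk_lookup_test(backend, n=10_000):
    """Load n rows, then time n primary-key reads; return reads/sec."""
    for i in range(n):
        backend.set(f"key:{i}", f"value:{i}")
    start = time.perf_counter()
    for i in range(n):
        backend.get(f"key:{i}")
    elapsed = time.perf_counter() - start
    return n / elapsed


# Register one backend per solution under test; each one only has to
# implement get/set for the shared PK-lookup test to apply to it.
backends = {"dict-stub": DictBackend()}
for name, b in backends.items():
    print(f"{name}: {pk_lookup_test(b):,.0f} reads/sec")
```

Testing advanced features (range scans, secondary indexes, column slices) would mean widening this interface per backend, which is exactly the hard part described above.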
I have not started my more exhaustive benchmarks yet, as I am still writing parts of the benchmark suite, but I have been running a few early tests. I generally hate publishing or mentioning results until I have taken the time to analyse them and ensure I did not miss anything, but what the hell. In a very short read-only test using PK-based lookups to compare InnoDB vs. Cassandra vs. memcached (a really small data set that should easily fit into memory, running on a **single node**, my laptop), I end up averaging ~1.2K reads per second from Cassandra, ~4K reads per second from InnoDB, and ~17K reads per second from memcached. As I set up more benchmarks I will test multi-node performance, tune the configs for the workload, etc., but it is interesting to see the early performance difference.
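For a sense of scale, the early single-node numbers above can be normalized against the slowest store. This is just arithmetic on the rough figures already quoted, not additional measurement:

```python
# Early single-node read throughput (~reads/sec) quoted above
results = {"Cassandra": 1_200, "InnoDB": 4_000, "memcached": 17_000}

# Express each as a multiple of the Cassandra baseline
baseline = results["Cassandra"]
for name, rps in results.items():
    print(f"{name:>9}: ~{rps:>6,} reads/sec ({rps / baseline:.1f}x Cassandra)")
```

That puts InnoDB at roughly 3x and memcached at roughly 14x the Cassandra rate in this untuned, single-node setup; tuning and multi-node tests could easily change those ratios.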