Great to see everyone at the UC

2010-04-15 Matthew Yonkovit

It was awesome to see everyone at the 2010 mysql UC. Sorry if I did not get a chance to chat with everyone, time just flew by! I had great turn out for my two sessions and had a lot of great conversations with people. If people are looking for my slides they are posted on the User Conference Website here: http://en.oreilly.com/mysql2010/public/schedule/speaker/75377 .. Thanks Everyone!

Quick Cassandra Notes Part 1

2010-03-31 Matthew Yonkovit

Trying to use the Ruby bindings to do benchmarking, so far things are going rather slow compared to other benchmarks in Python. This could be the size of the data I am testing with as well. Still looking into things however so far loading 1 Million rows into cassandra takes ~4.6GB, while loading the exact same data into mysql takes ~950M. A 4.5x increase in storage is a lot, not sure if that will maintain as I get more data into the system, or if there is just a lot more overhead at the start. Will load 2M and 3M rows to see.

Also you have to “warm” cassandra like other databases… after loading my 1M rows, I ran some quick tests. 125 ops/s, 288 ops/s, 311 ops/s, 1530 ops/s, 1872 ops/s, 1868 ops/s…

It looks like I am really bottlenecked by the thrift calls in ruby ( per profile )… strange I am seeing the CPU tap out at 1 core when testing with this data set, testing with a smaller dataset or with python I use multiple cores… must just be a red haring.

Getting occasional ruby socket timeouts from thrift… need to look into that.

More Debate, More Flame, More Choosing the correct tool for the job

2010-03-29 Matthew Yonkovit

You have to love all the debating going on over NOSQL -vs- SQL don’t you? With my UC session on choosing the right data storage tools ( does this sound better then SQL-vs-NoSQL?) I have been trying to stay current with the mood of the community so i can make my talk more relevant. Today I was catching up on reading a few blogs posts and I thought I would pass along these two: Pro SQL and Pro NoSQL … these represent the two very different views on this subject. (Note I think there are misleading facts and figures in these that should be flushed out more, but they are a good sample of what I am talking about). Sure lots of people have posted on this and even talked on it ( I am sure you have all seen Brian’s NOSQL -vs- MySQL presentation from open sql camp last year). You see there is a huge angery bitter flame war over who is right and who is wrong. People have very strong opinions on whether SQL or NOSQL is the anti-christ. We should organize a debate at some time. So who is right? My opinion is no one is.

The fact of is if a solution meets your needs and it works it is not wrong (it may have flaws or risks to different degrees). In the case of an RDBMS -vs- NOSQL, for some applications one is better then others. The issue I think we all run into is not really the merit of NOSQL -vs- a traditional RDBMS its the willingness to accept alternative views. Too many shops out in the world are all about the new hotness and not about what’s best for their application or organization. While other people would rather die then allow there database to be taken away from them. For some apps, durability is not a big deal for others it is. Everyone has different requirements. Just because Digg or Twitter or Rackspace is doing NOSQL and it works for them does not mean you have to use it, or that it will even work for you. In fact, if you leap without thinking you may in fact hurt yourself more then solve your problems. Every situation is unique and before you jump head first into one solution or another take a breath and analyse the situation. Ask questions like : Why are we thinking about NOSQL? Is just because of HA ( hey RDBMS’s can handle that! ), is it to replace sharding? Is it to do something else? … Ask yourself about the work you need to do: do you need to do complex joins? How much data will your really have? What sort of workload do you have? Really define your goal, then research and test solutions. I am sure that the big names using Cassandra or Hbase did not read a blog post somewhere and start converting everything that day, and you should not either.
Read the rest of this entry »

New Benchmark I am working on that tests MYSQL -vs- NOSQL

2010-03-29 Matthew Yonkovit

I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NOSQL tools -vs- More relational tools, the talk is entitled “Choosing the Right Tools for the Job, SQL or NOSQL”. While this talk is NOT supposed to be a deep dive into the good, bad, and ugly of these solutions, rather a way to discuss potential use cases for various solutions and where they may make a lot of sense, being me I still felt a need to at least do some minor benchmarking of these solutions. The series of posts I wrote last year over on mysqlperformanceblog.com comparing Tokyo Tyrant to both MySQL and Memcached was fairly popular. In fact the initial set of benchmark scripts I used for that series actually has been put to good use since then testing out things like a pair gear6 appliances, memcachedb, new memcached versions, and various memcached API’s.

When I started really digging into some of the other popular nosql solutions to expand my benchmarks it became apparent that most of these tools have fairly well defined API’s for Ruby, however in general the API’s for perl in some cases may not exist at all or are rather immature at this point. So I decided to rewrite my initial benchmark suite in Perl. With the help of my co-presenter for this talk ( Yves ) we are writing a tool that will hopefully be able to test the same basic tests against a wide variety of solutions. Currently I have tests written for Tyrant, Memcached, Cassandra, and MySQL. We will be expanding these tests to include Redis and MongoDB for sure (Maybe NDB) … beyond that I am not 100% sure. The challenge is going to be writing code that not only tests basic features, but also can test the advanced features of these solutions. After all a simple PK lookup can be done on all of these solutions, but that’s not necessarily the bread and butter of a solution like MongoDB or even Cassandra. Its the extra features that make these more compelling. We will be releasing the code when its ready.

I have not started my more exhaustive benchmarks yet… as I am still writing parts of the benchmark, but I have been running a few benchmarks. I generally hate publishing or mentioning results until I have taken the time to analyse them and ensure I did not miss anything, but what the hell. In a very short read only test, using PK based lookups to compare Innodb -vs- cassandra -vs- memcached ( a really small data set that should easily fit into memory on both on my laptop **single node **) I end up averaging ~1.2K reads per second from Cassandra, ~ 4K reads per second from Innodb, and ~ 17K reads per second in memcached. Now as I setup more benchmarks I will test multi-node performance, tune the configs for the workload, etc… but it is interesting to see the early performance difference.

More later.

A few key Tokyo Cabinet Notes

2010-03-26 Matthew Yonkovit

I wanted to publish a few interesting gotcha’s , facts, and settings people who use or want to use Tokyo Cabinet/Tyrant should know.

A quick overview, Tokyo Tyrant is the network daemon that sits ontop of Tokyo Cabinet. This means that in order to access cabinet from another server you have to access it though Tyrant. In the context of this post consider when I say Tokyo to mean the entire stack.

#1. Tokyo Cabinet allows for a single write thread. Multiple processes can try and write through tyrant but they will wait. In order to get around this limitation you need to shard your data. Using something like a memcached api ontop of a hash table is one effective way to do this.

#2. Tokyo is not durable. This means in the event of system crash you will lose data. You can call a sync process to sync data to disk, but this locks the writer process. Your best bet is to use replication to ensure you have a copy of the data and backup often.

#3. Settings for Tokyo Cabinet Files can be set via Tokyo Tyrant by adding the settings after the cabinet file: i.e.

/var/lib/tokyo/data.tch#BNUM=20000#xmsiz=10485760

Some of these settings only take place on file creation or on optimize so make sure you check the documentation.

#4. By Default there is a limit of 2GB per file to Cabinet files, this can be worked around by setting the #opt setting for your table type. For instance #opt=HDBTLARGE enables large files for the hash table. This setting takes place on creation or when you optimize. You will corrupt your file if you hit 2GB without this setting. If you experience this, your best bet is to restore from a backup that is < 2GB and switch the large file flag. (Note if I am correct you can only change the file to support large tables by using the cabinet mgr tools, i.e. running tchmgr -tl cabinet.tch against an offline file )

#5. Run optimize on a regular basis, I have seen files shrink by as much as 90% from running optimize.

* To run optimize on a table from tyrant you can run tcrmgr optimize -port xxx localhost ( This will lock writes )
* To run optimize a table from the cabinet command use the mgr for the correct table type ( i.e. tchmgr for the has table ).

#6. Increase the number of Tyrant threads from the default 8 if your having issues with refused connections. This is done on the command line when starting tyrant: ttserver -thnum 16

#7. Log your Tyrant errors to a log file by using the -log flag when starting Tyrant. By default just setting the log will also log info/warning messages, disable this by setting the -le flag which tells tyrant to only log errors.

#8. If your using a cabinet “table” database make sure you build the indexes you need otherwise your probably going to get rather slow performance.

#9. In terms of performance the BNUM setting typically has the largest impact on performance. According to the docs “specifies the number of elements of the bucket array”. Every table type is a bit different, so check the docs for the exact settings.

#10. For hash tables setting xmsiz can make a huge difference. This defines the memory allocated to mapping objects.

Huge Data -vs- The Database, how the industry is adapting in the face of the data explosion

2009-07-10 Matthew Yonkovit

Note This Article was actually written back in May after the UC at the request of Linux Magazine, through a series of events It went unpublished. Between then and now Jeremy ended up doing a great job covering most of the topics, so in the end it was unneeded. Now I had this completed article and thought, what should I do with it? In the I decided to publish them here. Also note I did update a few items.

As more companies move to MySQL and the demands for data increase, we push the bounds of the database further. The challenges large Web properties (who have pioneered many of the large MySQL deployments) faced when they stored 50GB of data and had 5,000 users were nothing like the challenges of storing 500GB of data supporting 100,000 users. Today, as we are seeing more and more 10+TB-sized datasets being used and accessed by millions of users, the same properties are again forced to think of new ways to maintain the performance, ease of use, and freedom that using MySQL has afforded them in the past. They have had to adapt and overcome these challenges to survive.

Solutions that work on one-sized environments present new challenges in others. Engineers are considering all their options. These options include moving data to non-relational solutions or even caching large chunks of their data in Memcached. Additionally, many talented engineers have also looked to database designs of the past for clues, resurrecting older database design methodologies like sharding to help keep things moving forward. Some are finding new uses for old technologies like replication by building complex master-master or massive read-write splitting setups to get the job done. So…what is the problem, and what is being done about it?

Most MySQL customers use either MyISAM or Innodb as their storage engine. They originated from those glorious days of yesteryear when we never thought we would need more then a 32-bit machine, an SMP machine meant two CPU’s, and 64MB of memory meant you had a powerhouse. The Innodb storage engine was written way back in the mid 90’s. It’s a beautiful piece of coding that has really stood up over time. However, this meant some optimizations were made based on how to get good performance out of a single CPU server with 128M of Ram and a database that was only a few GB in size. I am not saying these engines have remained unchanged; on the contrary, they have changed dramatically over the years. But while new releases have helped improve performance on larger boxes, there are many places where old code has really hindered performance.

Read the rest of this entry »

Filtering by table is now possible with WaffleGrid

2009-07-09 Yves Trudeau

Since I have been a home recently, I put some time correcting bugs in WaffleGrid and adding new features. Thanks to gdb, I have been able to understand a silly bug that was affecting WaffleGrid with sysbench but, weird enough, not with dbt2. Everything is in the way connections are established. I will blog more about that soon.

Regarding the new features, it is now possible to choose which tables you want to push to memcached. For that purpose, two new parameters have been introduce:

innodb_memcached_table_filter_enable = 0 | 1   (default to 0)

to enable the filtering and

innodb_memcached_table_list = db1/table1,db2/table2

to list the tables. This feature is filtering based on the space id so, innodb_file_per_table has to be set. Right now, the association table <-> space_id is done only at startup so, the table has to exist. Also, since an alter table change the space_id… you need to restart MySQL to restore the filtering after an alter table. Eventually, it will be more dynamic. Here are some tests done with innodb_memcached_enable = 1, caching around the disk IO, sbtest is in the filter list while sbtest_non_waffle is not. I really like mode 1, it works with the normal memcached and set to memcached are handled by the IO write threads of InnoDB, so they happened in the background. Both tables have exactly the same structure and content (100k rows, sysbench oltp test).

mysql> select avg(length(pad)) from sbtest;select avg(length(pad)) from sbtest_non_waffle;
+------------------+
| avg(length(pad)) |
+------------------+
|          50.0000 |
+------------------+
1 row in set (0.31 sec)

+------------------+
| avg(length(pad)) |
+------------------+
|          50.0000 |
+------------------+
1 row in set (0.98 sec)

Not bad, one third of the time and no disk IO, the data was already in memcached. For the test, the buffer pool is at 8M (I tweaked 5.4 to lower the minimum). Data length and Index length for both table is approximately 24 MB. Then, just to prove my point about altering tables:

mysql> alter table sbtest engine=innodb;
Query OK, 100000 rows affected (8.50 sec)
Records: 100000  Duplicates: 0  Warnings: 0

mysql> select avg(length(pad)) from sbtest;select avg(length(pad)) from sbtest_non_waffle;
+------------------+
| avg(length(pad)) |
+------------------+
|          50.0000 |
+------------------+
1 row in set (0.92 sec)

+------------------+
| avg(length(pad)) |
+------------------+
|          50.0000 |
+------------------+
1 row in set (0.96 sec)

As you can see, the timings are now similar. If you want to play with WaffleGrid, grab the current code from launchpad at lp:~y-trudeau/wafflegrid/cream-5.4-tree. It is a full tree so… be patient. Use the build scripts showing memc in their name. For more information, you can also go to www.wafflegrid.com.

Remember, the project is still fairly young, don’t put any valuable data on WaffleGrid yet. Coming next, apart from making the table filtering more dynamic, I would like to add an exclusion list instead of just the current inclusion list. Of course, we also need to test WaffleGrid much more and for that we need help. If you test and find bugs, please use Launchpad at https://code.launchpad.net/wafflegrid to report the bugs you found.

You know you have really screwed up when …

2009-06-18 Matthew Yonkovit

You crash MySQL/Innodb and you trace the error to a function that says:

“This function checks the consistency of an index page when we do not
know the index. This is also resilient so that this should never crash
even if the page is total garbage. ”

Oops… I guess its not that resilient after all:)

Waffle Grid Like Features in EnterpriseDB

2009-06-18 Matthew Yonkovit

Yves pointed this out to me:

http://www.internetnews.com/software/article.php/3825426/

“Moving beyond just providing Oracle compatibility, EnterpriseDB is adding new scalability features to their database. One of them is a feature called ‘”Infinite Cache,” which is based on the open source memcached application.

Mlodgenski commented that distributed memory caches are common, though management from a system management and a database perspective is often difficult.

“What we did is we took the distributed memory cache that was sitting in front of the database and put it behind the database,” Mlodgenski said. “This allows a simple SQL (define) interface for developers that allows access to the distributed memory cache and now the system management overhead is handed by the database itself.”

Mlodgenski added that Infinite Cache leads to a 10x to 20x performance gain for read application loads. ”

You can read more here:

http://www.enterprisedb.com/docs/en/8.3R2/perf/Performance_Guide-02.htm

and here:

http://www.enterprisedb.com/products/postgres_plus_as/overview.do

I guess it justifies that Waffle is a good idea.

Testing Waffle

2009-06-11 Matthew Yonkovit

So Ronald Bradford has been getting some EC2 AMI’s setup for Waffle the last couple of days. Really he is the first person to work with us on testing out the 0.5 cream release. It’s funny but when you are heads down in the code you tend to overlook some of the easy and obvious bugs. Just by having him walk through the install we have opened up several new bugs and come up with some new Idea’s on how to make things easier in the future. I hope by early next week we can have some of annoying bugs fixed and code pushed to launchpad. When that happens I would encourage others to give it a whirl and help us make Waffle Grid better performing, more user friendly, and more stable.

You can check out some of the bugs here.

Thanks to Ronald for his help on this!