Note: This article was actually written back in May, after the UC, at the request of Linux Magazine. Through a series of events it went unpublished. Between then and now Jeremy ended up doing a great job covering most of the topics, so in the end it was unneeded. That left me with a completed article and the question of what to do with it. In the end I decided to publish it here. Also note that I did update a few items.
As more companies move to MySQL and the demands for data increase, we push the bounds of the database further. The challenges that large Web properties (which pioneered many of the large MySQL deployments) faced when they stored 50GB of data and had 5,000 users were nothing like the challenges of storing 500GB of data and supporting 100,000 users. Today, as we see more and more 10+TB datasets being accessed by millions of users, the same properties are again forced to think of new ways to maintain the performance, ease of use, and freedom that MySQL has afforded them in the past. They have had to adapt and overcome these challenges to survive.
Solutions that work in environments of one size present new challenges in environments of another. Engineers are considering all their options. These options include moving data to non-relational solutions or even caching large chunks of their data in Memcached. Additionally, many talented engineers have looked to database designs of the past for clues, resurrecting older design methodologies like sharding to help keep things moving forward. Some are finding new uses for old technologies like replication, building complex master-master or massive read-write splitting setups to get the job done. So…what is the problem, and what is being done about it?
Most MySQL customers use either MyISAM or InnoDB as their storage engine. Both originated in those glorious days of yesteryear when we never thought we would need more than a 32-bit machine, an SMP machine meant two CPUs, and 64MB of memory meant you had a powerhouse. The InnoDB storage engine was written way back in the mid-90s. It's a beautiful piece of coding that has really stood up over time. However, its age means some optimizations were made based on how to get good performance out of a single-CPU server with 128MB of RAM and a database that was only a few GB in size. I am not saying these engines have remained unchanged; on the contrary, they have changed dramatically over the years. But while new releases have helped improve performance on larger boxes, there are many places where old code has really hindered performance.
