More Debate, More Flame, More Choosing the correct tool for the job

You have to love all the debating going on over NOSQL -vs- SQL, don’t you? With my UC session on choosing the right data storage tools (does this sound better than SQL-vs-NoSQL?), I have been trying to stay current with the mood of the community so I can make my talk more relevant. Today I was catching up on a few blog posts and I thought I would pass along these two: Pro SQL and Pro NoSQL … these represent two very different views on this subject. (Note: I think there are misleading facts and figures in these that should be fleshed out more, but they are a good sample of what I am talking about.) Sure, lots of people have posted on this and even talked about it (I am sure you have all seen Brian’s NOSQL -vs- MySQL presentation from Open SQL Camp last year). You see, there is a huge, angry, bitter flame war over who is right and who is wrong. People have very strong opinions on whether SQL or NOSQL is the anti-christ. We should organize a debate at some time. So who is right? My opinion is no one is.

The fact is, if a solution meets your needs and it works, it is not wrong (it may have flaws or risks to different degrees). In the case of an RDBMS -vs- NOSQL, for some applications one is better than the other. The issue I think we all run into is not really the merit of NOSQL -vs- a traditional RDBMS, it’s the willingness to accept alternative views. Too many shops out in the world are all about the new hotness and not about what’s best for their application or organization, while other people would rather die than allow their database to be taken away from them. For some apps durability is not a big deal, for others it is. Everyone has different requirements. Just because Digg or Twitter or Rackspace is doing NOSQL and it works for them does not mean you have to use it, or that it will even work for you. In fact, if you leap without thinking you may hurt yourself more than you solve your problems. Every situation is unique, so before you jump head first into one solution or another, take a breath and analyse the situation. Ask questions like: Why are we thinking about NOSQL? Is it just because of HA (hey, RDBMS’s can handle that!)? Is it to replace sharding? Is it to do something else? … Ask yourself about the work you need to do: Do you need to do complex joins? How much data will you really have? What sort of workload do you have? Really define your goal, then research and test solutions. I am sure that the big names using Cassandra or HBase did not read a blog post somewhere and start converting everything that day, and you should not either.

Also, be careful with all the analysis, opinions, benchmarks, etc. you see on the web on this topic. These are specific to a certain workload or user. Take Joe’s post (the pro-NoSQL one from above). He says “Anyone out there running an EC2 large instance with a RDBMS on it that’s doing 1,800 reads/second? I’ve got a Cassandra node that was getting hammered with a load of 6 serving that much traffic without falling over..” Taken out of context, I could say: well hell, my laptop this morning got 1,200 reads/second on Cassandra and 4,000 reads/second with InnoDB. Does that mean MySQL is more than 3x better than Cassandra? Well, in a certain workload, under certain conditions, sure… but I can write another benchmark that shows the opposite. By the way, yes, I have gotten well more than 1,800 reads/second on an EC2 large instance… but the workload is probably so different that it’s a worthless comparison.
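To make the "I can write another benchmark" point concrete, here is a minimal sketch of a timing harness in Python. Everything in it is an assumption for illustration: the in-memory dict stands in for a real data store, and the names `measure_ops_per_second`, `point_read` and `range_scan` are made up. It is not how any of the numbers in this post were produced.

```python
import random
import time


def measure_ops_per_second(operation, duration_seconds=0.2):
    """Call `operation` repeatedly for roughly `duration_seconds`
    and return the observed calls per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration_seconds
    while time.perf_counter() < deadline:
        operation()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


# Hypothetical stand-in for a data store: a plain in-memory dict.
KEY_COUNT = 10_000
store = {key: f"value-{key}" for key in range(KEY_COUNT)}
rng = random.Random(42)  # fixed seed so both workloads see repeatable keys


def point_read():
    """Workload A: fetch one random key."""
    return store[rng.randrange(KEY_COUNT)]


def range_scan():
    """Workload B: fetch 100 consecutive keys."""
    first = rng.randrange(KEY_COUNT - 100)
    return [store[key] for key in range(first, first + 100)]


results = {
    "point_read": measure_ops_per_second(point_read),
    "range_scan": measure_ops_per_second(range_scan),
}

for name, ops in results.items():
    print(f"{name}: {ops:,.0f} ops/second")
```

The same store reports very different "ops/second" depending on which workload you feed it, which is why a bare number like 1,800 reads/second says little without the workload behind it. To test a real system, swap the two functions for calls that mirror your own queries.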

Facts and figures can be used to sway opinions, especially when variables are unknown. Let me show you what I mean. One of my colleagues was getting 55K read/write operations per second on a new server the other day. Joe (Joe, I am not picking on you directly, really) posted that he gets 1,800/s on a large EC2 server. That **could** mean that Cassandra would need 31 large EC2 instances to match the power of that one server. At $2,978.40 per AWS large instance per year, that’s a cost of $92,330 per year. It’s over 3x the cost of the particular server that achieved 55K ops. Who would want to pay 3x more for the same performance, right? This proves SQL is awesome and NOSQL sucks, right? The answer is NO. Again, the workloads are probably so different that one may simply lend itself better to SQL. What if Joe has 1TB of data and I only had 100GB? Well, that changes the equation and we would have to adjust to account for that. In this case, if I could process 31TB of data across 31 servers at that consistent speed, then it may be worth it, depending on how long it takes a single RDBMS to deliver results over 31TB.
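For anyone who wants to check the back-of-the-envelope math above, here is a small Python sketch. The three input figures come straight from this post; the function and variable names are just illustrative, and the calculation deliberately ignores data size and workload, which is exactly the flaw being pointed out.

```python
import math

# Figures quoted in the post.
SERVER_OPS_PER_SEC = 55_000          # colleague's single new server
EC2_OPS_PER_SEC = 1_800              # Joe's number for one EC2 large instance
EC2_COST_PER_INSTANCE_YEAR = 2978.40 # yearly cost of one large instance


def instances_needed(target_ops, ops_per_instance):
    """Whole instances required to reach target_ops (rounded up)."""
    return math.ceil(target_ops / ops_per_instance)


def yearly_cost(instance_count, cost_per_instance):
    """Total yearly cost for instance_count instances."""
    return instance_count * cost_per_instance


instance_count = instances_needed(SERVER_OPS_PER_SEC, EC2_OPS_PER_SEC)
total_cost = yearly_cost(instance_count, EC2_COST_PER_INSTANCE_YEAR)

print(f"instances needed: {instance_count}")
print(f"yearly cost: ${total_cost:,.2f}")
```

The arithmetic holds up (55,000 / 1,800 rounds up to 31 instances), but a correct calculation on top of mismatched workloads is still a misleading comparison.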

I guess what I am trying to say is: make a decision based on your own tests and your own workload. There is nothing wrong with considering either option, as they both have their merits and their place in the world :) There certainly is nothing wrong with listening to all of the banter about our experiences and our opinions. But even if really smart people give you all kinds of reasons why NOSQL is better than an RDBMS, or other equally smart people tell you why an RDBMS is better than a NOSQL solution, evaluate for yourself and make an informed decision. A lot of these smart people are looking at the problem from their own unique experience. If someone had a bad experience with MySQL and did not have a good DBA, they may view MySQL in a very negative light. Similarly, if you have optimized, developed, and improved MySQL over the years, you may view NOSQL solutions as foreign and filled with risk. Also remember that really smart people sometimes do really dumb things (I could talk about all the really smart people I know, and the rather non-common-sense approaches they have tried because they are so close to a problem).

This entry was posted in benchmark, linux, Matt, mysql, NOSQL, performance.

One Response to More Debate, More Flame, More Choosing the correct tool for the job

  1. The referenced discussion is way off track. I am sure that Digg has good reasons for migrating. But their initial post described how the migration fixed their performance problem and the real problem seems to be a lousy schema — http://mysqlha.blogspot.com/2010/03/index-only.html.

    Dennis has written about this too at http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie.

    All of this has been ignored by the Digg performance guru. Maybe that is why the discussion is nothing but flames.

    Insert ad for expert consulting here.