<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Big DBA Head</title>
	<atom:link href="http://www.bigdbahead.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.bigdbahead.com</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Sat, 22 Oct 2011 21:23:16 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Speaking @ Percona Live in London Next Week!</title>
		<link>http://www.bigdbahead.com/?p=752</link>
		<comments>http://www.bigdbahead.com/?p=752#comments</comments>
		<pubDate>Sat, 22 Oct 2011 15:47:58 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[Matt]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[NOSQL]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=752</guid>
		<description><![CDATA[A quick note: I am speaking at Percona Live in London next week&#8230; it should be a rip-roaring time. I have two topics I am speaking on. The first is on building a MySQL Data Access Layer with Ruby &#8230; <a href="http://www.bigdbahead.com/?p=752">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A quick note: I am speaking at Percona Live in London next week&#8230; it should be a rip-roaring time. I have two topics I am speaking on.</p>
<p>The first is on building a MySQL Data Access Layer with Ruby and Sinatra. While this may seem a bit odd, it&#8217;s actually very cool and useful. With fewer than 100 lines of code you can build some pretty awesome web services to expose your data to the outside world or, better yet, build web services to replace developer-maintained ORMs! Additionally, if you use this as part of a broader framework you can achieve all kinds of coolness. In fact, all the code that I will show was written in under 2 hours, and it only took that long because I was trying to make it flow into the presentation.</p>
<p>The second presentation, How I Learned to Stop Worrying and Love Big Data, should be fun. I promise the following: an interesting take on Big Data, Dr. Strangelove, Simpsons humor, Daleks, wacky signs, Darth Vader, and a Monty Python reference&#8230; oh yeah, this should be fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=752</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Innodb Compression: When More is Less</title>
		<link>http://www.bigdbahead.com/?p=749</link>
		<comments>http://www.bigdbahead.com/?p=749#comments</comments>
		<pubDate>Sun, 22 May 2011 18:09:49 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[benchmark]]></category>
		<category><![CDATA[innodb internals]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=749</guid>
		<description><![CDATA[So Vadim posted on the MySQL Performance Blog about poor benchmarks when running InnoDB compressed pages.  I ran some tests a few weeks ago and did not see the same results as he did, so I went back to my previous tests and &#8230; <a href="http://www.bigdbahead.com/?p=749">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>So Vadim posted on the <a href="http://www.mysqlperformanceblog.com">MySQL Performance Blog</a> about <a href="http://www.mysqlperformanceblog.com/2011/05/20/innodb-compression-woes/">poor benchmarks when running InnoDB compressed pages</a>. I ran some tests a few weeks ago and did not see the same results as he did, so I went back to my previous tests and compared them to his numbers. In a roundabout way this verified his thoughts on mutex contention: I found that increasing the buffer pool (BP) size with compressed data decreases transactional throughput. The test was a read-only workload with an uncompressed data set size of 6GB (3.1GB compressed).</p>
<p><a href="http://www.bigdbahead.com/wp-content/uploads/2011/05/compression_html_m66222a14.jpg"><img class="alignnone size-full wp-image-750" title="compression_html_m66222a14" src="http://www.bigdbahead.com/wp-content/uploads/2011/05/compression_html_m66222a14.jpg" alt="" width="700" height="526" /></a></p>
<table border="0" cellspacing="0" frame="VOID" rules="NONE">
<tbody>
<tr>
<td width="120" height="17" align="LEFT">Buffer pool size, compression, buffer pools</td>
<td width="86" align="LEFT">TPS</td>
</tr>
<tr>
<td height="17" align="LEFT">2G, NOZIP</td>
<td align="RIGHT">3217.19</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, NOZIP</td>
<td align="RIGHT">4479.81</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, NOZIP, 16 BPs</td>
<td align="RIGHT">4424.3</td>
</tr>
<tr>
<td height="17" align="LEFT">1G, ZIP</td>
<td align="RIGHT">1120.3</td>
</tr>
<tr>
<td height="17" align="LEFT">2G, ZIP</td>
<td align="RIGHT">1181.8</td>
</tr>
<tr>
<td height="17" align="LEFT">4G, ZIP</td>
<td align="RIGHT">38</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, ZIP</td>
<td align="RIGHT">33.6</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, ZIP, 4 BPs</td>
<td align="RIGHT">226</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, ZIP, 8 BPs</td>
<td align="RIGHT">544.7</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, ZIP, 12 BPs</td>
<td align="RIGHT">3009.79</td>
</tr>
<tr>
<td height="17" align="LEFT">8G, ZIP, 16 BPs</td>
<td align="RIGHT">3026.1</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>You can see that with compressed data, adding memory to the InnoDB buffer pool slows things down: throughput falls from 1181.8 TPS with a 2G buffer pool to 33.6 TPS at 8G. Looking at the InnoDB status, you can see things getting locked up. What is interesting, though, is that you can mitigate a lot of this simply by making use of multiple buffer pools.</p>
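<p>To put the table in perspective, here is a minimal Python sketch that expresses each run as a fraction of the best uncompressed run. The TPS numbers are copied from the table above; the labels are lightly normalized and the names <code>tps</code> and <code>relative_to</code> are just illustrative.</p>

```python
# TPS figures copied from the table above (label: TPS).
tps = {
    "8G, NOZIP": 4479.81,
    "2G, ZIP": 1181.8,
    "8G, ZIP": 33.6,
    "8G, ZIP, 4 BPs": 226.0,
    "8G, ZIP, 8 BPs": 544.7,
    "8G, ZIP, 12 BPs": 3009.79,
    "8G, ZIP, 16 BPs": 3026.1,
}

def relative_to(baseline, results):
    """Return each result as a fraction of the baseline run's TPS."""
    base = results[baseline]
    return {label: value / base for label, value in results.items()}

ratios = relative_to("8G, NOZIP", tps)
for label, ratio in ratios.items():
    print(f"{label:18s} {ratio:6.1%} of uncompressed 8G throughput")

# How much 16 buffer pools help over a single 8G compressed buffer pool.
speedup = tps["8G, ZIP, 16 BPs"] / tps["8G, ZIP"]
print(f"16 BPs vs 1 BP at 8G compressed: about {speedup:.0f}x")
```

<p>Even with 16 buffer pools the compressed run stays well below the uncompressed 8G run, so this only shows how much of the regression multiple buffer pools win back, not that compression becomes free.</p>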
<p>Here is where it is waiting:<br />
<code><br />
----------<br />
SEMAPHORES<br />
----------<br />
OS WAIT ARRAY INFO: reservation count 529411, signal count 133604<br />
--Thread 140360422000384 has waited at /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0buf.c line 3483 for 0.0000 seconds the semaphore:<br />
S-lock on RW-latch at 0x412fb68 '&amp;buf_pool-&gt;page_hash_latch'<br />
a writer (thread id 140360421799680) has reserved it in mode  exclusive<br />
number of readers 0, waiters flag 1, lock_word: 0<br />
Last time read locked in file /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0buf.c line 3483<br />
Last time write locked in file /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0lru.c line 1626<br />
--Thread 140355466110720 has waited at /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0lru.c line 813 for 0.0000 seconds the semaphore:<br />
Mutex at 0x412fb28 '&amp;buf_pool-&gt;LRU_list_mutex', lock var 1<br />
waiters flag 1<br />
--Thread 140355465910016 has waited at /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0buf.c line 4398 for 0.0000 seconds the semaphore:<br />
Mutex at 0x412fb28 '&amp;buf_pool-&gt;LRU_list_mutex', lock var 1<br />
waiters flag 1<br />
--Thread 140360422201088 has waited at /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0buf.c line 4398 for 0.0000 seconds the semaphore:<br />
Mutex at 0x412fb28 '&amp;buf_pool-&gt;LRU_list_mutex', lock var 1<br />
waiters flag 1<br />
--Thread 140355466311424 has waited at /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0buf.c line 4398 for 0.0000 seconds the semaphore:<br />
Mutex at 0x412fb28 '&amp;buf_pool-&gt;LRU_list_mutex', lock var 1<br />
waiters flag 1<br />
--Thread 140355466512128 has waited at /var/lib/buildbot/slaves/percona-server-51-12/DEB_Ubuntu_maverick_amd64/work/Percona-Server-5.5.10-rc20.1/storage/innobase/buf/buf0buf.c line 4398 for 0.0000 seconds the semaphore:<br />
Mutex at 0x412fb28 '&amp;buf_pool-&gt;LRU_list_mutex', lock var 1<br />
waiters flag 1<br />
Mutex spin waits 455865, rounds 16856962, OS waits 524701<br />
RW-shared spins 29949, rounds 192489, OS waits 3591<br />
RW-excl spins 499, rounds 26672, OS waits 117<br />
Spin rounds per wait: 36.98 mutex, 6.43 RW-shared, 53.45 RW-excl</code></p>
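<p>If you want to summarize output like this rather than eyeball it, a small Python sketch along these lines can tally the waiting threads per latch. The regular expression is my own assumption based on the line format shown above, and the sample text is an abbreviated copy of that output with the long build paths shortened.</p>

```python
import re
from collections import Counter

# Matches the quoted latch name on lines such as:
#   Mutex at 0x412fb28 '&buf_pool->LRU_list_mutex', lock var 1
#   S-lock on RW-latch at 0x412fb68 '&buf_pool->page_hash_latch'
LATCH_RE = re.compile(r"(?:Mutex|RW-latch) at 0x[0-9a-f]+ '&?([^']+)'")

def count_waits(status_text):
    """Count waiting threads per latch name in a SEMAPHORES section."""
    return Counter(LATCH_RE.findall(status_text))

# Abbreviated sample: file paths shortened, otherwise as in the output above.
sample = """\
--Thread 140360422000384 has waited at buf0buf.c line 3483 for 0.0000 seconds the semaphore:
S-lock on RW-latch at 0x412fb68 '&buf_pool->page_hash_latch'
--Thread 140355466110720 has waited at buf0lru.c line 813 for 0.0000 seconds the semaphore:
Mutex at 0x412fb28 '&buf_pool->LRU_list_mutex', lock var 1
--Thread 140355465910016 has waited at buf0buf.c line 4398 for 0.0000 seconds the semaphore:
Mutex at 0x412fb28 '&buf_pool->LRU_list_mutex', lock var 1
"""

waits = count_waits(sample)
for name, n in waits.most_common():
    print(n, name)
```

<p>In the full output above, five of the six waiting threads are on the LRU_list_mutex of the same buffer pool, which fits with multiple buffer pools spreading out the contention.</p>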
<p>Long story short, if you&#8217;re using compression with InnoDB, you may want to look into using multiple buffer pools until this is fixed.</p>
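<p>As a rough illustration (not the exact configuration used for these tests), splitting the buffer pool is controlled by the innodb_buffer_pool_instances setting in MySQL 5.5 and Percona Server 5.5. A my.cnf fragment for an 8G buffer pool split into 16 buffer pools might look like this; the values are assumptions taken from the table labels:</p>

```ini
[mysqld]
# Total buffer pool size, shared across all instances.
innodb_buffer_pool_size = 8G
# Number of separate buffer pool instances (the default is 1).
innodb_buffer_pool_instances = 16
```

<p>Each instance gets its own share of the total size and its own mutexes, so test with your own workload before settling on a number.</p>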
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=749</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Bootcamp at Collaborate 2011 pt 2</title>
		<link>http://www.bigdbahead.com/?p=745</link>
		<comments>http://www.bigdbahead.com/?p=745#comments</comments>
		<pubDate>Thu, 07 Oct 2010 15:23:44 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[Matt]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=745</guid>
		<description><![CDATA[Hi All, I am going through some of the sessions for IOUG&#8217;s Collaborate 2011 Conference and trying to fill in slots for the bootcamp, and while we have some great sessions we could use a few more sessions. Specifically I &#8230; <a href="http://www.bigdbahead.com/?p=745">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Hi All,</p>
<p>I am going through some of the sessions for IOUG&#8217;s Collaborate 2011 Conference and trying to fill in slots for the bootcamp, and while we have some great sessions, we could use a few more. Specifically, I would love to get a couple of sessions on a few of the following topics:</p>
<ul>
<li>InnoDB in General</li>
<li>InnoDB Internals &amp; scalability</li>
<li>General Overview of available Storage Engines (Pros/Cons)</li>
<li>MySQL options for very large databases (e.g. partitioning, sharding)</li>
<li>MySQL Monitoring Options</li>
<li>NDB Cluster</li>
</ul>
<p>These are just a few suggestions; please feel free to submit any topic that&#8217;s related to MySQL&#8230; even if it&#8217;s not a fit for the bootcamp, there is a MySQL track that it would fit into. IOUG extended the MySQL deadline until next Monday for us, so let&#8217;s get some more papers in! You can submit <a href="http://www.ioug.org/callforspeakers">here</a> &#8230;</p>
<p>Now I know many folks are torn because the MySQL UC falls during this same time&#8230; but there should be enough room for both conferences. I know many DBAs and developers who are torn when it comes to which conferences to attend, as they are forced into supporting multiple database technologies (Oracle, SQL Server, DB2, MySQL, etc.). Collaborate gives these people a place to go and learn about more than just MySQL. Recently I have seen many &#8220;Classic Oracle DBAs&#8221; asking more questions and trying to learn more about MySQL. This is one opportunity where the MySQL community has the chance to leave a lasting impression on Oracle users who have not had any MySQL experience before.</p>
<p>This is untamed territory!  Come on down to Orlando and help Oracle users understand the benefits of MySQL!</p>
<p>Matt</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=745</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Helping to Organize a MySQL Bootcamp @ IOUG&#8217;s COLLABORATE 11</title>
		<link>http://www.bigdbahead.com/?p=740</link>
		<comments>http://www.bigdbahead.com/?p=740#comments</comments>
		<pubDate>Sun, 19 Sep 2010 13:19:15 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[Matt]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=740</guid>
		<description><![CDATA[I am helping IOUG organize a MySQL bootcamp at their Collaborate conference in Orlando. This is actually a great opportunity to reach out to a lot of Oracle talent looking for more information and training on MySQL. An IOUG Bootcamp &#8230; <a href="http://www.bigdbahead.com/?p=740">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I am helping IOUG organize a MySQL bootcamp at their Collaborate conference in Orlando. This is actually a great opportunity to reach out to a lot of Oracle talent looking for more information and training on MySQL. An IOUG Bootcamp is compiled from several one-hour technical sessions, starting from introductory-level topics on day 1 and moving to advanced topics on the final day. The idea is that this format will help those not familiar with MySQL get a crash course in it, while also providing people with a wide range of targeted sessions that they can come in and out of as they see fit. It&#8217;s like the tag line of most carnivals: fun for all ages&#8230;</p>
<p>The reason I am posting is that we are going to need lots of help from you. We are looking for speakers for the conference at the moment&#8230; you can submit your papers here: <a href="http://www.ioug.org/callforspeakers">http://www.ioug.org/callforspeakers</a>. Think of it like this: this is an opportunity to be a pioneer&#8230; reaching people who may never have attended a MySQL-related conference, heard the good word of MySQL, or maybe even learned much about open source. Hmmm, maybe I should dress up as a cowboy or pioneer for the event! The question is whether that will attract people or turn them away. Maybe we should vote on it!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=740</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Speaking in Chicago on Tuesday September 28th</title>
		<link>http://www.bigdbahead.com/?p=737</link>
		<comments>http://www.bigdbahead.com/?p=737#comments</comments>
		<pubDate>Mon, 13 Sep 2010 20:02:24 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[5 minute dba]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=737</guid>
		<description><![CDATA[I wanted to drop a quick note and let everyone know I am going to be speaking at an IOUG event on 9/28/2010 in downtown Chicago. I will be targeting DBAs, developers, and users who want to know more about &#8230; <a href="http://www.bigdbahead.com/?p=737">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wanted to drop a quick note and let everyone know I am going to be speaking at an IOUG event on 9/28/2010 in downtown Chicago. I will be targeting DBAs, developers, and users who want to know more about MySQL but do not have a ton of time to devote to learning every little thing. I will be covering DBA 101 tasks in my 5 minute DBA talk, common developer &#038; DBA mistakes, and common high availability architectures, and talking about the various versions, forks, and patches of MySQL that are floating around in the community.</p>
<p>You can register here:<br />
<a href="http://www.ioug.org/Events/IOUGWelcomesMySQL/tabid/164/Default.aspx">http://www.ioug.org/Events/IOUGWelcomesMySQL/tabid/164/Default.aspx</a></p>
<p>I hope to see you there.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=737</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Great to see everyone at the UC</title>
		<link>http://www.bigdbahead.com/?p=731</link>
		<comments>http://www.bigdbahead.com/?p=731#comments</comments>
		<pubDate>Thu, 15 Apr 2010 18:23:56 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[5 minute dba]]></category>
		<category><![CDATA[Matt]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=731</guid>
		<description><![CDATA[It was awesome to see everyone at the 2010 MySQL UC. Sorry if I did not get a chance to chat with everyone, time just flew by! I had a great turnout for my two sessions and had a lot &#8230; <a href="http://www.bigdbahead.com/?p=731">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>It was awesome to see everyone at the 2010 MySQL UC. Sorry if I did not get a chance to chat with everyone, time just flew by! I had a great turnout for my two sessions and had a lot of great conversations with people. If people are looking for my slides, they are posted on the User Conference website here: <a href="http://en.oreilly.com/mysql2010/public/schedule/speaker/75377">http://en.oreilly.com/mysql2010/public/schedule/speaker/75377</a>. Thanks, everyone!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=731</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick Cassandra Notes Part 1</title>
		<link>http://www.bigdbahead.com/?p=728</link>
		<comments>http://www.bigdbahead.com/?p=728#comments</comments>
		<pubDate>Thu, 01 Apr 2010 02:55:47 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[NOSQL]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=728</guid>
		<description><![CDATA[Trying to use the Ruby bindings to do benchmarking, so far things are going rather slow compared to other benchmarks in Python. This could be the size of the data I am testing with as well. Still looking into things &#8230; <a href="http://www.bigdbahead.com/?p=728">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Trying to use the Ruby bindings to do benchmarking; so far things are going rather slowly compared to other benchmarks in Python. This could be the size of the data I am testing with as well. Still looking into things, but so far loading 1 million rows into Cassandra takes ~4.6GB, while loading the exact same data into MySQL takes ~950MB. A nearly 5x increase in storage is a lot; not sure if that will hold as I get more data into the system, or if there is just a lot more overhead at the start. Will load 2M and 3M rows to see.</p>
<p>Also, you have to &#8220;warm&#8221; Cassandra like other databases&#8230; after loading my 1M rows, I ran some quick tests, and successive runs gave 125 ops/s, 288 ops/s, 311 ops/s, 1530 ops/s, 1872 ops/s, 1868 ops/s&#8230;</p>
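<p>For anyone wanting to watch warm-up like this on their own system, here is a minimal Python sketch of the idea: repeat the same read pass several times and record ops/s for each pass. It is not the Ruby benchmark used above; the dictionary lookup is a stand-in for a real client call, and the names <code>ops_per_second</code> and <code>warmup_profile</code> are hypothetical.</p>

```python
import time

def ops_per_second(operation, keys):
    """Run operation(key) for every key and return the observed ops/s."""
    start = time.perf_counter()
    for key in keys:
        operation(key)
    elapsed = time.perf_counter() - start
    return len(keys) / elapsed if elapsed else float("inf")

def warmup_profile(operation, keys, runs=6):
    """Repeat the same read pass several times to watch throughput settle."""
    return [ops_per_second(operation, keys) for _ in range(runs)]

# Stand-in for a real client call: a dict lookup instead of a Cassandra get.
store = {i: f"row-{i}" for i in range(10_000)}
profile = warmup_profile(store.get, list(store))
print([round(rate) for rate in profile])
```

<p>With a real database behind <code>operation</code>, only the later passes say much about steady-state performance; in the numbers above the last run is roughly 15x the first.</p>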
<p>It looks like I am really bottlenecked by the Thrift calls in Ruby (per the profile)&#8230; strangely, I am seeing the CPU tap out at 1 core when testing with this data set, while testing with a smaller data set or with Python uses multiple cores&#8230; must just be a red herring.</p>
<p>Getting occasional Ruby socket timeouts from Thrift&#8230; need to look into that.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=728</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More Debate, More Flame, More Choosing the correct tool for the job</title>
		<link>http://www.bigdbahead.com/?p=714</link>
		<comments>http://www.bigdbahead.com/?p=714#comments</comments>
		<pubDate>Mon, 29 Mar 2010 20:17:17 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[benchmark]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Matt]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[NOSQL]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=714</guid>
		<description><![CDATA[You have to love all the debating going on over NOSQL -vs- SQL, don&#8217;t you? With my UC session on choosing the right data storage tools ( does this sound better than SQL-vs-NoSQL?) I have been trying to stay current &#8230; <a href="http://www.bigdbahead.com/?p=714">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>You have to love all the debating going on over NOSQL -vs- SQL, don&#8217;t you? With my <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12685">UC session</a> on choosing the right data storage tools (does this sound better than SQL-vs-NoSQL?) I have been trying to stay current with the mood of the community so I can make my talk more relevant. Today I was catching up on reading a few blog posts and I thought I would pass along these two: <a href="http://www.yafla.com/dforbes/The_Impact_of_SSDs_on_Database_Performance_and_the_Performance_Paradox_of_Data_Explodification/">Pro SQL</a> and <a href="http://stu.mp/2010/03/nosql-vs-rdbms-let-the-flames-begin.html">Pro NoSQL</a> &#8230; these represent two very different views on this subject. (Note: I think there are misleading facts and figures in these that should be fleshed out more, but they are a good sample of what I am talking about.) Sure, lots of people have posted on this and even talked on it (I am sure you have all seen Brian&#8217;s NOSQL -vs- MySQL presentation from Open SQL Camp last year). You see, there is a huge, angry, bitter flame war over who is right and who is wrong. People have very strong opinions on whether SQL or NOSQL is the antichrist. We should organize a debate at some time. So who is right? My opinion is that no one is.</p>
<p>The fact is, if a solution meets your needs and it works, it is not wrong (it may have flaws or risks to different degrees). In the case of an RDBMS -vs- NOSQL, for some applications one is better than the other. The issue I think we all run into is not really the merit of NOSQL -vs- a traditional RDBMS; it&#8217;s the willingness to accept alternative views. Too many shops out in the world are all about the new hotness and not about what&#8217;s best for their application or organization, while other people would rather die than allow their database to be taken away from them. For some apps durability is not a big deal; for others it is. Everyone has different requirements. Just because Digg or Twitter or Rackspace is doing NOSQL and it works for them does not mean you have to use it, or that it will even work for you. In fact, if you leap without thinking you may hurt yourself more than you solve your problems. Every situation is unique, and before you jump head first into one solution or another, take a breath and analyse the situation. Ask questions like: Why are we thinking about NOSQL? Is it just because of HA (hey, RDBMSs can handle that!)? Is it to replace sharding? Is it to do something else? … Ask yourself about the work you need to do: Do you need to do complex joins? How much data will you really have? What sort of workload do you have? Really define your goal, then research and test solutions. I am sure that the big names using Cassandra or HBase did not read a blog post somewhere and start converting everything that day, and you should not either.<br />
<span id="more-714"></span><br />
Also, be careful of all the analysis, opinions, benchmarks, etc. you see on the web on the topic. These are specific to a certain workload or user. Take Joe&#8217;s post (pro NoSQL from above); he says “Anyone out there running an EC2 large instance with a RDBMS on it that’s doing 1,800 reads/second? I’ve got a Cassandra node that was getting hammered with a load of 6 serving that much traffic without falling over..”  Taken out of context, I could say: well hell, my laptop this morning got 1200 reads/second on Cassandra and 4,000 reads/second with InnoDB. Does that mean MySQL is more than 3x better than Cassandra? Well, in a certain workload, under certain conditions, sure&#8230; but I can write another benchmark that shows the opposite. By the way, yes, I have gotten well more than 1800 reads/second on an EC2 large instance&#8230; but the workload is probably so different it&#8217;s a worthless comparison.</p>
<p>Facts and figures can be used to sway opinions, especially when variables are unknown. Let me show you what I mean. One of my colleagues was getting 55K read/write operations per second on a new server the other day. Joe (Joe, I am not picking on you directly, really) posted that he gets 1800/s on a large EC2 server. That **could** mean that Cassandra would need 31 large EC2 instances to match the power of that one server. At $2978.40 per AWS large instance per year, that&#8217;s a cost of $92,330 per year. It&#8217;s over 3x the cost of the particular server that achieved 55K ops. Who would want to pay 3x more for the same performance, right? This proves SQL is awesome and NOSQL sucks, right? The answer is NO. Again, the workloads are probably so different that one may simply lend itself better to SQL. What if Joe has 1TB of data and I only had 100G? Well, that changes the equation, and we would have to adjust to account for that. In this case, with 31 servers, if I could process 31TB of data at that consistent speed, then it may be worth it, depending on how long it takes a single RDBMS to deliver results over 31TB.</p>
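<p>The back-of-the-envelope math above is easy to reproduce. This short Python sketch uses only the figures quoted in this post, and it carries the same caveat: it says nothing about whether the two workloads are comparable.</p>

```python
import math

target_ops = 55_000          # ops/s on the single server (from the post)
ops_per_instance = 1_800     # ops/s quoted for one large EC2 instance
cost_per_instance = 2978.40  # USD per large instance per year (from the post)

# Round up: a fraction of an instance still means paying for a whole one.
instances = math.ceil(target_ops / ops_per_instance)
yearly_cost = round(instances * cost_per_instance, 2)

print(instances, yearly_cost)
```

<p>Change any one input (data size, read/write mix, instance type) and the conclusion can flip, which is the whole point.</p>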
<p>I guess I am trying to say: make a decision based on your own tests and your own workload. There is nothing wrong with considering either option, as they both have their merits and their place in the world :) There certainly is nothing wrong with listening to all of the banter about our experiences and our opinions. But even if really smart people tell you all kinds of reasons why NOSQL is better than an RDBMS, or other equally smart people tell you why an RDBMS is better than a NOSQL solution, evaluate for yourself and make an informed decision. A lot of these smart people are looking at the problem from their own unique experience. If someone had a bad experience with MySQL and did not have a good DBA, they may view MySQL in a very negative light. Similarly, if you have optimized, developed, and improved MySQL over the years, you may view NOSQL solutions as foreign and filled with risk. Also remember that really smart people sometimes do really dumb things (I could talk about all the really smart people I know, and the rather non-common-sense approaches they have tried because they are so close to a problem).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=714</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Benchmark I am working on that tests MYSQL -vs- NOSQL</title>
		<link>http://www.bigdbahead.com/?p=702</link>
		<comments>http://www.bigdbahead.com/?p=702#comments</comments>
		<pubDate>Mon, 29 Mar 2010 14:57:10 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[benchmark]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Matt]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[NOSQL]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Tokyo]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=702</guid>
		<description><![CDATA[I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NOSQL tools -vs- More relational tools, the talk is entitled &#8220;Choosing the Right Tools for the Job, &#8230; <a href="http://www.bigdbahead.com/?p=702">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I am giving a talk in a couple of weeks at the 2010 MySQL User Conference that will touch on use cases for NOSQL tools -vs- more relational tools; the talk is entitled <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12685">&#8220;Choosing the Right Tools for the Job, SQL or NOSQL&#8221;</a>. This talk is NOT supposed to be a deep dive into the good, bad, and ugly of these solutions, but rather a way to discuss potential use cases for various solutions and where they may make a lot of sense. Being me, though, I still felt a need to at least do some minor benchmarking of these solutions. The series of posts I wrote last year over on <a href="http://www.mysqlperformanceblog.com/category/nosql/">mysqlperformanceblog.com</a> comparing Tokyo Tyrant to both MySQL and Memcached was fairly popular. In fact, the initial set of benchmark scripts I used for that series has been put to good use since then, testing out things like a pair of Gear6 appliances, memcachedb, new memcached versions, and various memcached APIs.
</p>
<p>
   When I started really digging into some of the other popular NoSQL solutions to expand my benchmarks, it became apparent that most of these tools have fairly well defined APIs for Ruby, while the APIs for Perl in some cases may not exist at all or are rather immature at this point. So I decided to rewrite my initial benchmark suite in Ruby. With the help of my co-presenter for this talk (Yves), we are writing a tool that will hopefully be able to run the same basic tests against a wide variety of solutions. Currently I have tests written for Tyrant, Memcached, Cassandra, and MySQL. We will be expanding these tests to include Redis and MongoDB for sure (maybe NDB) &#8230; beyond that I am not 100% sure. The challenge is going to be writing code that not only tests basic features, but also can test the advanced features of these solutions. After all, a simple PK lookup can be done on all of these solutions, but that&#8217;s not necessarily the bread and butter of a solution like MongoDB or even Cassandra. It&#8217;s the extra features that make these more compelling. We will be releasing the code when it&#8217;s ready.
</p>
<p>
   I have not started my more exhaustive benchmarks yet&#8230; as I am still writing parts of the benchmark, but I have been running a few tests. I generally hate publishing or mentioning results until I have taken the time to analyse them and ensure I did not miss anything, but what the hell. In a very short read-only test, using PK-based lookups to compare InnoDB -vs- Cassandra -vs- memcached (a really small data set that should easily fit into memory, with everything on my laptop as a **single node**), I end up averaging ~1.2K reads per second from Cassandra, ~4K reads per second from InnoDB, and ~17K reads per second from memcached. Now as I set up more benchmarks I will test multi-node performance, tune the configs for the workload, etc&#8230; but it is interesting to see the early performance difference.
</p>
<p>More later.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=702</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A few key Tokyo Cabinet Notes</title>
		<link>http://www.bigdbahead.com/?p=700</link>
		<comments>http://www.bigdbahead.com/?p=700#comments</comments>
		<pubDate>Fri, 26 Mar 2010 14:32:59 +0000</pubDate>
		<dc:creator>Matthew Yonkovit</dc:creator>
				<category><![CDATA[5 minute dba]]></category>
		<category><![CDATA[NOSQL]]></category>
		<category><![CDATA[Tokyo]]></category>

		<guid isPermaLink="false">http://www.bigdbahead.com/?p=700</guid>
		<description><![CDATA[I wanted to publish a few interesting gotchas, facts, and settings that people who use or want to use Tokyo Cabinet/Tyrant should know. A quick overview: Tokyo Tyrant is the network daemon that sits on top of Tokyo Cabinet. This means &#8230; <a href="http://www.bigdbahead.com/?p=700">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wanted to publish a few interesting gotchas, facts, and settings that people who use or want to use Tokyo Cabinet/Tyrant should know.</p>
<p>A quick overview: Tokyo Tyrant is the network daemon that sits on top of Tokyo Cabinet. This means that in order to access Cabinet from another server you have to access it through Tyrant. In the context of this post, when I say Tokyo I mean the entire stack.</p>
<p>#1. Tokyo Cabinet allows for a single write thread. Multiple processes can try to write through Tyrant, but they will wait. In order to get around this limitation you need to shard your data. Using something like a memcached API on top of a hash table is one effective way to do this.</p>
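<p>As a minimal sketch of what client-side sharding can look like (in Python, and not tied to any particular Tyrant client library): hash the key and use the result to pick one of several Tyrant instances, so each instance has its own writer. The host names are hypothetical; 1978 is the port ttserver listens on by default.</p>

```python
import hashlib

# Hypothetical Tyrant endpoints; replace with your own hosts and ports.
SHARDS = [("tyrant-a", 1978), ("tyrant-b", 1978), ("tyrant-c", 1978)]

def shard_for(key, shards=SHARDS):
    """Map a key to one shard using a stable hash of the key bytes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

for key in ("user:1", "user:2", "user:3"):
    print(key, shard_for(key))
```

<p>Plain modulo hashing like this remaps most keys whenever the number of shards changes; a consistent hashing scheme is the usual alternative if you expect to add or remove instances.</p>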
<p>#2. Tokyo is not durable. This means that in the event of a system crash you will lose data. You can call a sync process to sync data to disk, but this locks the writer process. Your best bet is to use replication to ensure you have a copy of the data, and to back up often.</p>
<p>#3. Settings for Tokyo Cabinet files can be set via Tokyo Tyrant by adding the settings after the cabinet file name, e.g.</p>
<p>/var/lib/tokyo/data.tch#BNUM=20000#xmsiz=10485760</p>
<p>Some of these settings only take effect on file creation or on optimize, so make sure you check the documentation.</p>
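<p>If you generate these paths from scripts, a tiny helper keeps the settings readable. This Python sketch simply reproduces the example string above; <code>tuned_path</code> is a hypothetical name, and the setting names and values are the ones from this post rather than recommendations.</p>

```python
def tuned_path(db_file, **settings):
    """Append #name=value tuning settings to a cabinet file path."""
    parts = [db_file]
    parts.extend(f"#{name}={value}" for name, value in settings.items())
    return "".join(parts)

# 10 * 1024 * 1024 bytes = 10485760, the xmsiz value used in the example.
path = tuned_path("/var/lib/tokyo/data.tch", BNUM=20000, xmsiz=10 * 1024 * 1024)
print(path)
```

<p>The helper does no validation, so a misspelled setting name will end up in the string unchanged.</p>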
<p>#4. By default there is a limit of 2GB per Cabinet file. This can be worked around by setting the large-file option for your table type; for instance, HDBTLARGE enables large files for the hash table (in a Tyrant tuning string this is, as far as I can tell from the Tokyo Cabinet docs, spelled #opts=l rather than #opt=HDBTLARGE). This setting takes effect on creation or when you optimize. You will corrupt your file if you hit 2GB without this setting. If you experience this, your best bet is to restore from a backup that is smaller than 2GB and switch on the large file flag. (Note: if I am correct, you can only change the file to support large tables by using the Cabinet mgr tools, i.e. running tchmgr optimize -tl cabinet.tch against an offline file.)</p>
<p>#5. Run optimize on a regular basis; I have seen files shrink by as much as 90% from running optimize.</p>
<p>* To run optimize on a table from Tyrant you can run tcrmgr optimize -port xxx localhost (this will lock writes)<br />
* To optimize a table with the Cabinet command line tools, use the mgr for the correct table type (e.g. tchmgr for the hash table).</p>
<p>#6. Increase the number of Tyrant threads from the default of 8 if you&#8217;re having issues with refused connections. This is done on the command line when starting Tyrant: ttserver -thnum 16</p>
<p>#7. Log your Tyrant errors to a log file by using the -log flag when starting Tyrant. By default, just setting the log will also log info/warning messages; disable this by setting the -le flag, which tells Tyrant to only log errors.</p>
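<p>Putting #6 and #7 together, here is a small Python sketch that assembles a ttserver command line from the flags mentioned above. The log file location is a hypothetical example, and the database path is the one from #3.</p>

```python
import shlex

def ttserver_command(db_path, threads=16, log_file="/var/log/ttserver.log"):
    """Build a ttserver argument list using the flags mentioned above."""
    return [
        "ttserver",
        "-thnum", str(threads),  # worker threads (default is 8)
        "-log", log_file,        # write log messages to this file
        "-le",                   # log errors only
        db_path,                 # cabinet file, with any tuning settings
    ]

cmd = ttserver_command("/var/lib/tokyo/data.tch#BNUM=20000#xmsiz=10485760")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

<p>Building the command as a list makes it easy to hand to something like subprocess without worrying about how the shell treats the # characters in the path.</p>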
<p>#8. If you&#8217;re using a Cabinet “table” database, make sure you build the indexes you need; otherwise you&#8217;re probably going to get rather slow performance.</p>
<p>#9. The BNUM setting typically has the largest impact on performance. According to the docs, it “specifies the number of elements of the bucket array”. Every table type is a bit different, so check the docs for the exact settings.</p>
<p>#10. For hash tables, setting xmsiz can make a huge difference. This defines the memory allocated to mapping objects.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bigdbahead.com/?feed=rss2&#038;p=700</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
