Big DBA Head!

Database Brain Power!

April 25th, 2008

Follow up to my Common Disk Issues

Ahh seems like a few people do take the time to read my blog:) Peter Z Commented here on my common disk performance mistakes post. He makes some great arguments, and you may want to give it a read. While he does not agree with everything I say it is interesting to see his views. Remember different folks have different experiences and a lot of times there are multiple roads on the path the performance nirvana.

Let me start off saying I wholly admit that saying “everything” is a disk issue is a dramatic exaggeration. And i did not specifically say disk, I said “The problem is always an IO problem”, more on that later. I have run into my far share of issues outside of this sphere ( network, context switching, cpu ), but I still find disk performance to be by far the most common issue effecting systems I deal with. In my opinion It is the reason memcache had to be invented, the reason for explosive growth in the amount of memory in servers, etc. If all of your disks performed even at half the speed of memory the urgent need to fit everything into memory would not be so urgent ( Now that sounds silly ). This is not a MySQL specific issue either I was dealing with disk performance issues since the mid 90’s in other database systems.

Read the rest of this entry »

April 21st, 2008

Common Performance Mistakes: Disk

 

Over the past several years whether I am working with a small company or a fortune 500 client I have seen lots of issues and mistakes made around the configuration, setup, and ongoing maintenance of disk on Unix and Linux Servers. This is not only an issue with MySQL shops, rather it can be an issue with all database setups whether it is Oracle, DB2, or Sybase. Neglecting the disk is setting yourself up for long term issues. These mistakes often force companies to throw more and more hardware at the problem… Lets look at some common mistakes around disk:

 

#1 The problem is always an IO problem, and remember spindles not capacity

 

In performance disk is everything. It makes you happy, it makes you sad. Learn to love it. Learn to hate it. Learn to understand it. Everything eventually comes back to disk. Take this example: Lets say you have a bad query that you fix with an index. Why was it slow? The easy answer : was a missing index. But really the answer is the missing index caused the query to read too much data from disk. The index merely reduced the number of ios that needed to be used to complete the query. In a perfect world the disk would be fast enough to return the query without you even realizing it was missing. But our world is not perfect. So what are some disk related performance tips?

 

Read the rest of this entry »

April 21st, 2008

Lots of Discussion on the backup plugin

The messages going back and forth on the “close sourcing” or paying to make use of some “plugins” /addons within MySQL are flying still! And in my opinion this is getting way out of hand. < DISCLAIMER >I work for Sun/MySQL as a senior consultant but claim no special knowledge or agenda here. The comments here are simply my opinions. </ DISCLAIMER> I can not help but look at all the hubbub and laugh a little bit. The vast majority of what I have read seems way off base (more of a he/she posted this, fifth hand retelling instead of from official channels). But there is a lot of FUD floating around, and I keep seeing more and more legit news sources picking up on this “massive change to licensing”, which In my opinion it really is a non-story… because there is no massive change.

 

What is the issue?

Read the rest of this entry »

March 6th, 2008

A Few Common Performance Mistakes

As I run into the same mistake over and over again I am going to throw them out here in order to serve as a warning and a learning tool to others. Some of these may seem like I am beating a dead horse because you can find examples of these on other sites, in other blogs, or even in the manual. Despite the large amounts of data on the subjects these things still pop up. This is not a complete list, and this list is not in any order… so please don’t dispare if I miss your favorite common performance gaff. As I encounter more I will post them to the common mistakes category here on bigdbahead.com. Hopefully this list will grow to include a large subset of issues and maybe one or two people will take note and fix their applications and databases.

 

#1.) MORE IS NOT ALWAYS BETTER

We often see configuration examples where folks believe in the old adage more is better. Case in point if a 2MB sort buffer size is good, a 32MB or a 128MB sort buffer size is even better. While it is true that larger key buffer and the innodb buffer;s can provide huge performance boosts, other settings can actually seriously hamper performance. Case in point the per thread buffers. The join buffer, read buffer, read rnd buffer, sort buffer, etc. These buffers set limits on memory per thread. Setting these too high can run the risk of some really large memory footprints in individual threads.

Most of the time when I see these values are set higher then 2MB, it is not that high because of careful consideration or benchmarking, but was set simply because bigger must be better. If your following general guidelines for a system running innodb you will already have a large portion of your system memory allocated to the innodb buffer pool. The larger innodb buffer pool with larger per thread memory seriously increases the chances of swapping. Another consideration is that setting certain buffers to higher values often will cause the MySQL engine to change how it is handling memory and can lead to inefficiencies. I am not going to rewrite a lot of detail that is already out in the wild on these. But the motto “Think before you tinker” is a good rule to live by.

Here is a a good list of required reading on setting the per thread buffers to high:

http://dev.mysql.com/doc/refman/5.1/en/memory-use.html
http://www.mysqlperformanceblog.com/2007/09/17/mysql-what-read_buffer_size-value-is-optimal/
http://www.mysqlperformanceblog.com/2006/06/06/are-larger-buffers-always-better/
http://mysql-ha.com/2007/09/06/read-buffer-performance-hit/
http://www.mysqlperformanceblog.com/2007/09/12/read-buffers-mmap-malloc-and-mysql-performance/
http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html

Read the rest of this entry »

March 3rd, 2008

Quick note on innodb_file_per_table

Ran into an interesting issue with a client today who was using innodb_file_per_table. This client had a lot of tables ( 300K+), they were using innodb_file_per_table. Today they had a problem that caused their db to crash. After fixing their issue, the database restart seemed to hang. The only thing in the error log was the mysql was started, their was no crash recovery messages. An strace revealed that the mysqld process was stating all the datafiles. It seems that during a recovery the first step is to just check to see if all the individual files are really their. The entire process to simply stat each datafile took over 45 minutes to complete after which the actual crash recovery started. Just a quick note to remind myself in the future if I see this again.

February 28th, 2008

Final Mtron SSD, MySQL, MFT Benchmarks and more!

Well its been a while since I started this, so I thought it was about time to put some of this to rest. I have been beating up the Mtron disks for several weeks. During that time I had a couple of really odd outages, a lot of head scratching, and my fair share of number crunching. I have been talking with Kevin Burton a lot the last few weeks, we have compared and shared information on the optimal configuration for the current generation of ssd drives. I have held back a large number of my benchmarks because i want to get a complete picture. I want to evaluate the drives & the MFT software while minimizing any self induced mishaps. I have ran and reran tests over and over again, at last count I had run over 280 DBT2 tests using my little server. I have tried wacky and wild things, some worked some did not. In the end I think I found out enough to make some recommendations and share my thoughts.

I focused on the bare Mtron SSD drive in my previous benchmarks, this time I will not only expound a little bit more on the bare drives but I am also adding in a drive that is using Easy Computing Company’s MFT technology. MFT is a propitiatory device mapper technology that helps rewrite random IO’s to make them more efficient. You use a standard off the shelf SSD disk drive, and MFT enhances it.

Lets do a quick recap for those joining us. As noted before, bare SSD drives are very sensitive to Random Reads.

 

With 10K reads, a single 10K Raptor drive alone could only handle 161 IOPS, while the Mtron drive was 32X faster in the same test. The opposite happens when testing writes. The SSD drive is 2x slower then the Raptor drive on random writes. As the workload moves from write heavy to read heavy you start to see a shift in the number of IOPS that can be performed… see below:

Read the rest of this entry »

February 16th, 2008

SSD and MySQL Tests… The logic behind the tests & wondering did I break it?

Over the last several weeks I have been testing out the 2 mtron drives that easy computing company provided me. In fact I have been testing like a madman. Nightly I would kick off a batch of DBT2 tests, sysbench tests, bonnie++, and more. Each test changed something about the environment I was testing. RAID 1, RAID 0, with easy’s MFT, without the MFT, with the logs on SSD, With the logs on a raptor drive, etc. Each test run when it completed successfully would take roughly 12 hours. a typical example of a portion of one of these test runs is:

shutdown mysql
remove the data and log files
wait 5 minutes
startup mysql
wait 5 minutes
load a 200 warehouse DBT2 test
wait 5 minutes after the load is complete
run DBT2 test

rinse and repeat.

Minimally I have been doing a 40 active warehouse dbt2 test and a 60 active warehouse dbt2 test for each change I make. I was trying to do each test (whether it was xfs vs ext3 or raid1 tests ) at least 2 times ( tried for 3 times, but did not always follow through ). Running these multiple times helps validate the findings. So far I have gathered a ton of data.

These tests were marching along wonderfully until Thursday night. Thursday night ( about 3 weeks into my tests ) my system raised a kernel panic. Not good, found that about half my test finished. So I rebooted, fsck’d the disks and started over. Several hours later… kernel panic. Same place as before. At the time I was running tests on a naked mtron vs easy’s MFT technology. I had run similar tests for the previous two days with awesome results. I stepped through my test script., which showed shortly after MySQL starts up a kernel panic happens. Looking at the console the only thing meaningful in the panic message is something about device sdd ( hey that is the primary mtron drive I am using! ). After a reboot I check out the MySQL Error logs, its writing the innodb datafile and all the kernel panics appear to have occurred when after it has gotten to about 2GB out of the 15GB the initial innodb datafile is going to take.

I tried the other mtron drive ( one that has been used a lot less then the primary ), the build works perfectly fine (The test ran and no kernel panic occurred). So it does not appear to be controller or OS related ( or so i think ). So I then tried simply copying a file over to the mtron drive giving me the issues. During the file copy, their it is… a kernel panic! I am right now in the middle of running some diagnostics on the drive. But I have to wonder is this a manufacturing issue or did I just do too many write cycles? I will post more later.

February 11th, 2008

More XFS & MySQL Benchmarks

I did some more benchmarking on XFS over the weekend. I ran at least 3 independent tests, which were all within 5-10% of the below results.





Mtron 4 Disk (15K ) Raid 10 (256MB Cache) Single Raptor
Threads read/write test
XFS EXT3 XFS EXT3 XFS EXT3
16 1 rndrw 50% writes 118.32 196.83 71.37 471.4 152.9 172.82
16 2 rndrw 33% writes 179.67 273.21 98.37 590.4 155.58 164.58
16 3 rndrw 25% writes 224.15 330.34 119.69 899.92 158.47 159.3
16 4 rndrw 20% writes 273.55 550.34 141.65 879.53 160.3 165.98
16 5 rndrw 17% writes 363.68 663.57 184.04 939.29 161.98 176.76
16 na rndrd 0% writes 6286.18 5352.63 4673.44 3171.42 177.02 161.61
16 na rndwr 100% writes 65.65 105.71 47.91 350.16 154.22 199.8
1 1 rndrw 50% writes 53.35 207.51
2 1 rndrw 50% writes 56.93 192.19
8 1 rndrw 50% writes 93.51 191.32
32 1 rndrw 50% writes 143.29 179.7
64 1 rndrw 50% writes 164.24 209.4

The results are all random read/write tests. They are interesting in the start to show a pattern of an XFS slowdown in what should be faster disk systems.  The Raptor drive shows little difference with 16 threads between ext3 and xfs

Raptor XFS

On the same server I have the Mtron SSD drives still hooked up.  I reran the same tests several times with both EXT3 & XFS:

Mtorn XFS

In most tests I saw about a 2x decrease in XFS performance over EXT3.  I also had the oppertunity to test against a 4 disk RAID 10 setup last week as well.  The numbers are even more lopsided then the mtron numbers:

Raid 10 XFS

The differences are staggering.  A 9x performance boost was seen in the sysbench tests.  Going beyond just the generic benchmark tests I ran DBT2 tests with XFS and EXT3.

I ran DBT2 with 200 total warehouses.  The tests were done with 40 & 60 active warehouses.  On EXT3:  With 60 warehouses I got  13516.92 new-order transactions per minute (NOTPM), with 40 I got 21844.31 new-order transactions per minute (NOTPM).  Doing the exact same test on XFS:  With 60 warehouses I got 3674.32 new-order transactions per minute (NOTPM) while with 40 warehouses I got 2301.55 new-order transactions per minute (NOTPM).  The load time for the 200 warehouses in ext3 was 30 minutes 26 seconds while xfs load time was 55 minutes and 56 seconds.

February 10th, 2008

More XFS Fun on Centos 5 with MySQL

More interesting stuff here. I reran the XFS tests yesterday, same old story. The generic sysbench disk ( on mtron ssd ) tests show this:

EXT3:

Operations performed: 5001 Read, 4999 Write, 12800 Other = 22800 Total
Read 78.141Mb Written 78.109Mb Total transferred 156.25Mb (3.1325Mb/sec)
200.48 Requests/sec executed

XFS:

Operations performed: 5006 Read, 4994 Write, 12800 Other = 22800 Total
Read 78.219Mb Written 78.031Mb Total transferred 156.25Mb (1.7767Mb/sec)
113.71 Requests/sec executed

The dbt2 tests also reconfirm this. Simply putting the innodb log files on the XFS filesystem caused the loading of the database to go from 29 minutes 43 seconds to 56 minutes 23 seconds. The DBT2 test results dropped in one test from 21K New order TPM’s to 3.7K New Order TPM’s. I am going to rerun this test in Ubuntu to see if this is a Redhat/Centos package problem. I may also try and build xfs if time permits.

Let me add a new wrinkle to this… On a single SATA 10K raptor drive:

EXT3:

Operations performed: 5005 Read, 4995 Write, 12800 Other = 22800 Total
Read 78.203Mb Written 78.047Mb Total transferred 156.25Mb (2.4041Mb/sec)
153.86 Requests/sec executed

Operations performed: 5007 Read, 4993 Write, 12800 Other = 22800 Total
Read 78.234Mb Written 78.016Mb Total transferred 156.25Mb (2.496Mb/sec)
159.74 Requests/sec executedXFS Results ( 2 Runs ):

XFS:

Operations performed: 5007 Read, 4993 Write, 12800 Other = 22800 Total
Read 78.234Mb Written 78.016Mb Total transferred 156.25Mb (2.5968Mb/sec)
166.20 Requests/sec executed

Operations performed: 5006 Read, 4994 Write, 12800 Other = 22800 Total
Read 78.219Mb Written 78.031Mb Total transferred 156.25Mb (2.5993Mb/sec)
166.36 Requests/sec executed

No idea why the slower disk is faster. Given the RAID 10 differences I saw as well ( 1200 requests vs 167 ) could these be a problem with faster disk subsystems?

February 9th, 2008

XFS & Centos 5 & MySQL Performance

OK, Sometimes you stumble onto things that are just not right. On my own personal set of hardware (independent from the Server with the IBM Raid card) I am still running Tests with the Mtron flash drives. I noticed a huge regression in performance late this week from earlier in the week. In fact it was truely head scratching. DBT2 results that were in the 26K TPM range suddenly dropped to 4K TPM.I hate to admit a big screw up, but my tests towards the end of the week were tainted. While investigating the problem with the IBM Raid Card and XFS, I took one of the Mtron drives and rebuilt it with XFS. The benchmarks (sysbench) showed that on my hardware their was little performance difference. When I resumed my Mtron dbt2 benchmarks the mtron drive that housed the log files was indeed on this xfs drive.

Trying to figure out what was going on I noticed this (sde is the log drive, sdd is the data ):

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdd 16.02 79.76 728.09 79.98 29500.03 20417.09 61.77 4.67 5.78 0.45 35.98
sde 0.00 0.00 0.00 42.85 0.00 1115.12 26.03 0.99 23.11 23.07 98.86

42 writes per second is really low.

Looking at my previous tests iostat -x output which was performed on EXT3:

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdd 22.94 153.62 1895.74 153.96 63260.72 39412.58 50.09 5.30 2.59 0.38 77.82
sde 0.00 307.59 0.00 332.15 0.00 5117.92 15.41 0.07 0.21 0.21 7.00

Wow the difference is night and day. The bottleneck switches from sde the log drive to sdd the data drive. I redid the tests. Ext3 screamed and xfs choked again.

For the sake of documentation:

Centos 5 ( 2.6.18-53.1.6.el5 )
kmod-xfs-0.4-1.2.6.18_53.1.6.el5
xfsprogs-2.9.4-1.el5.centos

The dbt2 test in question was using 200 installed warehouses, 100 active, 16 connections, & 15 terminals per warehouse. I am setting up another test run, I will post the results here.