As I run into the same mistakes over and over again, I am going to throw them out here to serve as a warning and a learning tool to others. Some of these may seem like I am beating a dead horse, because you can find examples of them on other sites, in other blogs, or even in the manual. Despite the large amount of material on these subjects, these things still pop up. This is not a complete list, and it is not in any order… so please don’t despair if I miss your favorite common performance gaffe. As I encounter more I will post them to the common mistakes category here on bigdbahead.com. Hopefully this list will grow to include a large subset of issues, and maybe one or two people will take note and fix their applications and databases.
#1.) MORE IS NOT ALWAYS BETTER
We often see configuration examples where folks believe in the old adage “more is better”: if a 2MB sort buffer size is good, a 32MB or a 128MB sort buffer size must be even better. While it is true that a larger key buffer and a larger InnoDB buffer pool can provide huge performance boosts, other settings can actually seriously hamper performance. Case in point: the per-thread buffers, i.e. the join buffer, read buffer, read rnd buffer, sort buffer, etc. These buffers set limits on memory per thread, and setting them too high runs the risk of some really large memory footprints in individual threads.
Most of the time when I see these values set higher than 2MB, it is not because of careful consideration or benchmarking; they were set simply because bigger must be better. If you’re following general guidelines for a system running InnoDB, you will already have a large portion of your system memory allocated to the InnoDB buffer pool. A large InnoDB buffer pool combined with large per-thread memory seriously increases the chances of swapping. Another consideration is that setting certain buffers to higher values will often cause MySQL to change how it handles memory, which can lead to inefficiencies. I am not going to rewrite a lot of detail that is already out in the wild on these, but the motto “think before you tinker” is a good rule to live by.
Here is a good list of required reading on setting the per-thread buffers too high:
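To make the point concrete, here is a minimal my.cnf sketch. The numbers are illustrative assumptions for a dedicated InnoDB box with a few GB of RAM, not recommendations; benchmark against your own workload before changing anything.

```ini
[mysqld]
# Global buffers: allocated once, so these can be sized generously.
innodb_buffer_pool_size = 4G
key_buffer_size         = 64M   # only matters for MyISAM indexes

# Per-thread buffers: can be allocated per connection, so keep them modest.
# With max_connections = 500, even 2M here can mean ~1GB in the worst case.
sort_buffer_size        = 2M
join_buffer_size        = 256K
read_buffer_size        = 256K
read_rnd_buffer_size    = 512K
```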
#2.) Text fields can be bad if used improperly
I have seen numerous clients set their max_heap_table_size and tmp_table_size to something fairly large in a vain attempt to reduce the number of temporary tables going to disk. I want to say that 99% of the time when I see these large values, the large settings are unwarranted. In these cases the temp tables are being written to disk because the operations involve text fields. Most people never knew this restriction existed… but check out the manual (http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html), where it says:
“Some conditions prevent the use of a MEMORY temporary table, in which case the server uses a MyISAM table instead:
- Presence of a BLOB or TEXT column in the table
- Presence of any column in a GROUP BY or DISTINCT clause larger than 512 bytes
- Presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL is used”
What makes this issue even more frustrating is that most of these tables use a simple TEXT datatype holding, at most, a few thousand bytes. The TEXT datatype has a maximum size of 64K, which, ironically, is also the maximum size of a VARCHAR. So by changing the TEXT field over to a VARCHAR we may allow the query to build an in-memory temp table instead of one on disk.
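As a sketch of the fix (the table and column names here are hypothetical), the conversion might look like:

```sql
-- Hypothetical table: the body column is declared TEXT but never holds
-- more than a few KB, so every GROUP BY / DISTINCT over it builds its
-- temporary table on disk as MyISAM.
ALTER TABLE comments MODIFY body VARCHAR(2000);
-- With a VARCHAR the server can use a MEMORY temp table instead,
-- subject to the 512-byte limits quoted from the manual above.
```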
#3.) (Data) Size Does Matter!
It is very common for us to see bloated and oversized databases when out on a gig. Ronald likes to say he has yet to be on a gig where he has not been able to shrink the database size by 25%. Because disk is cheap, most people just do not put much thought into properly sizing tables and objects, and this can lead to serious issues. The connection most people miss is that larger datatypes waste not only disk space but also memory. Take for example the following table:
CREATE TABLE `MyTest` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`type` bigint(20) DEFAULT NULL,
`createdate` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `myidx` (`name`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The id field, a bigint, takes up 8 bytes. So if your table has 1 million records, the primary key alone takes up 7.6MB… by switching this to an int (4 bytes) the size decreases from 7.6MB to 3.8MB. While this seems small, as your data gets larger the difference adds up quickly. For instance, with 10 million records in each of 10 tables, using an int datatype saves 380MB. Also, because InnoDB secondary indexes store the primary key value as the pointer back to the row, every other index on the table will also suffer from the extra bloat. That means not only is your disk footprint reduced, but the same amount of data can be stored in a smaller part of the InnoDB buffer pool, allowing something else to stay in memory.
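To illustrate, here is the same table with right-sized columns. The exact widths here are illustrative; size them from what your data actually contains.

```sql
CREATE TABLE `MyTest` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,   -- 4 bytes instead of 8
  `name` varchar(100) DEFAULT NULL,                -- trimmed from 255
  `type` tinyint unsigned DEFAULT NULL,            -- 1 byte instead of 8
  `createdate` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `myidx` (`name`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```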
Here is a quick chart of some of the common data type conversions we do:
| from | size | to | size | notes |
|------|------|----|------|-------|
| datetime | 8 bytes | timestamp | 4 bytes | seconds since epoch (up to 2038) |
| bigint | 8 bytes | int | 4 bytes | unsigned int holds up to 4,294,967,295 (unsigned bigint up to 18,446,744,073,709,551,615) |
| text | variable, up to 65,535 | varchar | variable, up to 65,535 | varchar allows in-memory temp tables (see #2) |
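For example, the datetime column from the table above could be converted like this. Only do so when every stored value falls between 1970 and 2038 and the session-timezone behavior of TIMESTAMP is acceptable for your application.

```sql
-- Saves 4 bytes per row on the MyTest example table.
ALTER TABLE MyTest MODIFY createdate TIMESTAMP NULL DEFAULT NULL;
```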
#4.) Watch out for statements that compare an integer to a character field
Everyone knows that if you forget to quote a string value in a select statement, MySQL will complain. For instance:
mysql> explain select * from MyTest where name =matt;
ERROR 1054 (42S22): Unknown column ‘matt’ in ‘where clause’
But what most people miss is that this is not always the case:
mysql> explain select * from MyTest where name =1234;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | MyTest | ALL | myidx | NULL | NULL | NULL | 1 | Using where |
The impact of this can be huge. I was at a client that used a varchar as a PK on a table (bad by itself). All the values in the varchar field were numeric, but all the statements coming from the application used unquoted integers within the SQL statement (i.e. id = 123 instead of id = ‘123’). This has the unfortunate side effect of bypassing the index, which in this extreme case meant no PK access… which was very, very bad.
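Using the MyTest table from above, simply quoting the literal turns this back into a plain string comparison, so the optimizer is free to use the index on name instead of the full scan (type: ALL) shown in the EXPLAIN above:

```sql
EXPLAIN SELECT * FROM MyTest WHERE name = '1234';
-- The real fix is to quote (or, better, bind) string values in the
-- application itself rather than rely on the optimizer.
```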
Read more here:
#5.) Left join drunk
One of my big pet peeves is looking through thousands of SQL statements and seeing that every one of them does a left outer join. My absolute favorite is something like this:
select a.var, b.var from a left join b on a.id=b.id where b.var = ‘xxx’ ….
Folks, if you filter on a column from the b table in the WHERE clause, your left join is meaningless. Think about it: if the ids do not match, every column on the b side returns NULL… and a NULL b.var is then filtered out by the WHERE clause, so the query behaves exactly like an inner join.
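The honest version of the statement above is a plain inner join, which says what the query actually does:

```sql
-- Equivalent to the left join above: the WHERE clause requires
-- b.var = 'xxx', so unmatched (NULL) b rows can never survive.
SELECT a.var, b.var
FROM a
INNER JOIN b ON a.id = b.id
WHERE b.var = 'xxx';
```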
Also… another little left join nugget. I don’t know why, but several folks have done this type of operation:
select a.var1, a.var2 from a left join b on a.id=b.id where a.var1 = ‘xxx’
The question is what is the benefit of joining to b?
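Usually none: nothing from b is selected or filtered on. Assuming b.id is unique in b (an assumption; if it is not, the left join multiplies rows of a and cannot be dropped), the query reduces to:

```sql
SELECT a.var1, a.var2 FROM a WHERE a.var1 = 'xxx';
```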
My favorite argument from left-join-happy shops is “well, we put it in there just in case we missed something”… enough said.