A Few Common Performance Mistakes

As I run into the same mistakes over and over again, I am going to throw them out here to serve as a warning and a learning tool for others. Some of these may seem like I am beating a dead horse, because you can find examples of them on other sites, in other blogs, or even in the manual. Despite the large amount of material on these subjects, these things still pop up. This is not a complete list, and it is in no particular order… so please don't despair if I miss your favorite common performance gaffe. As I encounter more I will post them to the Common Mistakes category here on bigdbahead.com. Hopefully this list will grow to include a large subset of issues, and maybe one or two people will take note and fix their applications and databases.

#1.) More is not always better

We often see configuration examples where folks believe in the old adage "more is better." Case in point: if a 2MB sort buffer is good, a 32MB or 128MB sort buffer must be even better. While it is true that a larger key buffer or InnoDB buffer pool can provide huge performance boosts, other settings can seriously hamper performance, in particular the per-thread buffers: the join buffer, read buffer, read rnd buffer, sort buffer, etc. These buffers set limits on memory per thread, and setting them too high runs the risk of some really large memory footprints in individual threads.

Most of the time when I see these values set higher than 2MB, it is not because of careful consideration or benchmarking, but simply because bigger must be better. If you're following general guidelines for a system running InnoDB, you will already have a large portion of your system memory allocated to the InnoDB buffer pool; a large buffer pool combined with large per-thread memory seriously increases the chances of swapping. Another consideration is that setting certain buffers to higher values can cause MySQL to change how it handles memory internally, leading to inefficiencies. I am not going to rewrite the detail that is already out in the wild on these, but the motto "Think before you tinker" is a good rule to live by.
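
If one particular workload really does need a big sort buffer, a safer pattern than raising the server-wide default is to bump the variable for just that session. A minimal sketch (the orders table and query are made up for illustration):

-- Only this connection pays the memory cost
SET SESSION sort_buffer_size = 32 * 1024 * 1024;

-- run the one big sorting query...
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;

-- put it back so a pooled connection does not keep the large setting
SET SESSION sort_buffer_size = DEFAULT;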

Here is a good list of required reading on setting the per-thread buffers too high:

http://dev.mysql.com/doc/refman/5.1/en/memory-use.html
http://www.mysqlperformanceblog.com/2007/09/17/mysql-what-read_buffer_size-value-is-optimal/
http://www.mysqlperformanceblog.com/2006/06/06/are-larger-buffers-always-better/
http://mysql-ha.com/2007/09/06/read-buffer-performance-hit/
http://www.mysqlperformanceblog.com/2007/09/12/read-buffers-mmap-malloc-and-mysql-performance/
http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html

#2.) Text fields can be bad if used improperly

I have seen numerous clients set their max_heap_table_size and tmp_table_size to something fairly large in a vain attempt to reduce the number of temporary tables going to disk. I would say that 99% of the time when I see these large values, they are unwarranted. In these cases the temp tables are being written to disk because the operations involve text fields. Most never knew this is a restriction… but check out the manual (http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html), where it says:

Some conditions prevent the use of a MEMORY temporary table, in which case the server uses a MyISAM table instead:

  • Presence of a TEXT or BLOB column in the table
  • Presence of any column in a GROUP BY or DISTINCT clause larger than 512 bytes
  • Presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL is used.

What makes this issue even more frustrating is that most of these tables use a simple text datatype whose contents are generally at most a few thousand bytes. The text datatype has a maximum size of 64KB, which ironically is also the maximum size of a varchar. So, by changing the text field over to a varchar we may allow the query to build an in-memory temp table instead of one on disk.
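
Two quick checks before you reach for bigger buffers, sketched below. The status counters are standard MySQL; the comments table and body column are hypothetical stand-ins for your schema:

-- How many implicit temp tables were built, and how many of them hit disk?
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- If `body` rarely exceeds a few KB, a varchar can keep the temp table
-- eligible for the MEMORY engine (subject to the 512-byte GROUP BY limit above)
ALTER TABLE comments MODIFY body VARCHAR(4096);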

#3.) (Data) Size Does Matter!

It is very common for us to see bloated and oversized databases when out on a gig. Ronald likes to say he has yet to be on a gig where he has not been able to shrink the database size by 25%. Because disk is cheap, most people just do not put much thought into properly sizing tables and objects, and this can lead to serious issues. The connection most people miss is that larger datatypes not only waste disk space, they waste memory. Take for example the following table:

CREATE TABLE `MyTest` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `type` bigint(20) DEFAULT NULL,
  `createdate` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `myidx` (`name`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The id field, a bigint, takes up 8 bytes. So if your table has 1 million records, the primary key alone takes up about 7.6MB… by switching it to an int (4 bytes) that drops from 7.6MB to 3.8MB. While this seems small, as your data gets larger the difference adds up quickly: with 10 million records in each of 10 tables, using an int datatype saves you roughly 380MB. Also, because InnoDB stores the primary key value in every secondary index as its pointer back to the row, the other indexes on your table will suffer from the extra bloat as well. That means not only is your disk footprint reduced, but the same amount of data can be stored in a smaller part of the InnoDB buffer pool, allowing something else to stay in memory.
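
For the example table above, the fix is a single ALTER (a sketch; on a big table this rebuilds the table and all of its indexes, so plan for the downtime):

-- 4-byte unsigned int still allows ids up to 4,294,967,295
ALTER TABLE `MyTest`
  MODIFY `id` int unsigned NOT NULL AUTO_INCREMENT,
  MODIFY `type` int unsigned DEFAULT NULL;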

Here is a quick chart of some of the common data type conversions we do:

datetime (8 bytes) → timestamp (4 bytes; seconds since epoch, up to 2038)
bigint (8 bytes; unsigned up to 18,446,744,073,709,551,615) → int (4 bytes; unsigned up to 4,294,967,295)
text (variable; up to 65,535 bytes) → varchar (variable; up to 65,535 bytes)
float (4 bytes) → decimal (variable)
double (8 bytes) → decimal (variable)

Read more:

http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/

#4.) Watch out for statements that compare an integer to a character field

Everyone knows that if you forget to quote a string value in a select statement, MySQL will complain. For instance:

mysql> explain select * from MyTest where name = matt;

ERROR 1054 (42S22): Unknown column 'matt' in 'where clause'

But what most miss is that this is not always the case:

mysql> explain select * from MyTest where name = 1234;

+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | MyTest | ALL  | myidx         | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+

The impact of this can be huge. I was at a client who used a varchar as a PK on a table (bad by itself). All the values in the varchar field were numeric, but all the statements coming from the application used unquoted integers within the SQL statement (i.e. id = 123 instead of id = '123'). Because MySQL must cast the character value of every row to a number before it can compare, the index is bypassed, which in this extreme case meant no PK access… which was very, very bad.
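
The fix is simply to quote the literal so the comparison happens string-to-string. Against the MyTest table above (a sketch; the exact plan will depend on your data):

mysql> explain select * from MyTest where name = '1234';

With the quotes in place, you should see type=ref and key=myidx instead of the full scan shown earlier.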

Read more here:

http://www.mysqlperformanceblog.com/2006/09/08/why-index-could-refuse-to-work/

 

#5.) Left join drunk

One of my big pet peeves is looking through thousands of SQL statements and seeing every one of them doing a left outer join. My absolute favorite is something like this:

select a.var, b.var from a left join b on a.id = b.id where b.var = 'xxx' ….

Folks, if you filter on a column from the b table in the where clause, your left join is meaningless. Think about it… for the ids that do not match, every column on the b side comes back null, and a null b.var is then filtered out by the where clause. The query behaves exactly like an inner join.
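
In other words, the statement above is an inner join in disguise, and writing it that way says what it actually does (same hypothetical tables as above):

select a.var, b.var from a join b on a.id = b.id where b.var = 'xxx';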

Also… another little left join nugget. I don't know why, but several folks have done this type of operation:

select a.var1, a.var2 from a left join b on a.id = b.id where a.var1 = 'xxx'

The question is: what is the benefit of joining to b? No column from b is selected or filtered on, so unless you are counting on row duplication from a one-to-many match, the join is pure overhead.
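
Assuming each a.id matches at most one row in b (so dropping the join cannot change the row count), the query reduces to:

select a.var1, a.var2 from a where a.var1 = 'xxx';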

My favorite argument from left-join-happy shops is, "well, we put it in there just in case we missed something"… enough said.


Responses to A Few Common Performance Mistakes


  1. mediahost says:

    I have to say that I actually thought bigger was always better.

  2. Enoch says:

    I might be considered left join happy. And the query is killing me, it is so slow: it takes over 20 seconds to run. However, when I remove the ORDER BY it runs in under a second:

    Just the facts:
    1) I wrote a php pagination function to which I pass a query that generates about 5,500 results and displays them in pages of 25.
    2) Showing rows 0 – 25 (5,500 total, Query took 23.4910 sec)

    3) When I remove the ORDER BY the query takes less than one second.

    4) I am using ORDER BY on a datetime field to display results by the most recently accessed record (to the second). I need to maintain this feature.

    5) The datetime field is indexed, and it contains the same number of values as the records returned.

    Comments:
    I am using InnoDB.
    As the query results are displayed in pages – each new page takes roughly the same amount of time to load.
    I am ok at php/mysql but not great (obviously).

    The query: (names have been changed to protect the guilty)
    SELECT t1.field, t1.field2, t2.field,
    t1.field3, t1.field4, t3.field,
    CONCAT(t4.field1, ' ', t4.field2) AS somename,
    t1.field5, t5.field, t6.field, t2.field2,
    t2.field3, t7.field, t1.field6
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.field=t2.field
    LEFT JOIN table3 AS t3 ON t1.field2=t3.field
    LEFT JOIN table4 AS t4 ON t1.field3_id=t4.field
    LEFT JOIN table5 AS t5 ON t1.field4_id=t5.field
    LEFT JOIN table6 AS t6 ON t1.field5_id=t6.field
    LEFT JOIN table7 AS t7 ON t1.field6=t7.field
    ORDER BY t2.field2 DESC

    EXPLAIN says:
    id  select_type  table  type    possible_keys  key      key_len  ref        rows  Extra
    1   SIMPLE       t1     ALL     NULL           NULL     NULL     NULL       6012  Using temporary; Using filesort
    1   SIMPLE       t2     ALL     NULL           NULL     NULL     NULL       6012
    1   SIMPLE       t3     eq_ref  PRIMARY        PRIMARY  1        t3.table3  1
    1   SIMPLE       t4     eq_ref  PRIMARY        PRIMARY  4        t4.table4  1
    1   SIMPLE       t5     eq_ref  PRIMARY        PRIMARY  1        t5.table5  1
    1   SIMPLE       t6     eq_ref  PRIMARY        PRIMARY  1        t6.table6  1
    1   SIMPLE       t7     eq_ref  PRIMARY        PRIMARY  1        t7.table7  1

  3. StrokeKing says:

    I totally agree with the 'more is NOT better' idea. I've found that when doing system admin, people always want to crank up their buffers to ridiculous values, à la 'I'd like to increase my maximum execution time to 48 hours'.

  4. Erock says:

    Enoch: It looks to me like you've got a design problem. Why would so many related columns be separated into 7 different tables? That's a question you should ask yourself and others.

    If you can’t refactor the design then you might be better served by 5a) Divide and Conquer – It’s often more efficient and faster to break large and complex queries into multiple smaller queries. Even better would be to open multiple connections and send the smaller queries through in parallel, asynchronously. If you plan it right you can order them so that following queries take advantage of data cached by earlier ones.

    Writing a killer query is rarely the best way to solve a problem. It’s asking the database to make too many decisions which may not work out in your favor. It’s also a bitch to debug and optimize.

    Just $0.02 in a bad economy.

    Excellent site BTW, Thanks!
