Improvement in subquery execution with the upcoming MySQL-6.0 branch

Although I know the JOIN syntax very well, IN clauses are often so much easier to read. A few years ago I was developing a custom ERP application on SAPDB/MAXDB (still opensource at the time) and I used extensively IN clauses to make complex queries easier to read. Of course with MySQL I have been fairly disappointed up to now by the performance of the IN clause. Look at the following explain from a 5.1.x database:

mysql> explain select u2.name from user u2 where u2.id IN (select Friend from friend where friend.UserID IN (select id from user where name = 'Yves'));
+----+--------------------+--------+-----------------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table  | type            | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+--------+-----------------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY            | u2     | ALL             | NULL          | NULL    | NULL    | NULL |    8 | Using where              |
|  2 | DEPENDENT SUBQUERY | friend | index           | NULL          | PRIMARY | 8       | NULL |    4 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | user   | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where              |
+----+--------------------+--------+-----------------+---------------+---------+---------+------+------+--------------------------+
3 rows in set (0.01 sec)

As one can see, it considers the sub queries as DEPENDENT even if they are absolutely not dependent. This will cause the sub queries to be executed for each rows of u2, a performance bummer. Fortunately, MySQL 6.0.x has a much improved handling of the IN clauses, the same explain on a 6.0.x server gives the followings:

mysql> explain select u2.name from user u2 where u2.id IN (select Friend from friend where friend.UserID IN (select id from user where name = 'Yves'));
+----+-------------+--------+--------+---------------+---------+---------+--------------------+------+-----------------------------+
| id | select_type | table  | type   | possible_keys | key     | key_len | ref                | rows | Extra                       |
+----+-------------+--------+--------+---------------+---------+---------+--------------------+------+-----------------------------+
|  1 | PRIMARY     | friend | index  | PRIMARY       | PRIMARY | 8       | NULL               |    4 | Using index; LooseScan      |
|  1 | PRIMARY     | u2     | eq_ref | PRIMARY       | PRIMARY | 4       | test.friend.Friend |    1 |                             |
|  1 | PRIMARY     | user   | eq_ref | PRIMARY       | PRIMARY | 4       | test.friend.UserId |    1 | Using where; FirstMatch(u2) |
+----+-------------+--------+--------+---------------+---------+---------+--------------------+------+-----------------------------+
3 rows in set (0.00 sec)

Where you can observe that all the queries are now independent and I expect performance to be way better.

About Yves Trudeau

I work as a senior consultant in the MySQL professional services team at Sun. My main areas of expertise are DRBD/Heartbeat and NDB Cluster. I am also involved in the WaffleGrid project.
This entry was posted in mysql, yves. Bookmark the permalink.

3 Responses to Improvement in subquery execution with the upcoming MySQL-6.0 branch

  1. KevinF says:

    Matt, thanks for the heads-up.
    Does mysql pre-6.0 ever consider see subquery clauses as independent? I gave up trying.

  2. KevinF says:

    Oops, sorry! Yves, that is.

  3. Yves Trudeau says:

    That example is fairly simple. I can show you queries joining more than 10 tables. With the IN and NOT IN clauses, I can say in a few seconds what the query is returning but with JOIN and OUTER JOIN the reading is not as trivial. You are right that it is probably a more procedural way of seeing things, my point is just that it is easier to read.