NDB Cluster is the only integrated sharding framework that I know of (educate me if I am wrong) but it is known to have issues with large joins. These days, large databases that would benefit from a sharding framework are often for social networking type applications that requires large joins.
Actually it is not NDB that is the root cause of the joins problem, it is the way MySQL executes joins. Instead of asking the cluster for a large number of rows for the secondary table it joins to, the current version of MySQL does ask one row at a time. NDB cluster answers those queries very rapidly but, the time to hop over the network kills performance. The MySQL-6.0 branch will implement the Batch Key Access (BKA) algorithm which will solve that issue and might create database application killer with NDB cluster.
Although right now BKA is not available, there are ways to execute those queries in an efficient way by rewriting them. The trick is that even if joins are poorly executed for NDB cluster, IN clause are sent in batches. In this first part, I will give a trivial example on how to rewrite queries with an IN clause. In subsequent parts, I will extend the use of the IN clause to composite primary keys.
As an example, let’s consider the following SQL statement.
SELECT u2.name FROM User u2 INNER JOIN Friend on Friend.friend = u2.id INNER JOIN User u ON u.id = Friend.userId WHERE u.name = 'Yves';
Where the Friend table implements a many to many relationship between users. The join that will hurt NDB cluster is the join returning to the user table with the list of friends. If the list is large, numerous hops over the network will be needed to execute the query. At the application level, this query can be replaced by the followings:
mysql> select id from User where Name = 'Yves'; +----+ | id | +----+ | 1 | +----+ 1 row in set (0.00 sec) mysql> SELECT GROUP_CONCAT(friendId) from Friend where UserId = 1; +----------------------+ | GROUP_CONCAT(Friend) | +----------------------+ | 2,3,4,5 | +----------------------+ 1 row in set (0.00 sec) mysql> select Name from User where id IN (2,3,4,5); +-----------+ | name | +-----------+ | Mike | | Joe | | Steve | | Matt | +-----------+ 4 rows in set (0.00 sec)
By the way MySQL interacts with NDB cluster, the last IN clause will be sent in ONE operation over the network, performance will thus be much better.