Social Networking type queries with NDB (part 1)

NDB Cluster is the only integrated sharding framework that I know of (educate me if I am wrong) but it is known to have issues with large joins. These days, large databases that would benefit from a sharding framework are often for social networking type applications that requires large joins.

Actually it is not NDB that is the root cause of the joins problem, it is the way MySQL executes joins. Instead of asking the cluster for a large number of rows for the secondary table it joins to, the current version of MySQL does ask one row at a time. NDB cluster answers those queries very rapidly but, the time to hop over the network kills performance. The MySQL-6.0 branch will implement the Batch Key Access (BKA) algorithm which will solve that issue and might create database application killer with NDB cluster.

Although right now BKA is not available, there are ways to execute those queries in an efficient way by rewriting them. The trick is that even if joins are poorly executed for NDB cluster, IN clause are sent in batches. In this first part, I will give a trivial example on how to rewrite queries with an IN clause. In subsequent parts, I will extend the use of the IN clause to composite primary keys.

As an example, let’s consider the following SQL statement.

SELECT u2.name 
	FROM User u2 
		INNER JOIN Friend on Friend.friend = u2.id 
		INNER JOIN User u ON u.id = Friend.userId 
	WHERE u.name = 'Yves';

Where the Friend table implements a many to many relationship between users. The join that will hurt NDB cluster is the join returning to the user table with the list of friends. If the list is large, numerous hops over the network will be needed to execute the query. At the application level, this query can be replaced by the followings:

mysql> select id from User where Name = 'Yves';
+----+
| id |
+----+
|  1 |
+----+
1 row in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(friendId) from Friend where UserId = 1;
+----------------------+
| GROUP_CONCAT(Friend) |
+----------------------+
| 2,3,4,5              |
+----------------------+
1 row in set (0.00 sec)

mysql> select Name from User where id IN (2,3,4,5);
+-----------+
| name      |
+-----------+
| Mike      |
| Joe       |
| Steve     |
| Matt      |
+-----------+
4 rows in set (0.00 sec)

By the way MySQL interacts with NDB cluster, the last IN clause will be sent in ONE operation over the network, performance will thus be much better.

About Yves Trudeau

I work as a senior consultant in the MySQL professional services team at Sun. My main areas of expertise are DRBD/Heartbeat and NDB Cluster. I am also involved in the WaffleGrid project.
This entry was posted in HA, mysql, NDB Cluster, performance, yves. Bookmark the permalink.

6 Responses to Social Networking type queries with NDB (part 1)

  1. Hi!

    nice tip!

    I’d just like to add, if your IN clauses grow big (hundreds or more), please be careful and experiment whether to turn engine_condition_pushdown on or off. I found that although enabling engine_condition_pushdown can gain some performance, it can also hurt performance in case of large or complex conditions. This is because the NDB filtering magic is not as optimized as MySQL’s (or maybe this was improved lately?)

    kind regards,

    Roland Bouman

  2. Yves Trudeau says:

    You are right Roland, enabling or disabling engine_condition_pushdown can make a difference. As usual, experimentation is your friend.

  3. Henrik Ingo says:

    Yves: This is a tip I used myself recently in a similar situation in the telco world. Moving from joins to IN approach made a big difference, more than 5x better throughput. (This is even when you loose some performance due to not using prepared queries.)

    There are actually 2 things suboptimal about joins and ndb:
    – fetching from each table is a separate ndb operation is a separate network roundtrip
    – when first table in the join returns more than one value, the storage engine api in mysqld forces you to do a separate fetch from the second table for each key value.

    Thus, if you select from n tables and the resultset is m rows, worst case you have n x m network roundtrips!

    Batched Key Access will make the “m” go away, and since you only have 1 join that is good for this use case.

    However, if you have tens of joins (yes, such queries exist :-), ndb still needs to do tens of roundtrips, and this is an ndb limitation.

    Our ndb engineers are currently building a new ndb feature (Something like parallel cursor…) where you can send a batch of dependent ndb operations in one execution: “Fetch from table a first, then also fetch from table b rows using the result from a as parameter, then fetch from table c… then return everything to me in one batch.”

    BKA will be in the next-next version of MySQL Cluster, but this multicursor thing is further away.

  4. Yves Trudeau says:

    BKA with MultiCursor NDB, that would be awesome. NDB might become a database killer app…

  5. Pingback: Big DBA Head! - Database Brain Power! » Social networking type queries with NDB (part 2)

  6. Pingback: Big DBA Head! - Database Brain Power! » Social networking type queries with NDB (part 3)