The early end of the MiniWheatFS project for tmpfs with MySQL!!!

I am happy to say that I found a kernel-level way to join a ramdisk (strictly speaking, not a tmpfs) with a normal file system. This cut my motivation for the MiniWheatFS project by 99.999%. Recall that the goal of the MiniWheatFS project was to provide an efficient filesystem for the “tmpdir”, where MySQL puts its temporary files and tables. The trick relies on two things: the tendency of ext2 (and probably ext3 and ext4) to use the first available block from its bitmap, and LVM, which joins a ramdisk with a normal device. Here are my steps.

1. Give a ramdisk to LVM

root@yves-laptop:/home/yves# pvcreate /dev/ram0
  Physical volume "/dev/ram0" successfully created

By default, my Ubuntu laptop creates 16 ramdisks of 64 MB each. RAM is not allocated until it is used. To create bigger ones, you need to add ramdisk_size=SizeInKB to the kernel command line in the Grub menu.lst or in lilo.conf. You will then need to reboot.
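Since the parameter is expressed in KB, it is easy to get the value wrong by a factor of 1024. Here is a minimal sketch of the conversion; the helper name ramdisk_param is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: turn a desired ramdisk size in MB into the
# ramdisk_size= kernel parameter, which takes a size in KB.
ramdisk_param() {
    size_mb=$1
    echo "ramdisk_size=$((size_mb * 1024))"
}

ramdisk_param 512   # prints ramdisk_size=524288
```

The printed string is what would be appended to the kernel line in menu.lst or lilo.conf. The default 64 MB ramdisks correspond to ramdisk_size=65536.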

2. Give a disk device to LVM

In my case, I already use LVM, so I gave LVM a previously created LVM logical volume of 1 GB.

root@yves-laptop:/home/yves# pvcreate /dev/vg0/lvMiniWheatdisk
  Physical volume "/dev/vg0/lvMiniWheatdisk" successfully created

3. Create the volume group with the ramdisk first

root@yves-laptop:/home/yves# vgcreate vgMiniWheat /dev/ram0
  Volume group "vgMiniWheat" successfully created

4. Create the Logical volume

root@yves-laptop:/home/yves# lvcreate -L60M -n lvMiniWheat vgMiniWheat
  Logical volume "lvMiniWheat" created

Now, the logical volume has the ramdisk at its beginning.

5. Extend the volume group and the logical volume over the disk device

root@yves-laptop:/mnt# vgextend vgMiniWheat /dev/vg0/lvMiniWheatdisk
  Volume group "vgMiniWheat" successfully extended
root@yves-laptop:/mnt# lvextend -L+60M /dev/vgMiniWheat/lvMiniWheat
  Extending logical volume lvMiniWheat to 120.00 MB
  Logical volume lvMiniWheat successfully resized

I added just 60 MB more, but I could have added the whole GB.

6. Create the file system

root@yves-laptop:/mnt# mkfs.ext2 /dev/vgMiniWheat/lvMiniWheat
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
30720 inodes, 122880 blocks
6144 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
15 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 34 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

I use the ext2 file system since there is no need for a journal. The contents of the ramdisk are lost on restart, so after a reboot you need to reconfigure from scratch.
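Because the setup has to be redone after every reboot, steps 1 to 6 can be collected in a small script. The following is only a sketch under a few assumptions: it reuses the device and volume names from this post, it defaults to a dry run that prints the commands instead of running them, and it adds an lvs call (not part of my steps) to show which device backs each segment. Stale LVM metadata left on the disk device from the previous boot will probably have to be cleared first; that cleanup is left out here.

```shell
#!/bin/sh
# Sketch: rebuild the composite device, following steps 1 to 6 of this post.
# DRY_RUN=1 (the default) only prints the commands; run with DRY_RUN=0 as
# root to actually execute them.
RAMDISK=/dev/ram0
DISK=/dev/vg0/lvMiniWheatdisk
VG=vgMiniWheat
LV=lvMiniWheat

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@" || exit 1
    fi
}

rebuild() {
    run pvcreate "$RAMDISK"                  # step 1
    run pvcreate "$DISK"                     # step 2
    run vgcreate "$VG" "$RAMDISK"            # step 3: ramdisk first
    run lvcreate -L60M -n "$LV" "$VG"        # step 4
    run vgextend "$VG" "$DISK"               # step 5
    run lvextend -L+60M "/dev/$VG/$LV"
    run lvs --segments -o +devices "$VG"     # extra check: segment layout
    run mkfs.ext2 "/dev/$VG/$LV"             # step 6
}

rebuild
```

The order matters: the volume group is created with the ramdisk alone, so the first extents of the logical volume land in RAM, and the disk is only added afterwards.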

7. Tests

I ran 2 threads of a select query on a table with 200k rows, ordering by a text column in order to force the use of disk-based temporary tables. In order to avoid pollution by the file system write cache, I mounted the normal /tmp and the new composite device with the “sync” option.
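For reference, mounting the composite device with “sync” could look like the sketch below. The mount point /mnt/miniwheat and the mysql:mysql owner are assumptions, not what I necessarily used, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: mount the composite device with the "sync" option.
# MOUNTPOINT is a hypothetical path. DRY_RUN=1 (the default) prints the
# commands instead of executing them.
DEVICE=/dev/vgMiniWheat/lvMiniWheat
MOUNTPOINT=/mnt/miniwheat

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_tmpdir() {
    run mkdir -p "$MOUNTPOINT"
    run mount -o sync "$DEVICE" "$MOUNTPOINT"
    # Assumption: mysqld runs as the "mysql" user and must be able to write here.
    run chown mysql:mysql "$MOUNTPOINT"
}

mount_tmpdir
```

MySQL would then be pointed at the mount point with the tmpdir option in the [mysqld] section of my.cnf, followed by a restart of mysqld.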

With the normal /tmp, I was disk bound, while with the composite device, I was CPU bound. Here are extracts of vmstat from the 2 tests; choose the one you prefer…

normal /tmp

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 840340 102044 616328    0    0     0 11342  887 4932 23  7 65  5
 1  0      0 881384 102044 575572    0    0     0 10912  885 4887 26  7 62  5
 0  1      0 864772 102044 592148    0    0     0 11254  896 3471 26  3 67  4
 2  0      0 847816 102044 608840    0    0     0 11392  880 4802 30  6 61  3
 2  0      0 887956 102044 569224    0    0     0 12094  934 5051 23  8 65  4
 3  0      0 870412 102044 585544    0    0     0 11125  881 4085 28  5 64  3
 0  0      0 855308 102044 601620    0    0     0 10982  858 4443 28  5 63  4

composite device

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 848860 116420 581692    0    0     0 60186   63 1753 87 13  0  0
 2  0      0 847416 116664 582724    0    0     0 59153   62 3081 86 14  0  0
 2  0      0 843768 117032 586172    0    0     0 61621   63 1728 88 12  0  0
 2  0      0 842712 117268 587016    0    0     0 58919   64 3115 87 13  0  0
 3  0      0 840128 117528 588616    0    0     0 59712   63 2420 88 12  0  0
 2  0      0 839884 117844 589020    0    0     0 58647   70 2353 85 14  1  0
 2  0      0 836248 118116 592420    0    0     0 61576   63 1676 88 12  0  0

Of course, the query time with the composite device was less than a third of that with the normal disk. Having said that, remember that I mounted with the “sync” option. If you hit the disk only moderately, you might not see any difference, but if you saturate the Linux write cache, the difference will be similar to mine. Another warning: I tested the setup for only about an hour, so do your own testing before putting this into production.
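To put a number on the two vmstat extracts, the bo column (blocks written out per second) can be averaged over the seven samples of each run. This is just a quick sketch with the values copied by hand from the extracts above; the avg helper is made up for the occasion.

```shell
#!/bin/sh
# Average the "bo" column of the two vmstat extracts shown in this post.
avg() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.0f\n", s / NR }'
}

normal=$(avg 11342 10912 11254 11392 12094 11125 10982)
composite=$(avg 60186 59153 61621 58919 59712 58647 61576)

echo "normal /tmp:      $normal blocks/s"
echo "composite device: $composite blocks/s"
echo "$normal $composite" | awk '{ printf "ratio: %.1fx\n", $2 / $1 }'
```

This prints averages of 11300 and 59973 blocks per second, a ratio of 5.3x. Note that this compares write throughput, not query time, so it is not the same figure as the "less than a third" above.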

About Yves Trudeau

I work as a senior consultant in the MySQL professional services team at Sun. My main areas of expertise are DRBD/Heartbeat and NDB Cluster. I am also involved in the WaffleGrid project.

4 Responses to The early end of the MiniWheatFS project for tmpfs with MySQL!!!

  1. Henrik Ingo says:

    Yves, just to remind those of us who didn’t keep track: The point of this project is:
    1) You want a database that is fast and data is not persistent?
    2) You are simulating something that you actually want to do with SSD instead of ramdisk (and that looks ZFS/OpenStorage-like)?

    henrik

  2. Yves Trudeau says:

    Not exactly, the point is to provide an efficient destination for “tmpdir”, where MySQL puts its temporary files and tables. I have updated the post to state the goal.

  3. angus says:

    Having in the past tested tmpfs as tmpdir, I’ve set up ramdisk w/ lvm2 like this:
    – compile latest 2.6.29-rc8 w/ ramdisk default 512MB
    – pvcreate /dev/ram{1-9}
    – vgcreate & lvcreate & mkfs.ext2 & mount -o sync
    – mysqld restart

    I’m getting around 150MB/s on this ramdisk for on-disk temp tables, but I can’t see any improvement in the slow query logs compared to the normal setup (no tmpfs, only tmpdir on ext3 or reiserfs). It seems that the Linux file system cache does the job (for my setup) with the same results as those ramdisk/tmpfs setups.

    Maybe using something other than ext2 could improve performance. Perhaps the new write barrier support in LVM2 in the latest 2.6.29-pre has something to do with my results; I’m referring to .

  4. angus says:

    tag markup fixed in my last comment:
    I’m referring to This Article blog entry