Sun/Intel X-25e 4 Disk Raid 10 tests – part 1

Everyone loves SSDs.  They are a hot topic all around the MySQL community, with vendors lining up all kinds of new solutions to attack the "disk IO" problem that has plagued us for years and years.  At this year's user conference I talked about SSDs and MySQL.  Those who follow my blog know I love IO, and I love to benchmark anything that can help overcome IO issues.  One of the most exciting things out there at this point is the Intel X25-E drive.  These bad boys are not only fast but relatively inexpensive.  How fast are they?  Let's do a quick bit of review and peek at the single-drive numbers from sysbench.  Here you can see that a single X25-E outperforms every other single drive I have tested.
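
For anyone who wants to reproduce this kind of single-drive chart, the numbers come from sysbench's fileio mode.  A rough sketch of such a run (the file size, runtime, and thread count below are placeholders, not my exact settings):

    # prepare test files on the SSD's filesystem, then run random read/write IO with O_DIRECT
    sysbench --test=fileio --file-total-size=16G prepare
    sysbench --test=fileio --file-total-size=16G --file-test-mode=rndrw \
             --file-extra-flags=direct --num-threads=16 --max-time=300 --max-requests=0 run
    sysbench --test=fileio --file-total-size=16G cleanup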

Yep, you have probably seen this type of chart on other sites.  The great thing about the Intel drives is their write performance; that difference gives them a significant leg up on other early-generation SSDs.  But enough single-drive talk.  Everyone really wants to know how the drives perform in a RAID setup, so let's jump right into my RAID tests.  My goal is to figure out the optimal settings for running the Intel drives in a RAID setup (basically, to start building a set of best practices for deploying SSDs with RAID).  Once we have figured out the "optimal configuration," we can use it to benchmark against standard disks.
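
For the software RAID runs in this series, the 4-disk RAID 10 array can be assembled with mdadm.  This is only a sketch; the device names and chunk size are placeholders rather than my exact configuration:

    # build a 4-disk RAID 10 md array out of the four SSDs
    mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=64 /dev/sdc /dev/sdd /dev/sde /dev/sdf
    cat /proc/mdstat     # confirm the array came up and watch the initial sync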

First, if you have read my blog or followed the MySQL Performance Blog, you know that by default the Intel drives are not exactly safe for use in the enterprise.  They contain 64MB of volatile write cache on the drive, and during an unexpected loss of power anything sitting in that cache will be lost.  Unfortunately, that means that to ensure your data is protected, you have to trade away some performance by disabling this extra cache.  That performance tradeoff can be huge; in fact, in many of my tests the performance is cut in half:
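
For reference, on a drive that is directly attached (not hidden behind a RAID controller), this on-drive cache can usually be toggled with hdparm.  Treat this as a sketch; the device name is a placeholder, and some controllers expose their own setting instead:

    hdparm -W0 /dev/sdg                          # turn the on-drive write cache off
    hdparm -I /dev/sdg | grep -i "write cache"   # verify the current cache setting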

Still fast, just not the blazing speeds I had hoped for when I first started testing.  On top of the performance concerns, I wonder what sort of reliability the Intel drive will have long term if the drive cache is disabled.  There is already a lot of concern about flash's long-term reliability as it is.  You see, flash has a finite number of write cycles (the number of times a block can be erased).  Intel claims 100,000 write cycles for the X25-E SLC drive.  While this sounds small, the wear-leveling algorithms on the drive help significantly, so much so that some reports claim even under the heaviest load you will not see issues for years and years.  For more background you can read lots more here:  http://techreport.com/articles.x/15931 and here:  http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012-5.html.  But back to my point:  I suspect the drive cache is heavily used in the drive's wear-leveling operations.  If it is, that means the cache is helping to stave off the write-cycle issue, and disabling it could not only be hurting your performance but may also be hurting the drive's reliability.  What does that mean to you?  Probably nothing right now; it's more something to keep in the back of your mind.  Unfortunately, with new technology like this it is hard to give concrete examples of how long you can go before you start seeing issues.  I suspect more data will be available in the coming years (which is not a comforting thought if you are an early adopter), but I also suspect your reliability will be driven more by your workload than by the cache.  Has anyone else seen numbers or data on wear-leveling issues with the drive cache disabled?
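
As a rough sanity check on the "years and years" claim (these are assumptions, not measurements):  a 32GB SLC drive rated at 100,000 erase cycles has a total write budget of roughly 3.2PB if wear leveling were perfect, and at a sustained 30MB/s of writes that budget lasts three to four years.  Write amplification and imperfect leveling will shorten that.

    # write budget: 32GB x 100,000 erase cycles ~= 3.2PB; divide by 30MB/s, then by seconds in a year
    echo "scale=1; 32*100000*1024/30/86400/365" | bc     # prints ~3.4 (years of nonstop writing)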

Another concern with disabling the write cache is internal fragmentation of data.  Once again I have no data to support this, it's just a theory, but the MLC Intel drives have been plagued with insane amounts of internal fragmentation that causes significant performance slowdowns.  Intel has released a firmware fix for this on the MLC side, and I have not seen reports on the SLC drives.  The question is, will the lack of a write cache increase the likelihood of this happening to the SLC drives over time?  This will require more tests to understand.

Bottom line, disabling the write cache is good for your data, bad for performance, and maybe bad for reliability.  What can you do?  One possible approach is adding a UPS to your SSD-enabled servers.  It is not a perfect solution, but using the UPS to shut down the server cleanly when a power outage is detected can help mitigate the issue (it will not eliminate it).  While many data centers (and certainly the high-end ones) have plenty of redundant power, there is still the chance that your sysadmin trips over a power cable or accidentally yanks something out of the wrong server.  There has to be a better way; any ideas?  I heard that the Sun open storage boxes have SSDs with capacitors on them that help solve this, which is great.  Hopefully vendors in the commodity space will step up here.
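
If you go the UPS route, the shutdown hook can be as simple as a script that watches the UPS status.  This is only a sketch, assuming Network UPS Tools is installed and a UPS named "myups" is configured; in practice upsmon's built-in SHUTDOWNCMD does this job for you:

    #!/bin/sh
    # Poll the UPS every 30 seconds; shut down cleanly as soon as it reports "on battery" (OB).
    while true; do
        if upsc myups@localhost ups.status 2>/dev/null | grep -q OB; then
            logger "UPS on battery, shutting the server down cleanly"
            shutdown -h now
        fi
        sleep 30
    done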

Coming in Part 2:

  • Controller cache tests
  • RAID 5 -vs- RAID 10
  • Software -vs- hardware RAID
  • IO schedulers
  • DBT2 tests
  • Comparison -vs- a 10K RAID setup

10 Responses to Sun/Intel X-25e 4 Disk Raid 10 tests – part 1

  1. Kevin Burton says:

    If you’re correct that the SSD with write caching is extending life of the drive then using a RAID might not fix your problem.

    The RAID controller will have its own cache and battery but will disable the cache on the SSD. The cache from the RAID controller will then flush to the SSD periodically….

    This COULD still yield a decent lifetime on the X-25…. I don't know.

    BTW… another idea: some of the SSDs have SMART data for each erase block and the number of times it has been erased.

    If you could run your tests both ways, you can see if it's doing more erases (assuming you trust the SMART data).

    I tried this with a previous SSD vendor (which sucked) but I couldn’t get it to work.
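
    For what it's worth, smartmontools might be enough to watch those counters. Just a sketch, and the attribute IDs and names vary by drive and firmware:

    smartctl -A /dev/sdg                              # dump the vendor SMART attribute table
    smartctl -A /dev/sdg | grep -i -e wear -e erase   # diff these counters before/after a run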

  2. Mark Callaghan says:

    Why do the sysbench fileio results show better random write throughput on the x25-m than on the x25-e?

    Intel data sheets list the same max concurrent random IO peaks for the x25-m and x25-e (~35k read, ~3k write), but better streaming write peaks for the x25-e (170MB/s vs 70MB/s). Latency for a single read/write on the x25-e is 75/85 microseconds versus 85/115 microseconds on the x25-m. So I suspect their specs should provide better estimates for max concurrent random IO peaks.

  3. matt says:

    This may have to do with the size of the drive and how full it was during the tests. My theory: because I am hammering the drive without a break between tests, it's possible the behind-the-scenes wear leveling may be working harder on the smaller drive. It's just a theory right now…

    It may be interesting to see :) I will give that a whirl and see what shakes out.
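
    Something like this could pre-fill the drive to around 90% before rerunning sysbench (the mount point and file size are placeholders for my setup):

    dd if=/dev/zero of=/mnt/ssd/fill.dat bs=1M count=27000 oflag=direct   # ~27GB filler file
    df -h /mnt/ssd                                                        # confirm how full the drive is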

  4. matt says:

    I have never been able to hit the spec'd peaks with sysbench. Here is iostat -m from a 32-thread test:

    Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
    sda 0.00 0.00 0.00 0 0
    sdb 0.00 0.00 0.00 0 0
    sdc 0.00 0.00 0.00 0 0
    sdd 0.00 0.00 0.00 0 0
    sde 0.00 0.00 0.00 0 0
    sdf 0.00 0.00 0.00 0 0
    sdg 2100.00 0.00 32.50 0 32
    dm-0 0.00 0.00 0.00 0 0
    dm-1 0.00 0.00 0.00 0 0
    md0 0.00 0.00 0.00 0 0

    avg-cpu: %user %nice %system %iowait %steal %idle
    0.06 0.00 0.50 5.74 0.00 93.70

    Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
    sda 0.00 0.00 0.00 0 0
    sdb 0.00 0.00 0.00 0 0
    sdc 0.00 0.00 0.00 0 0
    sdd 0.00 0.00 0.00 0 0
    sde 0.00 0.00 0.00 0 0
    sdf 0.00 0.00 0.00 0 0
    sdg 2015.00 0.00 30.92 0 30
    dm-0 0.00 0.00 0.00 0 0
    dm-1 0.00 0.00 0.00 0 0
    md0 0.00 0.00 0.00 0 0

    avg-cpu: %user %nice %system %iowait %steal %idle
    0.06 0.00 0.44 5.75 0.00 93.75

    Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
    sda 1.00 0.00 0.04 0 0
    sdb 0.00 0.00 0.00 0 0
    sdc 0.00 0.00 0.00 0 0
    sdd 0.00 0.00 0.00 0 0
    sde 0.00 0.00 0.00 0 0
    sdf 0.00 0.00 0.00 0 0
    sdg 2117.00 0.00 32.63 0 32
    dm-0 10.00 0.00 0.04 0 0
    dm-1 0.00 0.00 0.00 0 0
    md0 0.00 0.00 0.00 0 0

    By the way, I am rerunning the tests… you can see the TPS here is more in line with the M drive.

  5. Jannes says:

    @Kevin: “might not fix your problem”

    I think you mean “might fix your problem” ?

    And taking that a step further, wouldn't the RAID battery also give the SSDs enough time to flush their caches as well?

    Meaning you could safely run these in a battery-backed RAID with the cache turned ON after all?

  6. matt says:

    One other thing on the low random write numbers: since I had just touched on the single-disk tests and was gearing more towards the RAID tests, I am not 100% sure I did not leave the controller cache turned on by accident.

  7. matt says:

    I will blog about this… but take a look at iostat -m when you get close to a 100% full disk:

    avg-cpu: %user %nice %system %iowait %steal %idle
    0.25 0.00 0.37 5.87 0.00 93.50

    Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
    sda 0.00 0.00 0.00 0 0
    sdb 0.00 0.00 0.00 0 0
    sdc 0.00 0.00 0.00 0 0
    sdd 0.00 0.00 0.00 0 0
    sde 0.00 0.00 0.00 0 0
    sdf 0.00 0.00 0.00 0 0
    sdg 1734.00 0.00 26.76 0 26
    dm-0 0.00 0.00 0.00 0 0
    dm-1 0.00 0.00 0.00 0 0
    md0 0.00 0.00 0.00 0 0

    avg-cpu: %user %nice %system %iowait %steal %idle
    0.06 0.00 0.44 5.99 0.00 93.51

    Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
    sda 0.00 0.00 0.00 0 0
    sdb 0.00 0.00 0.00 0 0
    sdc 0.00 0.00 0.00 0 0
    sdd 0.00 0.00 0.00 0 0
    sde 0.00 0.00 0.00 0 0
    sdf 0.00 0.00 0.00 0 0
    sdg 1723.76 0.00 26.48 0 26
    dm-0 0.00 0.00 0.00 0 0
    dm-1 0.00 0.00 0.00 0 0
    md0 0.00 0.00 0.00 0 0

    avg-cpu: %user %nice %system %iowait %steal %idle
    0.06 0.00 0.37 5.87 0.00 93.70

    Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
    sda 4.00 0.00 0.02 0 0
    sdb 0.00 0.00 0.00 0 0
    sdc 0.00 0.00 0.00 0 0
    sdd 0.00 0.00 0.00 0 0
    sde 0.00 0.00 0.00 0 0
    sdf 0.00 0.00 0.00 0 0
    sdg 2032.00 0.00 31.36 0 31
    dm-0 6.00 0.00 0.02 0 0
    dm-1 0.00 0.00 0.00 0 0
    md0 0.00 0.00 0.00 0 0

  8. Pingback: Intel X-25e & Mysql Part 1b - Don’t let your Drive Overeat! » Big DBA Head!

  9. Pingback: Log Buffer #146: a Carnival of the Vanities for DBAs | Pythian Group Blog

  10. @Jannes: "Meaning you could safely run these in a battery-backed RAID with the cache turned ON after all?"

    No, because when the RAID controller flushes blocks to disks it expects they actually have been flushed. If the cache is enabled on the drive and the RAID controller flushes blocks to disk, they might end up in the physical drive cache and be lost during power failure.

    The only safe way to run a disk is with the write cache off, unless there is a very reliable battery backup. This is why RAID controllers have directly connected batteries.