Testing Performance on a Texas Memory Systems RamSan-500

Well, it's about time I posted this. :)  This is part 1 of 3 in my RamSan series.

Those who have paid attention to my blog know I love talking IO!  I also love performance.  Absolutely love it.  Love disk, disk capacity, IO performance, solid state...  So as I march towards my UC session on MySQL Performance on Solid State Disk, my goal is to try and test as many high-end solid state disk systems as possible.  All the vendors have been great, giving me access to some really expensive and impressive toys.  I finished up testing Texas Memory Systems' flash appliance, the RamSan-500, this week and wanted to post some numbers and some thoughts.

TMS makes RamSan appliances that merge disk and RAM into a really fast SAN.  I go a ways back with TMS: I deployed an Oracle RAC installation on one of their RamSan-300s several years back and was impressed by the sheer raw power of the device.  The two main flavors of device they ship are DDR-based and flash-based SANs.  The DDR-based SAN backs its data up from memory to disk at intervals to ensure data retention, which is awesome.  The flash device, meanwhile, uses DDR as a cache on top of flash.  The idea is that your hot data is served by lots of DDR (16-64GB), while everything else comes off the flash.  TMS was very generous to provide me with access to a system connected to a RamSan-500 (flash based) for a couple of weeks.  The system had 32GB of cache and a TB of flash.

Testing a device like this is challenging, because the standard benchmark tools I generally use just don't push the disk enough.  In my dbt2 & sysbench tests I ended up hitting CPU limits long before I hammered the disk.  So for these tests I started building a new benchmark.  That benchmark was not finished in time to fully flex the RamSan (it still is not finished), but it did show some interesting numbers.  So let's jump right in.

I used the fileio tests in sysbench to test various data sizes.  The fileio test is a staple of my benchmarking, but at higher concurrencies I started to see CPU spikes that may have limited how hard I could drive the system.  I still wanted to run this test because a lot of flash out there starts to bottleneck as you get closer to filling the drive.
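
For anyone who wants to reproduce this style of run, a sysbench fileio invocation looks roughly like the following.  Treat the total size, block size, thread count, and run length here as placeholder values rather than my exact settings:

    # prepare the test files (the 20GB total size is just an example)
    sysbench --test=fileio --file-total-size=20G prepare

    # random read/write run: 16 threads, 16KB requests, time limited
    sysbench --test=fileio --file-total-size=20G --file-test-mode=rndrw \
      --file-block-size=16K --num-threads=16 --max-time=300 --max-requests=0 run

    # remove the test files when done
    sysbench --test=fileio --file-total-size=20G cleanup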

The smaller data sets saw a nice boost in performance from the internal DDR cache on the RamSan.  You will notice that as the size of the test increased, we started hitting the flash, as opposed to the cache, more and more.  Still, 16-17K IOPS (16,192-byte requests) is excellent, nearly 8x higher than what my single Intel devices were capable of.  All of these tests used 16 threads.  But what if we have more threads?  Can we push this a bit higher?  Let's try more threads!
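
To put those IOPS into bandwidth terms, multiply the request rate by the request size; the figure below is back-of-the-envelope math, not a measured number:

    # ~17K requests/sec x 16,192 bytes per request, expressed in MB/s
    echo $(( 17000 * 16192 / 1024 / 1024 ))   # prints 262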

The test that used 32 threads actually saw a nice jump in IOPS, showing I had not maxed out the device yet.  Unfortunately, 32 threads started to hit CPU bottlenecks… but I think I could have driven the system even more, possibly hitting 40+ threads before the CPU cried uncle.
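
For what it's worth, the way I keep an eye on whether the box or the storage is the limiter during these runs is to leave iostat and vmstat running in a second terminal:

    # per-device utilization, queue depth, and request sizes, every second
    iostat -x 1

    # CPU split (us/sy/wa) and run queue, every second
    vmstat 1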

These numbers are awesome.  As a point of reference, I recently tested a midrange 16-disk RAID 10 SAN at a client site and came away with only 2,500 IOPS on a 16-thread, 20GB test, so 30,000 IOPS is impressive.

Stay tuned for part two…


7 Responses to Testing Performance on a Texas Memory Systems RamSan-500

  1. Aaron Blew says:

    What kind of system was used? How much RAM? How many HBAs? Were you using LVM?

    A little more context would help us understand what the numbers mean.

  2. matt says:

    Aaron,

    These tests were on a 4-core AMD Opteron system with 2GB of memory. The RamSan was a 500 model with 32GB of cache and 1 active HBA (so no dual path). These tests did not use LVM. Tests were run with the ext3 filesystem.
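
    For anyone curious, the quickest way to pull that sort of context off a Linux box is something like:

        # CPU count, memory, and filesystem/mount options
        grep -c ^processor /proc/cpuinfo
        free -m
        mount | grep ext3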

  3. Peter says:

    Matt,

    Is there any point comparing the RamSan to standard drives? Considering cost per GB, shouldn't it rather be compared to an SSD-based solution?

    The numbers look impressive if you look at disks, though it would just take a few Intel SSDs to match them.

  4. matt says:

    Peter,

    Yes and no. The problem is we are blazing a trail with new SSD-based solutions; there just are not a lot of concrete numbers from similar devices yet. So saying this is 10x faster than a 16-disk normal system does have some merit. It sets a baseline for the sort of performance improvement you can get over what most people would consider standard. In pt2 & pt3 I do compare this to a single Intel drive (because that's all I have in house to test with)… I would love to buy another 3 Intel drives to test RAID 5 performance… but even against raw SSDs I think a device like this is still going to prove faster; the question is whether the performance is worth the cost per GB.

  5. Mark Callaghan says:

    Matt,

    Will the next post have MySQL numbers? My questions are about MySQL benchmarks that I hope you run soon.

    Can you get anywhere near saturating the read capacity of this without first pinning all CPUs in the InnoDB page checksum code?

    Also, there is a huge gap between the write performance that you get here and that which you can get from InnoDB running on top of it. InnoDB is reluctant to write quickly, even with some of the Google and Percona patches.

  6. matt says:

    Hehe, everyone is soo impatient. I am building suspense.

    What numbers do you guys typically see InnoDB petering out at? My problem was that I ran out of time testing the RamSan before I could really push it to its limits. DBT2 is a horrid test for this environment because getting larger data sizes means increasing the warehouses, which increases the threads used, which increases the non-MySQL CPU… I got up to 15 mixed threads and 8 read-only threads on the RamSan, which I felt did not tax the system. But in my read-only, IO-bound test I ended up peaking at around 12K read ops per second, and CPU was at about 43% usr / 18% sys…

    I am right now testing Violin Memory's appliances, and in a 16-thread test I am peaking at a lot less TPS so far, but the IO seems more even. Here I am hitting 60% usr / 25% sys CPU, with 13K TPS on the Violin box, so I may be coming close to saturation now.

    Plus, after testing with the SuperSockets interface I became convinced that dbt2 has certain limitations holding it back.

  7. Mark Callaghan says:

    I have not done much for read-heavy workloads. It is great that you get over 10K for read heavy. For write-heavy it will be hard to get more than a few hundred writes per second unless you fix some things in InnoDB.

    I don't use dbt-2; I would use tpcc-mysql for something similar. But for now, sysbench is sufficient for me, and I also used mibench, a modified version of the insert benchmark, to get a write-heavy workload.