This is part 2 of my RAMSan series.
In my normal suite of benchmarks I typically run the dbt2 and sysbench OLTP benchmarks next… and I did run them, but to be honest they just weren’t that interesting. They showed an improvement over the Intel SSD results I ran on frankenmatt, but it was difficult to provide an apples-to-apples comparison. The server hardware was very different (CPU, memory, controller, etc.). Plus I typically run a test without flash and then the same test with flash, and I run tests with varying amounts of memory… the test box had 2GB of memory and sparse internal disk, so my normal test cycles were already in jeopardy. For what I ran, I was hitting CPU limits long before I reached the IOPS I saw above. In fact, in a 100W test I ended up peaking at 1200 IOPS while the CPU was at 100%.
The challenge is building an effective solution that will easily maximize MySQL throughput, while taking into account that there are internal things in InnoDB that may bottleneck somewhere or another. If I can consistently replicate a bottleneck at high load, maybe I can trace it and try to fix it.
So I started building my new benchmark. I still need to name it. It’s a Perl-based benchmark that collects data for various tests… the first tests are disk-bound tests (specifically to tax flash). I will go into details in another post. It is not nearly as complete as I want it to be, but the test worked for my immediate goal. Note that I did have some v1 bugs I have been working out that affect the numbers I will show here, so other vendors’ numbers may be better or worse as I fix things.
So I started by generating my test data and loading it. Because I have no other baseline data, I am presenting the same timings from a single Intel SSD for comparison.
Loading one of the large benchmark tables, I tracked Inserts per second:
In this single-threaded test I think the RAMSan’s DDR memory really helped. I would expect the raw flash numbers from a single-threaded test against two similar flash drives to be close, but here the RAMSan puts out 2.5x more inserts per second.
During this load here is the disk latency:
That’s pretty high on the Intel side… but it does help explain the 2.5x performance difference.
Next my test builds metadata about the databases it just created. To do this I run a single-threaded process that collects column stats from the tables by running 36 queries.
To give you an idea of the normal run time, I included the build time on two 10K disks in RAID 0.
The Intel drive was a little slower than the RAMSan in almost all the tests… but remember this is single-threaded. Where the RAMSan should shine is when we start throwing concurrent load at it. In case you’re wondering, here is the metadata build time.
So slightly better, but this is really not a great test… other than to show that, yeah, if you had 1 process running on the DB things would be faster.
So let’s look at my read-only IO-bound test. This one is pretty good all things considered… but it does have some flaws (I wrote it the day before I tested it :) ). At first I only gathered run time information. The problem with this is that some of the tests finish earlier than others, so I may get 4 threads for ¾ of the runtime and then 3 threads for the rest. I ran this test with 4 threads and then fixed it (each thread calls the IO test 3 times):
Typically it took 1.5-2x longer to run the test on the Intel drive… but I noticed all kinds of flaws here in my processing and code. The interesting thing, and the reason I posted this, is how consistent the workload is on the RAMSan. But in this test the disk was not really being taxed.
So I fixed some issues, and reran this with 8 threads.
What’s interesting here is that some of the Intel runs finished well before the others… so at the end (runs 17-24ish) there were fewer active threads. It’s a design flaw in my benchmark that I am working on.
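One common way to avoid this kind of tail-off is to run every thread against a shared deadline instead of a fixed number of runs, and then compare completed iterations rather than run times. The sketch below is in Python rather than the Perl the benchmark is written in, and it is not the fix used in the benchmark itself. `run_fixed_duration` and `fake_io_test` are hypothetical names, and the sleep is only a stand-in for a real IO-bound query.

```python
import threading
import time


def run_fixed_duration(io_test, num_threads, duration_s):
    """Run io_test in num_threads threads until a shared deadline passes.

    Returns a list with the number of completed iterations per thread.
    """
    deadline = time.monotonic() + duration_s
    counts = [0] * num_threads

    def worker(idx):
        # Every thread keeps looping until the same deadline, so the
        # number of active threads stays constant for the whole window.
        while time.monotonic() < deadline:
            io_test()
            counts[idx] += 1  # each thread only writes its own slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def fake_io_test():
    # Placeholder for a real IO-bound query (assumption for illustration).
    time.sleep(0.01)


if __name__ == "__main__":
    counts = run_fixed_duration(fake_io_test, num_threads=8, duration_s=0.5)
    print("iterations per thread:", counts)
    print("total iterations:", sum(counts))
```

The trade-off is that the result becomes throughput (iterations completed in the window) instead of elapsed time per run, and the last iteration in each thread can still run slightly past the deadline.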
That being said, the numbers are still pretty decent, with an average 2x speedup.
The Intel drive is kind of all over the board, while the RAMSan tends to be pretty smooth. In fact, here is the standard deviation from the tests…
So the RAMSan is definitely giving more stable performance, which is to be expected. Now let’s look at a portion of the IO numbers (as reported by iostat) from this test.
You can see the Intel drive peaking at about 2500 r/s with the test, which is pretty good. On this system I have gotten up to 3800 r/s, so it does not look like I have hit my peak yet… I may have to try 16 threads to really push this, but I will try that later. What’s interesting is the latency numbers:
Not sure why we were getting 200ms+ response times here. But we definitely see some really high spikes where concurrency killed the single drive. I will look into this.
Now let’s look over at the RAMSan:
Spikes up to 13K IOPS. The valleys are new tests firing up. And from a latency standpoint:
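For anyone who wants to run the same kind of comparison on their own results, here is a minimal sketch of the calculation in Python. The run times in it are made-up placeholders, not the measurements from these tests, and `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(run_times):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(run_times)
    stdev = statistics.stdev(run_times)  # sample standard deviation (n - 1)
    return mean, stdev, stdev / mean


# Hypothetical per-run times in seconds, NOT data from this post.
steady_device = [100, 102, 98, 101, 99]
erratic_device = [150, 260, 180, 310, 200]

for name, times in (("steady", steady_device), ("erratic", erratic_device)):
    mean, stdev, cv = summarize(times)
    print(f"{name}: mean={mean:.1f}s stdev={stdev:.1f}s cv={cv:.2%}")
```

The coefficient of variation (standard deviation divided by the mean) is handy when the two devices have very different average run times, since it compares how spread out the runs are relative to each device's own average.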
So let’s compare the IOPS from the first 450 seconds of each test run (the Intel drive took much longer; I cut it off, as they were similar). What’s funny about latency numbers is that sometimes they will go up if there are fewer samples… it’s the law of averages. 100 IOs taking over 3 ms mean a lot more over 1000 IOs than they do over 10000.
Based on these numbers I should be able to drive a lot more IO through the RAMSan. I just did not get the chance to run more tests due to time, and my benchmark is not 100% baked yet.
Next up: a real brief look at mixed load on the RAMSan.
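To make that averaging effect concrete, here is a small worked example in Python. The 0.5 ms "fast" latency is an assumption picked purely for illustration, and the slow IOs are pinned at exactly 3 ms; neither number is a measurement from these tests.

```python
def average_latency_ms(total_ios, slow_ios, fast_ms=0.5, slow_ms=3.0):
    """Average latency when slow_ios of total_ios take slow_ms each.

    fast_ms and slow_ms are illustrative assumptions, not measured values.
    """
    fast_ios = total_ios - slow_ios
    return (fast_ios * fast_ms + slow_ios * slow_ms) / total_ios


# The same 100 slow IOs, spread over a small and a large sample.
print(average_latency_ms(1000, 100))    # 100 slow IOs out of 1000
print(average_latency_ms(10000, 100))   # 100 slow IOs out of 10000
```

With these assumed numbers the same 100 slow IOs lift the average to 0.75 ms over 1000 IOs, but only to 0.525 ms over 10000 IOs, which is why an interval with fewer IOs can report a worse average latency.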