As one of the co-founders of the Waffle Grid project, I beam with pride every time I get a stellar benchmark or find a new use for the Waffle. But as a professional I still have to be critical of any solution I would recommend or deploy. One of the big goals of Waffle Grid is to replace disk IO, which should be slow, with remote memory, which should be much faster. But what happens when the disk is no longer slow? This leads me to ask myself: is Waffle Grid only good for servers with slower disks? Or is this a solution that can also help systems with fast disks? So which should you deploy, SSD -vs- Waffle? Are they competitors? Or are they complementary technologies?
I am going to say this: in these tests latency is king. The faster the drives can deliver data, the higher the benchmarks should be. Basically, if my interconnect can deliver data faster than the drive can serve it up, I should still see Waffle Grid perform better than SSD. A note: all previous tests were done against 2 striped 10K RPM disks. So from a latency perspective, how does the Intel SSD do?
So the SSD drive starts really fast with 1 thread, about a quarter of a ms per request, before rising to just over 2ms per request with 10 threads.
Over 1Gb Ethernet I get about 0.15-0.20 ms latency with a single thread. I have been looking for a good tool to scale TCP threads up, but have not found one; I would like to produce a graph similar to the one above. I have looked at netperf and ttcp, and I got the 0.15-0.20 ms number above using ttcp. I assume that the latency increase from multiple threads with such a small amount of data (16K) being transferred should be small, but I prefer more concrete numbers. When I spawn off 10 ping tests (not TCP, so going to be different) I get consistent numbers up to 10 "threads" at once. So I can only assume that 10 threads requesting 16K blocks over the network should not experience the same latency increase the disk does (but I want to know for sure). Maybe one of our network gurus out there can point out another way to verify this... I am listening.
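Lacking a ready-made tool, something like the following sketch could scale the threads up. It is not what I ran for the numbers above; it spins up a local echo server as a stand-in for a remote memcached node (the host, port, and request counts are placeholders) and times 16K round trips from N concurrent client threads. Point the client at a real remote box to measure the actual interconnect:

```python
import socket
import statistics
import threading
import time

BLOCK = 16 * 1024                  # 16K blocks, same size as the tests above
HOST, PORT = "127.0.0.1", 54311    # placeholder endpoint, swap in a real node

def echo_server(stop):
    """Stand-in for the remote node: echoes back whatever it receives."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(32)
    srv.settimeout(0.2)

    def handle(conn):
        with conn:
            while not stop.is_set():
                data = conn.recv(BLOCK)
                if not data:
                    return
                conn.sendall(data)

    while not stop.is_set():
        try:
            conn, _ = srv.accept()
        except socket.timeout:
            continue
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
    srv.close()

def client(n_requests, results):
    """One client thread: send a 16K block, wait for the echo, time it."""
    payload = b"x" * BLOCK
    lat = []
    with socket.create_connection((HOST, PORT)) as s:
        for _ in range(n_requests):
            t0 = time.perf_counter()
            s.sendall(payload)
            got = 0
            while got < BLOCK:
                chunk = s.recv(BLOCK - got)
                if not chunk:
                    raise ConnectionError("server closed early")
                got += len(chunk)
            lat.append((time.perf_counter() - t0) * 1000.0)  # ms
    results.append(statistics.mean(lat))

def probe(threads, n_requests=30):
    """Mean ms per 16K round trip with `threads` concurrent clients."""
    results = []
    workers = [threading.Thread(target=client, args=(n_requests, results))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return statistics.mean(results)

stop = threading.Event()
threading.Thread(target=echo_server, args=(stop,), daemon=True).start()
time.sleep(0.3)  # give the server a moment to bind

latency = {}
for n in (1, 5, 10):
    latency[n] = probe(n)
    print(f"{n:2d} threads: {latency[n]:.3f} ms per 16K request")
stop.set()
```

Over loopback this mostly measures the TCP stack, which is exactly the question: does per-request latency hold steady as the thread count climbs the way ping suggests, or does it degrade like the disk does?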
As I mentioned before, I got better DBT2 numbers from a single Intel SSD drive than I got from an 8-disk 10K RPM RAID 10 system. Now I am not sure how that's going to translate if you had 8 SSDs. This type of information is critical because, let's face it, no one is going to put a single disk in their system... and if they do they deserve everything they get :) I would love to get another 7 drives or get access to a fully decked out server with 8 Intel SSDs ***hint, hint for all those looking to buy me a gift***... I not only want to see just how fast a large array of these can be, but I want to see if a remote buffer pool via Waffle Grid is still a viable, cost-effective solution.
But for now let's test with what we have, I am not greedy! A single SSD should at least give us some idea whether faster disk will start to make Waffle obsolete.
So for Intel’s SSD we have latency somewhere between 0.24ms and 2ms (depending on load) -vs- 0.2ms for the network... Just based on this I would expect to see some benefit from a Waffle Grid deployment. But there is going to be additional overhead on the Waffle side (code, memcached overhead, etc.), so these may wash.
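To put those latencies side by side, here is a quick back-of-the-envelope sketch of the serialized per-thread ceiling each one implies for fetching 16K pages (the figures are the ones quoted above; this ignores all the Waffle/memcached overhead just mentioned):

```python
# Rough per-thread ceiling on 16K page fetches implied by latency alone.
# Latencies in ms, taken from the measurements quoted in the post.
latencies_ms = {
    "Intel SSD, 1 thread": 0.24,
    "Intel SSD, 10 threads": 2.0,
    "1Gb Ethernet (ttcp)": 0.20,
}

ceilings = {}
for name, ms in latencies_ms.items():
    per_sec = 1000.0 / ms            # serialized requests/sec per thread
    ceilings[name] = per_sec
    mb_s = per_sec * 16 / 1024       # 16K per request -> MB/s per thread
    print(f"{name:22s} {per_sec:7.0f} req/s  ~{mb_s:5.1f} MB/s per thread")
```

At 10 threads the SSD's per-request latency is roughly 10x the single-thread network figure, which is where the expected Waffle benefit comes from.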
Previously we had gotten 3218 TPM on our standard (768 buffer pool, 20 warehouses, 16 threads) DBT2 test with 2 10K drives mirrored. By merely switching over to SSD that number jumped to 8158 TPM! That's a huge jump for one piece of hardware, but it shows you how disk constrained we actually were. That number alone is just shy of our earlier Waffle Grid test, which delivered 9121 TPM.
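Just to keep the arithmetic honest, here are the relative gains implied by the TPM figures quoted so far (a small sketch; the labels are mine):

```python
# TPM figures quoted in this post, and the gains they imply.
tpm = {
    "2 mirrored 10K RPM disks": 3218,
    "single Intel SSD": 8158,
    "Waffle Grid, 10K disks, 1GbE": 9121,
}

base = tpm["2 mirrored 10K RPM disks"]
gains = {name: (t / base - 1) * 100 for name, t in tpm.items()}
for name, t in tpm.items():
    print(f"{name:30s} {t:5d} TPM  (+{gains[name]:.1f}% over mirrored 10K)")

waffle_vs_ssd = (tpm["Waffle Grid, 10K disks, 1GbE"]
                 / tpm["single Intel SSD"] - 1) * 100
print(f"Waffle over SSD alone: +{waffle_vs_ssd:.1f}%")
```

So the SSD alone is a ~154% jump over the mirrored 10K disks, and Waffle's edge over the SSD works out to just under 12%.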
What is interesting about these numbers is that the Waffle-enabled database running on 10K disks is only about 11% faster than just SSD. But what happens when you run Waffle and SSD together?
Wait a second, that’s a very disappointing bump in performance... I mean, we are about 20% better than just SSD, but would you really want to deploy a second server just for 20%? I wouldn’t. Now, as the workload gets busier I think this number will grow. You are alleviating disk contention with remote memory calls, so Waffle is going to help there, but how much will require further testing. Before you get too discouraged, there is one more test. The above 9825 TPM was achieved while running over 1Gb Ethernet, so what about a faster interconnect? I know our previous tests showed little difference between 1GbE and using the localhost (an attempt at simulating fast interconnects), but we need to try for the sake of completeness. Let’s look:
Hold the presses! Running memcached on the localhost is 62.5% faster, a full 42.5 percentage points higher than running over 1Gb Ethernet. Why? My theory is that the 10K disks are too slow, dragging down the benefit of running Waffle Grid over them. The SSD drive shifts the bottleneck: we are no longer constrained by slow disk, rather the network is the bottleneck. This is something we see time and time again in performance tuning: remove one bottleneck and another one appears. It really is like peeling back the layers of an onion. I do not think a potential 50-60% increase in performance can be easily dismissed.
Waffle Grid was conceived to boost system performance by eliminating disk IO. As I have pointed out before, platter-based disk systems can realize huge performance benefits from both SSD and Waffle Grid. Waffle Grid on its own can be faster in some cases than SSD, but this will more than likely shift as you add more SSD drives or controller cache to your disk subsystem. For those using legacy servers, or those who do not want to plunge into SSD, Waffle Grid may offer an interesting alternative: it can be deployed using existing servers, can scale horizontally (add more memcached nodes), and delivers very good performance over current systems. If you are deploying SSDs and looking to keep costs low (1GbE), I doubt a 20% improvement in performance would be enough to justify deploying both Waffle and SSDs. However, if you really are looking for peak performance out of a system, deploying Waffle with a fast interconnect on top of SSDs may yield top-notch performance that is unmatched elsewhere.
Of course these are generic benchmarks: a controlled set of tests designed to simulate where we think we can have the most benefit. Other systems may see more or less of a gain; it all depends on the system. As with any solution, tip, or trick, benchmark your load with it before moving it into production.