Quick Cassandra Notes Part 1
2010-03-31 at 07:55 pm Matthew Yonkovit
Trying to use the Ruby bindings to do benchmarking; so far things are going rather slowly compared to other benchmarks in Python. This could also be down to the size of the data I am testing with. Still looking into things, but so far loading 1 million rows into Cassandra takes ~4.6GB, while loading the exact same data into MySQL takes ~950MB. A roughly 4.8x increase in storage is a lot; I am not sure if that will hold as I get more data into the system, or if there is just a lot more overhead at the start. Will load 2M and 3M rows to see.
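A quick sanity check on those storage numbers, as a rough sketch only. The sizes are the approximate figures quoted above; treating 1 GB as 1000 MB is my assumption, and the helper names (storage_ratio, bytes_per_row) are made up for illustration.

```ruby
# Approximate figures from the 1M-row load described above.
# Decimal units (1 GB = 1000 MB, 1 MB = 1_000_000 bytes) are an assumption.
ROWS         = 1_000_000
CASSANDRA_MB = 4600.0
MYSQL_MB     = 950.0

# How many times larger the first store is than the second.
def storage_ratio(a_mb, b_mb)
  a_mb / b_mb
end

# Average on-disk footprint of one row, in bytes.
def bytes_per_row(size_mb, rows)
  size_mb * 1_000_000 / rows
end

puts format("storage ratio: %.2fx", storage_ratio(CASSANDRA_MB, MYSQL_MB))
puts format("cassandra: %.0f bytes/row", bytes_per_row(CASSANDRA_MB, ROWS))
puts format("mysql:     %.0f bytes/row", bytes_per_row(MYSQL_MB, ROWS))
```

Re-running this with the sizes from the 2M and 3M loads would show whether the per-row cost stays flat or whether the overhead is mostly fixed and gets amortized as the data grows.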
Also, you have to “warm” Cassandra like other databases. After loading my 1M rows, I ran some quick tests, and successive runs gave: 125 ops/s, 288 ops/s, 311 ops/s, 1530 ops/s, 1872 ops/s, 1868 ops/s…
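The warm-up effect is easy to miss if you only look at one run, so here is a minimal sketch of measuring throughput per round and throwing away the cold rounds. This is not the harness I used: measure_rounds and steady_state are hypothetical helpers, the hash lookup is only a stand-in for a real client read, and treating the first three rounds as warm-up is a judgment call based on the numbers above.

```ruby
# Throughput figures quoted above, in ops/s, in the order they were observed.
POST_RATES = [125, 288, 311, 1530, 1872, 1868].freeze

# Run the block ops_per_round times per round; return ops/s for each round.
def measure_rounds(rounds, ops_per_round, &work)
  Array.new(rounds) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ops_per_round.times { |i| work.call(i) }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    elapsed = Float::EPSILON if elapsed <= 0 # guard against a zero reading
    ops_per_round / elapsed
  end
end

# Average throughput after discarding the first warmup_rounds rounds.
def steady_state(rates, warmup_rounds)
  warm = rates.drop(warmup_rounds)
  return 0.0 if warm.empty?
  warm.sum.to_f / warm.size
end

# Stand-in workload (an in-memory hash); swap in a real read against the store.
store = {}
1_000.times { |i| store["key#{i}"] = "value#{i}" }
rates = measure_rounds(6, 10_000) { |i| store["key#{i % 1_000}"] }
rates.each_with_index { |r, n| puts format("round %d: %.0f ops/s", n + 1, r) }

puts format("steady state from the numbers above: %.0f ops/s",
            steady_state(POST_RATES, 3))
```

Reporting the steady-state figure alongside the cold rounds keeps the first few slow runs from dragging down (or being mistaken for) the real throughput.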
It looks like I am really bottlenecked by the Thrift calls in Ruby (per the profile). Strangely, I am seeing the CPU tap out at 1 core when testing with this data set, whereas with a smaller dataset or with Python I use multiple cores… must just be a red herring.
Getting occasional Ruby socket timeouts from Thrift… need to look into that.