Quick Cassandra Notes Part 1

I'm trying to use the Ruby bindings for benchmarking, and so far things are going rather slowly compared to other benchmarks in Python. This could also be down to the size of the data I am testing with. I'm still looking into it, but so far loading 1 million rows into Cassandra takes ~4.6GB, while loading the exact same data into MySQL takes ~950MB. A roughly 4.8x increase in storage is a lot. I'm not sure whether that ratio will hold as I get more data into the system, or whether there is just a lot more overhead at the start. I will load 2M and 3M rows to see.
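To make the storage comparison easier to repeat at 2M and 3M rows, here is a minimal Ruby sketch of the arithmetic. The sizes are the ones quoted above; the decimal units (1GB = 1000MB), the constant names, and the helper names are my own assumptions, not anything from Cassandra or MySQL.

```ruby
# Sizes from the 1M-row test above. Decimal units are assumed (1GB = 1000MB).
ROWS         = 1_000_000
CASSANDRA_MB = 4_600.0
MYSQL_MB     = 950.0

# Average on-disk footprint of one row, in bytes.
def bytes_per_row(total_mb, rows)
  total_mb * 1_000_000 / rows
end

# How many times larger the first store is than the second.
def overhead_ratio(larger_mb, smaller_mb)
  larger_mb / smaller_mb
end

puts format("Cassandra: %.0f bytes/row", bytes_per_row(CASSANDRA_MB, ROWS))
puts format("MySQL:     %.0f bytes/row", bytes_per_row(MYSQL_MB, ROWS))
puts format("Ratio:     %.2fx", overhead_ratio(CASSANDRA_MB, MYSQL_MB))
```

Under those assumptions this works out to about 4,600 bytes per row in Cassandra versus about 950 in MySQL. If the bytes-per-row figure drops at 2M and 3M rows, the overhead is mostly fixed start-up cost; if it stays flat, it is per-row overhead.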

Also, you have to “warm” Cassandra like other databases. After loading my 1M rows, I ran some quick tests, and throughput climbed with each successive run: 125 ops/s, 288 ops/s, 311 ops/s, 1530 ops/s, 1872 ops/s, 1868 ops/s…
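One way to keep the warm-up from skewing results is to time the run in batches and throw away the first few. The sketch below is only an illustration: `ops_per_second_by_batch` and `steady_state` are hypothetical helper names, the block passed in stands for whatever single read or write the benchmark performs, and the choice of 3 warm-up batches is just read off the numbers above.

```ruby
# Run `batches` timed batches of `ops_per_batch` calls to the given block
# and return the measured ops/s for each batch, in order.
def ops_per_second_by_batch(batches:, ops_per_batch:)
  Array.new(batches) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ops_per_batch.times { yield }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    elapsed > 0 ? ops_per_batch / elapsed : Float::INFINITY
  end
end

# Average the rates after discarding the first `warmup_batches` entries.
# Returns nil if nothing is left after the warm-up is dropped.
def steady_state(rates, warmup_batches:)
  steady = rates.drop(warmup_batches)
  return nil if steady.empty?
  steady.sum.to_f / steady.size
end

# The six runs quoted above, treating the first three as warm-up.
observed = [125, 288, 311, 1530, 1872, 1868]
puts format("steady state: %.0f ops/s", steady_state(observed, warmup_batches: 3))

# Stand-in workload; a real run would issue one client call per iteration.
rates = ops_per_second_by_batch(batches: 3, ops_per_batch: 1_000) { 1 + 1 }
puts rates.inspect
```

Averaging only the last three runs from the list above gives a steady-state figure of roughly 1,757 ops/s, versus about 999 ops/s if the cold runs are included.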

It looks like I am really bottlenecked by the Thrift calls in Ruby (per the profiler). Strangely, I am seeing the CPU max out at 1 core when testing with this data set, whereas testing with a smaller dataset or with Python uses multiple cores. That must just be a red herring.

Getting occasional Ruby socket timeouts from Thrift… need to look into that.
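Until the cause is found, a small retry wrapper can keep a benchmark run from dying on a single timeout. This is a generic sketch, not anything from the Thrift or Cassandra gems: `with_retries` is a hypothetical helper, and `Timeout::Error` from the standard library stands in for whatever exception class the client actually raises, which should be passed via `retry_on`.

```ruby
require "timeout"

# Call the block, retrying up to `attempts` times when one of the
# `retry_on` exceptions is raised. The delay doubles after each failure.
# The last failure is re-raised so the caller still sees it.
def with_retries(attempts: 3, base_delay: 0.1, retry_on: [Timeout::Error])
  tries = 0
  begin
    tries += 1
    yield
  rescue *retry_on
    raise if tries >= attempts
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

# Stand-in for a flaky client call: fails twice, then succeeds.
calls = 0
result = with_retries(base_delay: 0.01) do
  calls += 1
  raise Timeout::Error, "simulated socket timeout" if calls < 3
  :ok
end
puts "result=#{result} after #{calls} calls"
```

Retried calls still cost wall-clock time, so they will drag down the ops/s numbers; counting how often the rescue path fires would show how much the timeouts matter.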
