I wanted to publish a few interesting gotchas, facts, and settings that people who use or want to use Tokyo Cabinet/Tyrant should know.
A quick overview: Tokyo Tyrant is the network daemon that sits on top of Tokyo Cabinet. This means that in order to access Cabinet from another server you have to access it through Tyrant. In the context of this post, when I say Tokyo I mean the entire stack.
#1. Tokyo Cabinet allows for a single write thread. Multiple processes can try to write through Tyrant, but they will wait on each other. To get around this limitation you need to shard your data. Using something like a memcached API on top of a hash table is one effective way to do this.
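To make the sharding idea concrete, here is a minimal sketch of picking a Tyrant instance per key on the client side. The shard list is hypothetical (1978 is Tyrant's default port; the other ports are made up), and the CRC32-modulo scheme is just one simple choice, not something Tokyo provides for you.

```python
import zlib

# Hypothetical shard list: one Tyrant instance (host, port) per shard.
SHARDS = [
    ("127.0.0.1", 1978),
    ("127.0.0.1", 1979),
    ("127.0.0.1", 1980),
]


def shard_for(key, shards=SHARDS):
    """Return the (host, port) of the Tyrant instance that owns this key."""
    # CRC32 is stable across processes and machines, unlike Python's hash().
    digest = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return shards[digest % len(shards)]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", shard_for(key))
```

Each shard then has its own writer, so writes to different shards no longer queue behind each other. Keep in mind that plain modulo sharding remaps most keys if you change the number of shards.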
#2. Tokyo is not durable. This means that in the event of a system crash you will lose data that has not yet been synced to disk. You can call sync to flush data to disk, but this locks the writer process. Your best bet is to use replication to ensure you have a copy of the data, and to back up often.
#3. Settings for Tokyo Cabinet files can be set via Tokyo Tyrant by adding the settings after the cabinet file name, in the form #name=value.
Some of these settings only take effect on file creation or on optimize, so make sure you check the documentation.
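As a rough illustration of what such a file name looks like, here is a small sketch that joins a path and a set of tuning parameters with "#". The path and the values are made up for the example; bnum and xmsiz are the settings discussed later in this post.

```python
def tyrant_dbname(path, **params):
    """Join a cabinet file path and tuning parameters into one dbname string."""
    parts = [path]
    for name, value in params.items():
        # Each setting is appended as name=value, separated by '#'.
        parts.append(f"{name}={value}")
    return "#".join(parts)


# Hypothetical path and values, for illustration only.
print(tyrant_dbname("/var/ttserver/casket.tch", bnum=1000000, xmsiz=268435456))
# prints /var/ttserver/casket.tch#bnum=1000000#xmsiz=268435456
```

The resulting string is what you would hand to ttserver as its database argument. Quoting it on a shell command line does no harm and avoids any surprises with the "#" character.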
#4. By default there is a limit of 2GB per Cabinet file. This can be worked around by setting the large option for your table type. For instance, for the hash table the large option (HDBTLARGE in the C API, #opts=l when set through the Tyrant file name) enables large files. This setting takes effect on creation or when you optimize. You will corrupt your file if you hit 2GB without this setting. If you experience this, your best bet is to restore from a backup that is < 2GB and switch on the large file flag. (Note: if I am correct, you can only change an existing file to support large tables by using the Cabinet mgr tools, i.e. running tchmgr optimize -tl cabinet.tch against an offline file.)
#5. Run optimize on a regular basis. I have seen files shrink by as much as 90% from running optimize.
* To run optimize on a table through Tyrant, run tcrmgr optimize -port xxx localhost (this will lock writes).
* To optimize a table with the Cabinet command-line tools, use the mgr for the correct table type (i.e. tchmgr for the hash table).
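If you want to run the Tyrant-side optimize from a scheduled job, a thin wrapper like the sketch below is one way to do it. It only builds and runs the tcrmgr optimize command shown above; the function names are my own, and it assumes tcrmgr is on the PATH.

```python
import subprocess


def optimize_command(host="localhost", port=1978):
    """Build the tcrmgr command line that optimizes the database behind Tyrant."""
    return ["tcrmgr", "optimize", "-port", str(port), host]


def run_optimize(host="localhost", port=1978):
    """Run optimize; remember that this locks writes while it runs."""
    # check=True raises CalledProcessError if tcrmgr exits with an error.
    subprocess.run(optimize_command(host, port), check=True)


if __name__ == "__main__":
    print(" ".join(optimize_command()))
```

Since optimize blocks writers, it is probably worth scheduling this for a quiet period.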
#6. Increase the number of Tyrant threads from the default of 8 if you're having issues with refused connections. This is done on the command line when starting Tyrant: ttserver -thnum 16
#7. Log your Tyrant errors to a log file by using the -log flag when starting Tyrant. By default, just setting the log will also log info/warning messages; disable this with the -le flag, which tells Tyrant to only log errors.
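Putting #6 and #7 together, a start-up command could be assembled along these lines. This is only a sketch: the helper name, the log path, and the database file name are assumptions, and the flags are the ones mentioned above (-thnum, -log, -le) plus -port.

```python
import shlex


def ttserver_command(dbname, port=1978, threads=16, log_path=None, errors_only=True):
    """Build a ttserver command line with a thread count and optional logging."""
    cmd = ["ttserver", "-port", str(port), "-thnum", str(threads)]
    if log_path:
        cmd += ["-log", log_path]
        if errors_only:
            # -le limits the log to errors only.
            cmd.append("-le")
    # The database name goes last.
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = ttserver_command("casket.tch", log_path="/var/log/ttserver.log")
    print(shlex.join(cmd))
```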
#8. If you're using a Cabinet “table” database, make sure you build the indexes you need; otherwise you're probably going to get rather slow performance.
#9. The BNUM setting typically has the largest impact on performance. According to the docs, it “specifies the number of elements of the bucket array”. Every table type is a bit different, so check the docs for the exact settings.
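For a starting point on BNUM, the Tokyo Cabinet documentation for the hash database suggests a bucket array of roughly 0.5 to 4 times the number of records you expect to store. The sketch below just applies that rule of thumb; the default factor of 2 is my own assumption, and other table types use different guidance.

```python
def suggested_bnum(expected_records, factor=2.0):
    """Estimate a hash database bnum from the expected record count."""
    # Stay inside the 0.5x to 4x range suggested for hash databases.
    if not 0.5 <= factor <= 4.0:
        raise ValueError("factor should be between 0.5 and 4")
    return int(expected_records * factor)


if __name__ == "__main__":
    print(suggested_bnum(1000000))
```

The result would then go into the file name as the bnum setting, and as noted above it only takes effect on creation or optimize.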
#10. For hash tables, setting xmsiz can make a huge difference. This sets the size of the extra memory mapped for the database file.