Cassandra - Split brain schemas
Saturday, September 24, 2011 at 10:37PM
Matthew Cooke

It's possible to end up with a Cassandra cluster where the nodes in the ring are running two different schemas. I don't know exactly how it happens, but I have managed it, so it's definitely possible :) I have a knack for breaking things in strange ways.

In case anyone else comes a cropper on the same issue, here are some useful things I discovered whilst trying to resolve it. Firstly, these two facts:

* Cassandra (not you) decides which version of the schema can overwrite the other nodes' schemas. I think this is based on a timestamp (or something similar). Even if Cassandra is not automatically resolving the schema conflict, only one schema can spread, and AFAIK you can't choose which one it is.

* Just because more nodes have a particular schema version does not necessarily mean that this schema version is the most up-to-date one.
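The two points above can be illustrated with a small Python sketch. It assumes, and this is only my guess rather than anything confirmed, that each schema version is a time-based (version 1) UUID and that the version with the latest embedded timestamp is the one allowed to spread. The node names, timestamps and the `time_uuid` helper are hypothetical. The point is that counting nodes and picking the newest timestamp can give different answers.

```python
import uuid
from collections import Counter


def time_uuid(timestamp: int, node: int) -> uuid.UUID:
    """Build a version-1 (time-based) UUID with a chosen 60-bit timestamp.

    Hypothetical helper, only here to fabricate example schema versions.
    """
    fields = (
        timestamp & 0xFFFFFFFF,      # time_low
        (timestamp >> 32) & 0xFFFF,  # time_mid
        (timestamp >> 48) & 0x0FFF,  # time_hi (version bits are set below)
        0x80,                        # clock_seq_hi_variant
        0x00,                        # clock_seq_low
        node,                        # 48-bit node id
    )
    return uuid.UUID(fields=fields, version=1)


# Hypothetical schema versions: OLD was created before NEW.
OLD = time_uuid(1_000, node=1)
NEW = time_uuid(2_000, node=2)

# Hypothetical ring: three nodes still report the older schema, two the newer one.
ring = {
    "node-a": OLD, "node-b": OLD, "node-c": OLD,
    "node-d": NEW, "node-e": NEW,
}


def is_split_brain(versions):
    """True if the nodes report more than one schema version."""
    return len(set(versions.values())) > 1


def majority_version(versions):
    """The version reported by the most nodes (NOT a reliable guide)."""
    return Counter(versions.values()).most_common(1)[0][0]


def newest_version(versions):
    """The version with the latest embedded timestamp (ASSUMES time-based UUIDs)."""
    return max(set(versions.values()), key=lambda v: v.time)


if __name__ == "__main__":
    print("split brain:", is_split_brain(ring))
    print("majority:   ", majority_version(ring))
    print("newest:     ", newest_version(ring))
```

Here the majority of nodes are on `OLD`, but under the timestamp assumption `NEW` is the schema that would win.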

With that knowledge, how do you resolve the dual schema problem?

The first thing I would suggest is taking a node that shares its schema version with another node, wiping its data, and seeing which schema version Cassandra loads back onto it. Okay, this isn't really my idea: some guy on the Cassandra IRC channel on freenode suggested I do this, but it was very helpful!

After the node has been reset and the schema has loaded, take a note of the schema version that was loaded onto it. It may be the schema that was already on it, or it may be the other one; either way, you now know which schema is the up-to-date one.
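The bookkeeping around the wipe-and-reload trick can be sketched in Python. This is not a Cassandra tool: the schema versions are plain hypothetical strings that you would fill in however you read them from your nodes, and the wipe itself still happens by hand. One plausible reason for wiping a node whose version is shared with another node (my assumption, not something from the IRC advice) is that it can't remove the only copy of a schema, so `pick_node_to_wipe` only considers such nodes.

```python
from collections import defaultdict


def group_by_version(versions):
    """Invert node -> schema version into schema version -> sorted node list."""
    groups = defaultdict(list)
    for node, version in versions.items():
        groups[version].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def pick_node_to_wipe(versions):
    """Return a node whose schema version is also held by another node.

    Returns None if every version lives on a single node.
    """
    for version, nodes in sorted(group_by_version(versions).items()):
        if len(nodes) > 1:
            return nodes[0]
    return None


def stale_nodes(versions, reloaded_version):
    """Treat the version the wiped node came back with as the winner and
    list every node that is still on a different (out-of-date) version."""
    return sorted(n for n, v in versions.items() if v != reloaded_version)


if __name__ == "__main__":
    # Hypothetical snapshot taken before the wipe.
    ring = {"node-a": "schema-1", "node-b": "schema-1", "node-c": "schema-2"}
    wiped = pick_node_to_wipe(ring)
    # Hypothetical observation: after the reset, the node reports schema-2.
    after = {**ring, wiped: "schema-2"}
    print("wiped:", wiped, "stale:", stale_nodes(after, "schema-2"))
```

In this made-up run `node-a` is wiped, comes back with `schema-2`, and `node-b` is left as the only node on the out-of-date schema.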

Article originally appeared on Gridfire - Matt Cooke (http://www.gridfire.com/).
See website for complete article licensing information.