Scalesmall W2 First Redesign
This week I made progress by getting rid of bad ideas. I started implementing the messaging part. The message definitions and their operations. While doing that I had the feeling that something is wrong with the basics, just didn’t know what. On a separate thread I was watching the conference presentations of Strangeloop 2015 and found “Distributed, Eventually Consistent Computations” by Christopher Meiklejohn and “Building Scalable Stateful Services” by Caitie McCaffrey.
Very influential talks that reinforced my feeling that I need to go back designing the shared state of scalesmall
and the messaging. Let’s see why.
(It is worth having a look at the other videos of StrangeLoop 2015. They are great.)
CRDTs
The StrangeLoop talks made me interested in CRDTs. I watched few more talks like:
- (Highly suggested) Marc Shapiro the inventor of CRDTs gives a very clear and enlightening presentation
- Jeremy Ong: CRDTs in Production
- Alexander Songe: CRDT: Datatype for the Apocalypse
Here is a sentence from the Soundcloud’s developer blog ( about Roshi ):
The tl;dr on CRDTs is that by constraining your operations to only those which are associative, commutative, and idempotent, you sidestep a lot of the complexity in distributed programming.
Equipped with all these information I realized that:
- what I wanted to do with my protocol is already an existing idea
- my messages as I designed them wont’t achieve this goal
The aim
This is what I want to achieve with the messages:
- I want nodes to agree on a shared state
- the nodes don’t do agreement conversations when their information is complementary
- when the nodes need to agree, that should be done automatically by rules
- the state consists of key ranges that the nodes serve
- the shared state can be lazily distributed and to be consistent eventually
- the nodes to be able to tighten their range(s) when they are getting full
Originally I thought that sending these messages would make this happen:
- agreeing what the subranges are by
Split
-ing the subranges - start and stop serving a subrange (
Register
andRelease
) - agreeing the priority list for a subrange (
Promote
andDemote
)
Now I think differently because the subrange Split
intermixes very badly with the other messages. It makes a big difference to Split
a subrange and then nodes Register
for the subrange or first Register
and then Split
. The same applies to Split
and Promote
/Demote
. On top of all these Promote
and Demote
are not Idempotent.
The new protocol
I no longer want to split the range explicitly and don’t want to maintain the split points as an explicit information. This can be calculated if the nodes send these messages:
Register( node_id, start_range, end_range )
Release( node_id, start_range, end_range )
The good thing with this approach is that this messaging can be easily represented by a CRDT set when the elements in the set is the (node_id, start_range, end_range)
triple. Register
and Release
are the add and remove operations to the set. There are a lot of CRDT set types, differing on the conflict resolution, operation vs state based, delta or full message based. At the moment I favor AWORSet more than others, but I need to experiment with this.
The other part of the protocol is to be able to assign a priority to a node for a given subrange. Originally the Promote
and Demote
messages served this purpose, but now I believe that a PN-Counter
per subrange would be better. In fact a map between subranges to integer values would be great. Fortunately the AWORMap does exactly that.
With this CRDT map my API can be reduced to these operations:
Increase( node_id, start_range, end_range, value )
Decrease( node_id, start_range, end_range, value )
The Increase
operation would be an implicit add
if the (node_id, start_range, end_range) triple is not yet in the map. When the priority value
goes down to zero it would be an implicit Release
.
Use loom?
It turns out that much of my needs in terms of shared state is handled by the loom library. I only need to add a distribution protocol to that. I will definitely give it a try.
The other option would be to write a similar thing on my own. I believe there are a few interesting properties about my case:
- information about a subrange is not equally important to nodes
- tho ones who actually participate are more interested in changes than others
- I need to check how much history a node keeps about the map
- the nodes who don’t participate in a subrange may not want to keep too much history about it
Episodes
- Ideas to experiment with
- More ideas and a first protocol that is not in use anymore
- Got rid of the original protocol and looking into CRDTs
- My first ramblings about function guards
- The group membership messages
- Design of a mixed broadcast
- My ARM based testbed
- Experience with defstruct, defrecord and ETS
- GroupManager code works, beta
- GroupManager more information and improvements