Scalesmall W4 Message Contents Finalized
This week I further clarified what I want from the group membership messages and also implemented them. I added unit tests for all functions, so I feel more or less safe to continue with the network communication part next week.
I am seriously considering to add property based tests as well. Only that I don’t have the experience in this, so this will need to wait a bit. (With so much time spent on the theory I am getting eager to see my stuff to do something useful.)
I decided that in
scalesmall the only one who makes a decision about the group membership and its participation in the shared work is the node itself. Thus only the given node
writes the shared information about its participation and everyone else are
This means that I only need to make sure that:
- whenever the node wants to notify the group about its decision, it eventually reaches all interested parties
- when the messages eventually arrive, the recipients must be able to correctly combine them which boils down to:
- these messages be idempotent:
A + A = A, so they may be delivered multiple times
- the messages be commutative:
A + B = B + Aand associative
A + (B + C) = (A + B) + C, so they can be delivered and applied to the shared state in any order
- these messages be idempotent:
Last-Write-Wins strategy this seems to be super easy, if I just add a
client clock to the messages.
Digging through the literature, there seem to be a wealth of clocks like version vectors with client and server side update counters and also there are the novel ‘dot clocks’. When I read the papers about them I was tempted to use what is newest and shinier until I realized what suits me best.
On a sidenote: I find very little literature or blog posts give a good overview about when to use which and why. People seem to focus on their results and talk about that specific niche only. If you happen to be wandering in the land of these clocks I suggest to read Scalable and Accurate Causality Tracking for Eventually Consistent Stores by Paulo Sérgio Almeida, Carlos Baquero, Ricardo Gonçalves, Nuno Preguiça, and Victor Fonte.
Reading the paper I realized that my use case could be covered with a very simple solution. The nodes need to increment and pass a counter at each state change and send this over. This is very much like the VVclient approach in the paper.
Local Clock Module
GroupManager.Data.LocalClock represents the counter that is going to be attached to the information the node sends over. Only the given node is supposed to change the clock value. Everyone else are reading it.
If you haven’t read the previous posts in the series: I treat groups as a 32bit unsigned integer range from 0 to 0xffffffffff. The incoming requests are mapped to an integer in this range. The nodes decide what part of the range they want to serve and publish this information. Whenever they decide to serve an additional (sub)range or they want to stop serving a subrange they publish this information to the group. The GorupManager.Data.Item represents this information. It has:
member: identifies the node
op: is either :add or :rmv
The priority is a hint that comes into the picture when multiple nodes are serving the same range. This information is opaque from the group membership stand point. Nodes can use it to decide the order in which they participate in a workload. The idea is that
scalesmall should be able to support heterogenous nodes. The priority field could help in distinguishing between more or less reliable machines, different capacities and better or worst network connectivity.
The GroupManager.Data.TimedItem module represents and
Item at a given
LocalClock. This binding allows us to pick the last write of the given nodes. Items are treated as same when these fields match:
priority fields will be overwritten by the last write.
The GroupManager.Data.TimedSet module is a list of
TimedItems and only used for convenience. Its main use is to validate TimedItem members when added to the list.
The GroupManager.Data.WorldClock module is a list of
LocalClock items. It both validates
the LocalClock members when added and also overwrites the older LocalClock for the given member if present.
WorldClock is not used in comparisons. Its purpose is to tell nodes if they lag behind on updates from a given node. When this happens they should take action to gather the missing pieces either from the origin node itself or from the other nodes.
The GroupManager.Data.Message is the topmost module of the
Data lib hierarchy. It has a
WorldClock and a
WorldClock gets updated when new
TimedItem elements are added to the
- Ideas to experiment with
- More ideas and a first protocol that is not in use anymore
- Got rid of the original protocol and looking into CRDTs
- My first ramblings about function guards
- The group membership messages
- Design of a mixed broadcast
- My ARM based testbed
- Experience with defstruct, defrecord and ETS
- GroupManager code works, beta
- GroupManager more information and improvements