Scalesmall W4 Message Contents Finalized
This week I further clarified what I want from the group membership messages and also implemented them. I added unit tests for all functions, so I feel more or less safe to continue with the network communication part next week.
I am seriously considering adding property-based tests as well. However, I don’t have experience with them yet, so this will need to wait a bit. (With so much time spent on the theory, I am getting eager to see my stuff do something useful.)
Requirements
I decided that in scalesmall the only one who makes a decision about a node's group membership and its participation in the shared work is the node itself. Thus only the given node writes the shared information about its participation, and everyone else is a reader.
This means that I only need to make sure that:
- whenever the node wants to notify the group about its decision, it eventually reaches all interested parties
- when the messages eventually arrive, the recipients must be able to correctly combine them, which boils down to:
  - these messages be idempotent: A + A = A, so they may be delivered multiple times
  - the messages be commutative: A + B = B + A, and associative: A + (B + C) = (A + B) + C, so they can be delivered and applied to the shared state in any order
Using a Last-Write-Wins strategy this seems to be super easy if I just add a client clock to the messages.
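To make the three properties concrete, here is a minimal sketch of a last-write-wins merge, written in Python rather than the project's Elixir. The `merge` function and the `(clock, value)` tuple layout are assumptions made for illustration, not the scalesmall code.

```python
from itertools import product


def merge(a, b):
    """Last-write-wins merge of two (clock, value) updates from one writer."""
    # Tuples compare the clock first. The value only breaks ties, which should
    # not occur when a single writer bumps its clock on every change.
    return max(a, b)


# Hypothetical updates published by one node over time.
updates = [(1, "add"), (2, "rmv"), (3, "add")]

# Idempotent: A + A = A
for a in updates:
    assert merge(a, a) == a

# Commutative: A + B = B + A
for a, b in product(updates, repeat=2):
    assert merge(a, b) == merge(b, a)

# Associative: A + (B + C) = (A + B) + C
for a, b, c in product(updates, repeat=3):
    assert merge(a, merge(b, c)) == merge(merge(a, b), c)
```

Because each node is the only writer of its own information, two different updates should never carry the same clock value, so the tie-break on the value is only a safety net.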
Clocks
Digging through the literature, there seems to be a wealth of clocks, such as version vectors with client-side and server-side update counters, and there are also the novel ‘dot clocks’. When I read the papers about them I was tempted to use whatever is newest and shiniest, until I realized what suits me best.
On a side note: I find that very few papers or blog posts give a good overview of when to use which clock and why. People seem to focus on their results and talk about that specific niche only. If you happen to be wandering in the land of these clocks, I suggest reading Scalable and Accurate Causality Tracking for Eventually Consistent Stores by Paulo Sérgio Almeida, Carlos Baquero, Ricardo Gonçalves, Nuno Preguiça, and Victor Fonte.
Reading the paper I realized that my use case could be covered by a very simple solution: each node increments a counter at every state change and sends it along with the update. This is very much like the VVclient approach in the paper.
Local Clock Module
GroupManager.Data.LocalClock represents the counter that is going to be attached to the information the node sends over. Only the given node is supposed to change the clock value. Everyone else only reads it.
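A per-node counter of this kind could look roughly like the following Python sketch. The field names `member` and `time_val` and the helper `newer` are hypothetical and are not taken from the Elixir module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalClock:
    member: str        # hypothetical field: the node that owns this clock
    time_val: int = 0  # hypothetical field: the update counter

    def __post_init__(self):
        if self.time_val < 0:
            raise ValueError("counter must be non-negative")

    def next(self):
        """Return the clock for the owner's next state change."""
        return LocalClock(self.member, self.time_val + 1)


def newer(a, b):
    """Pick the later of two clocks that belong to the same member."""
    if a.member != b.member:
        raise ValueError("clocks of different members are not comparable")
    return a if a.time_val >= b.time_val else b


clock = LocalClock("node-a")
later = clock.next().next()  # two state changes on node-a
```

The clock is immutable here so that readers can hold on to a value without worrying that the owner changes it under them; the owner simply produces a new clock for every change.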
LocalClock tests can be found here.
Item Module
If you haven’t read the previous posts in the series: I treat groups as a 32-bit unsigned integer range from 0 to 0xffffffff. The incoming requests are mapped to an integer in this range. The nodes decide what part of the range they want to serve and publish this information. Whenever they decide to serve an additional (sub)range, or they want to stop serving a subrange, they publish this information to the group. The GroupManager.Data.Item represents this information. It has:
- member: identifies the node
- op: is either :add or :rmv
- start_range
- end_range
- priority
The priority is a hint that comes into the picture when multiple nodes are serving the same range. This information is opaque from the group membership standpoint. Nodes can use it to decide the order in which they participate in a workload. The idea is that scalesmall should be able to support heterogeneous nodes. The priority field could help in distinguishing between more or less reliable machines, different capacities and better or worse network connectivity.
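The fields above can be sketched as a small validated record. This is a Python stand-in for the Elixir module; the validation rules (start not greater than end, non-negative priority) and the default priority are assumptions, and the strings "add" and "rmv" stand in for the :add and :rmv atoms.

```python
from dataclasses import dataclass

MAX_RANGE = 0xFFFFFFFF  # upper end of a 32-bit unsigned range


@dataclass(frozen=True)
class Item:
    member: str        # identifies the node
    op: str            # "add" or "rmv", standing in for :add / :rmv
    start_range: int
    end_range: int
    priority: int = 0  # opaque hint; the default is an assumption

    def __post_init__(self):
        if self.op not in ("add", "rmv"):
            raise ValueError(f"unknown op: {self.op!r}")
        if not 0 <= self.start_range <= self.end_range <= MAX_RANGE:
            raise ValueError("range must satisfy 0 <= start <= end <= MAX_RANGE")
        if self.priority < 0:
            raise ValueError("priority must be non-negative")


# Two nodes announce that they serve the same subrange with different priorities.
first = Item("node-a", "add", 0, 0x7FFFFFFF, priority=1)
second = Item("node-b", "add", 0, 0x7FFFFFFF, priority=5)
# node-a later announces that it stops serving the lower part of that subrange.
leave = Item("node-a", "rmv", 0, 0x3FFFFFFF)
```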
TimedItem Module
The GroupManager.Data.TimedItem module represents an Item at a given LocalClock. This binding allows us to pick the last write of a given node. Items are treated as the same when these fields match:
- member
- start_range
- end_range
The op and priority fields will be overwritten by the last write.
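The identity rule and the last-write pick can be sketched as follows, in Python rather than Elixir. The flat `TimedItem` tuple with a `clock` field, and the helper names `same_item` and `last_write`, are hypothetical simplifications of the Item plus LocalClock pairing.

```python
from collections import namedtuple

# Hypothetical flat layout: the Item fields plus the writer's counter.
TimedItem = namedtuple(
    "TimedItem", "member start_range end_range op priority clock"
)


def same_item(a, b):
    """Two items are the same when member, start_range and end_range match."""
    return (a.member, a.start_range, a.end_range) == (
        b.member, b.start_range, b.end_range
    )


def last_write(a, b):
    """Keep the entry with the newer clock; op and priority come along with it."""
    if not same_item(a, b):
        raise ValueError("only entries for the same item can be combined")
    return a if a.clock >= b.clock else b


old = TimedItem("node-a", 0, 1000, "add", 1, clock=1)
new = TimedItem("node-a", 0, 1000, "rmv", 5, clock=2)
winner = last_write(old, new)  # the clock=2 entry, whatever the argument order
```

Since the clock belongs to the writing node, comparing counters is only meaningful between entries of the same member, which the identity check already guarantees.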
TimedItem tests can be found here.
TimedSet Module
The GroupManager.Data.TimedSet module is a list of TimedItems and is only used for convenience. Its main use is to validate TimedItem members when they are added to the list.
TimedSet tests can be found here.
WorldClock Module
The GroupManager.Data.WorldClock module is a list of LocalClock items. It validates the LocalClock members when they are added and overwrites the older LocalClock for the given member if one is present. WorldClock is not used in comparisons. Its purpose is to tell nodes whether they lag behind on updates from a given node. When this happens, they should take action to gather the missing pieces, either from the origin node itself or from the other nodes.
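The lag detection described above could work along these lines. This Python sketch uses a dict from member to counter as a stand-in for the list of LocalClock items; `add_clock` and `lagging_members` are hypothetical names, and keeping the larger counter on conflict is an assumption.

```python
def add_clock(world, member, counter):
    """Return a copy of world that records member's counter, keeping the newest."""
    if not isinstance(counter, int) or counter < 0:
        raise ValueError("counter must be a non-negative integer")
    updated = dict(world)
    updated[member] = max(counter, updated.get(member, 0))
    return updated


def lagging_members(mine, theirs):
    """Members for which the other side has seen a newer counter than I have."""
    return sorted(m for m, c in theirs.items() if c > mine.get(m, -1))


mine = add_clock(add_clock({}, "node-a", 3), "node-b", 1)
theirs = add_clock(add_clock({}, "node-a", 3), "node-b", 4)

# I am behind on node-b, so I should fetch its missing updates.
behind = lagging_members(mine, theirs)
```

A member that is entirely unknown locally also counts as lagging, which is why the lookup falls back to -1 rather than 0.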
WorldClock tests can be found here.
Message Module
The GroupManager.Data.Message is the topmost module of the Data lib hierarchy. It has a WorldClock and a TimedSet. The WorldClock gets updated when new TimedItem elements are added to the Message.
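How adding an element keeps the two parts in step can be sketched like this. It is a simplified Python illustration, not the Elixir module: the class layout, the `add` method and the tuple used for an item are all assumptions.

```python
class Message:
    """Hypothetical container pairing a world clock with the timed items."""

    def __init__(self):
        self.world_clock = {}  # member -> newest counter seen
        self.items = []        # stand-in for the TimedSet

    def add(self, member, counter, item):
        """Store a timed item and bring the world clock up to date."""
        if counter < 0:
            raise ValueError("counter must be non-negative")
        self.items.append((member, counter, item))
        seen = self.world_clock.get(member, 0)
        self.world_clock[member] = max(seen, counter)


msg = Message()
# Items are (op, start_range, end_range, priority) tuples in this sketch.
msg.add("node-a", 1, ("add", 0, 1000, 1))
msg.add("node-b", 4, ("add", 0, 1000, 5))
msg.add("node-a", 2, ("rmv", 0, 1000, 1))
```

After these three calls the world clock records counter 2 for node-a and 4 for node-b, so a receiver can tell at a glance how recent the enclosed information is for each node.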
Message tests can be found here.
Episodes
- Ideas to experiment with
- More ideas and a first protocol that is not in use anymore
- Got rid of the original protocol and looking into CRDTs
- My first ramblings about function guards
- The group membership messages
- Design of a mixed broadcast
- My ARM based testbed
- Experience with defstruct, defrecord and ETS
- GroupManager code works, beta
- GroupManager more information and improvements