Simple TCP Message Performance in Elixir
I am still learning Elixir. I have programmed in a number of languages, so learning new syntax is not a big deal for me. What interests me more is understanding the problem domain where a given language fits best.
I am interested in distributed, networked problems, where Elixir is said to be good. I want to run a few experiments to see what I can expect. In this post I create a simple Elixir TCP server that receives small messages. Each message has three fields:
- a 64 bit ID
- a 32 bit size field that tells how many bytes the payload has
- payload
I expect my TCP server to send back the message ID as an acknowledgement. I will create a C++ client to send the messages and check the ACKs. They operate in lock-step: the C++ client only sends the next message after it has verified the acknowledgement of the previous one.
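To make the wire format concrete, here is a quick sketch of a single frame in Elixir binary syntax. This is illustration only, not part of the server code; the ID 42 and the "Hello" payload are made up, and the integers are big-endian, which is Elixir's default in binaries:

payload = "Hello"
frame = << 42 :: size(64), byte_size(payload) :: size(32), payload :: binary >>

# parsing the frame back:
<< id :: size(64), sz :: size(32), data :: binary-size(sz) >> = frame
# id == 42, sz == 5, data == "Hello"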
Update: you may also be interested in the next three posts in this series:
- 100k messages per second achieved by the introduction of reply throttling
- 250k messages per second achieved by better use of Elixir pattern matching
- over 2M messages per second achieved by removing the usage of the Task module
Measurement
I am interested to see how many messages go through this setup per second on average. I don't aim for a scientific measurement, nor do I want to compare this with other languages or solutions. I already know that this solution is not good. I only want a rough figure of how this performs on my laptop's loopback network.
The Elixir server
I will use the ranch Erlang library for this experiment, although the default gen_tcp module would work perfectly here. I want to build on my previous experiments.
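For reference, a plain gen_tcp version of the same server could look roughly like the sketch below. This is not the code I measured, and the GenTcpEcho module name is made up:

defmodule GenTcpEcho do
  def start do
    {:ok, listen} = :gen_tcp.listen(8000, [:binary, active: false, reuseaddr: true])
    accept_loop(listen)
  end

  defp accept_loop(listen) do
    # one process per connection, then keep accepting
    {:ok, socket} = :gen_tcp.accept(listen)
    spawn(fn -> serve(socket) end)
    accept_loop(listen)
  end

  defp serve(socket) do
    # same protocol: 8 byte id, 4 byte big-endian size, then the payload
    # (a zero sized payload would need special casing, because recv with
    # length 0 returns whatever is available on the socket)
    case :gen_tcp.recv(socket, 12, 5000) do
      {:ok, << id :: binary-size(8), sz :: size(32) >>} ->
        case :gen_tcp.recv(socket, sz, 5000) do
          {:ok, _payload} ->
            :gen_tcp.send(socket, id)
            serve(socket)
          _ ->
            :gen_tcp.close(socket)
        end
      _ ->
        :gen_tcp.close(socket)
    end
  end
end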
I created the project with mix new echoperf1 --sup --module EchoPerf1 (mix requires the application name to be lowercase).
The mix.exs file has:
defmodule EchoPerf1.Mixfile do
  use Mix.Project

  def project do
    [app: :echoperf1,
     version: "0.0.1",
     elixir: "~> 1.0",
     build_embedded: Mix.env == :prod,
     start_permanent: Mix.env == :prod,
     deps: deps]
  end

  def application do
    [
      applications: [:logger, :ranch],
      mod: {EchoPerf1, []}
    ]
  end

  defp deps do
    [{:ranch, "~> 1.1"}]
  end
end
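With :ranch in the dependency list, mix deps.get fetches it before the first build; this is a standard Mix command, nothing specific to this project.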
lib/echoperf1.ex has:
defmodule EchoPerf1 do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [ worker(EchoPerf1.Worker, []) ]
    opts = [strategy: :one_for_one, name: EchoPerf1.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
lib/echoperf1_worker.ex has:
defmodule EchoPerf1.Worker do
  def start_link do
    # listen on port 8000 with 10 acceptor processes
    opts = [port: 8000]
    {:ok, _} = :ranch.start_listener(:EchoPerf1, 10, :ranch_tcp, opts, EchoPerf1.Handler, [])
  end
end
The real work is done in lib/echoperf1_handler.ex:
defmodule EchoPerf1.Handler do
  def start_link(ref, socket, transport, opts) do
    pid = spawn_link(__MODULE__, :init, [ref, socket, transport, opts])
    {:ok, pid}
  end

  def init(ref, socket, transport, _opts = []) do
    :ok = :ranch.accept_ack(ref)
    loop(socket, transport)
  end

  def loop(socket, transport) do
    # read the 12 byte header: 8 byte id followed by the 4 byte payload size
    case transport.recv(socket, 12, 5000) do
      {:ok, id_sz_bin} ->
        << id :: binary-size(8), sz :: size(32) >> = id_sz_bin
        # read the payload, then acknowledge with the raw id bytes
        case transport.recv(socket, sz, 5000) do
          {:ok, _data} ->
            transport.send(socket, id)
            loop(socket, transport)
          {:error, :closed} ->
            :ok = transport.close(socket)
          {:error, :timeout} ->
            :ok = transport.close(socket)
          {:error, _err_message} ->
            :ok = transport.close(socket)
          _ ->
            :ok = transport.close(socket)
        end
      _ ->
        :ok = transport.close(socket)
    end
  end
end
I appreciate Elixir's simplicity and robustness. While creating this experiment I tested a lot with telnet and sent garbage to this server. Everything I tried behaved sensibly. This, by the way, is one of the outcomes I was aiming for when I designed this experiment.
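Beyond telnet, well-formed messages are easy to push through from iex as well. Here is a quick client sketch using gen_tcp; the ID and payload values are made up:

{:ok, s} = :gen_tcp.connect('localhost', 8000, [:binary, active: false])
:ok = :gen_tcp.send(s, << 1 :: size(64), 5 :: size(32), "Hello" >>)
{:ok, ack} = :gen_tcp.recv(s, 8, 5000)
# ack == << 1 :: size(64) >>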
The C++ client
I chose C++ for the client because it is my primary language. I know what performance to expect from the code I write, and this helps me better understand the Elixir side too. If both sides were written in Elixir I would be in very unfamiliar territory.
The C++ client code:
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

#include <iostream>
#include <functional>
#include <cstdint>
#include <chrono>

namespace
{
  struct on_destruct
  {
    std::function<void()> fun_;
    on_destruct(std::function<void()> fun) : fun_(fun) {}
    ~on_destruct() { fun_(); }
  };

  struct timer
  {
    typedef std::chrono::high_resolution_clock highres_clock;
    typedef std::chrono::time_point<highres_clock> timepoint;

    timepoint start_;
    uint64_t  iteration_;

    timer(uint64_t iter) : start_{highres_clock::now()}, iteration_{iter} {}

    ~timer()
    {
      using namespace std::chrono;
      timepoint now{highres_clock::now()};
      uint64_t usec_diff    = duration_cast<microseconds>(now-start_).count();
      double   call_per_ms  = iteration_*1000.0    / ((double)usec_diff);
      double   call_per_sec = iteration_*1000000.0 / ((double)usec_diff);
      double   us_per_call  = (double)usec_diff / (double)iteration_;
      std::cout << "elapsed usec=" << usec_diff
                << " avg(usec/call)=" << us_per_call
                << " avg(call/msec)=" << call_per_ms
                << " avg(call/sec)=" << call_per_sec
                << std::endl;
    }
  };
}
int main(int argc, char ** argv)
{
  try
  {
    // create a TCP socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if( sockfd < 0 )
    {
      throw "can't create socket";
    }
    on_destruct close_sockfd( [sockfd](){ close(sockfd); } );

    // server address (127.0.0.1:8000)
    struct sockaddr_in server_addr;
    ::memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family      = AF_INET;
    server_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    server_addr.sin_port        = htons(8000);

    // connect to server
    if( connect(sockfd, (struct sockaddr *)&server_addr, sizeof(server_addr)) == -1 )
    {
      throw "failed to connect to server at 127.0.0.1:8000";
    }

    // prepare data
    char data[] = "Hello";
    uint64_t id = 0;
    uint32_t len = htonl(5); // the server expects a big-endian size field

    struct iovec data_iov[3] = {
      { (char *)&id,  8 }, // id
      { (char *)&len, 4 }, // len
      { data,         5 }  // data
    };

    for( int i=0; i<100; ++i )
    {
      timer t(10000);
      // send data in a loop
      for( id = 0; id<10000; ++id )
      {
        if( writev(sockfd, data_iov, 3) != 17 ) throw "failed to send data";
        uint64_t response = 0;
        // MSG_WAITALL guards against a short read of the 8 byte ack
        if( recv(sockfd, &response, 8, MSG_WAITALL) != 8 ) throw "failed to receive data";
        if( response != id ) throw "invalid response received";
      }
    }
  }
  catch( const char * msg )
  {
    perror(msg);
  }
  return 0;
}
I built the client on Mac OS X by running g++ -o EchoCpp1 -O3 -std=c++11 -Wall EchoCpp1.cc.
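To reproduce the measurement I start the server with iex -S mix in the project directory, then run ./EchoCpp1 from another terminal. These are the standard Mix command and the binary name from the build line above.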
The results
I ran this on a 2015 MacBook Air.
elapsed usec=433476 avg(usec/call)=43.3476 avg(call/msec)=23.0693 avg(call/sec)=23069.3
elapsed usec=450325 avg(usec/call)=45.0325 avg(call/msec)=22.2062 avg(call/sec)=22206.2
elapsed usec=442094 avg(usec/call)=44.2094 avg(call/msec)=22.6196 avg(call/sec)=22619.6
elapsed usec=436530 avg(usec/call)=43.653 avg(call/msec)=22.9079 avg(call/sec)=22907.9
elapsed usec=447470 avg(usec/call)=44.747 avg(call/msec)=22.3479 avg(call/sec)=22347.9
elapsed usec=448915 avg(usec/call)=44.8915 avg(call/msec)=22.2759 avg(call/sec)=22275.9
elapsed usec=451250 avg(usec/call)=45.125 avg(call/msec)=22.1607 avg(call/sec)=22160.7
Roughly 22k message roundtrips per second.
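Put differently: at 17 bytes per request and 8 bytes per reply, 22k round trips is only around half a megabyte per second of traffic, so the limiting factor is clearly the roughly 45 microsecond round-trip latency, not bandwidth.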
Conclusion
To be honest, I was hoping this would be faster, but I am not disappointed by the results. My next experiment will optimize the protocol, because I suspect this lock-step pattern is not a good fit for Elixir. Message throttling and asynchronous acknowledgements should help a lot, but that is yet to be tested.
For a few observations about local messaging performance, see this older post.