Implementing In-Memory Data Structures in Go

Karanbir Chahal
3 min read · Jul 12, 2018

This post talks about how the DHT is implemented for hydra. We assume the reader is already familiar with the Kademlia protocol. If not, read this to get up to date.

Goals

We set out to construct the following components:

  1. An in-memory data structure that is fault tolerant. It survives failures by maintaining a transaction log and taking periodic snapshots. We shall go into the exact meaning of these two later in this post.
  2. A concurrent data structure that supports multiple requests.
  3. An in-memory cache to keep track of dead nodes. This cache helps reduce unnecessary network requests.

Let’s go into these 3 aspects of a DHT in detail now.

In-Memory Data Structure with Transaction Logging and Snapshotting

Kademlia requires one to keep a key-value map with 256 keys. Each value is a list of nodes, the ones closest to that index's address. Hence, we set out to create exactly that.
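To make that concrete, here is a rough sketch of what such a structure could look like in Go; the Node fields and the RoutingTable name are illustrative, not hydra's actual types.

// Node is an illustrative contact entry; hydra's actual fields may differ.
type Node struct {
    ID      string // 256-bit node ID, e.g. hex encoded
    Address string // network address of the peer
}

// RoutingTable keeps one bucket (a list of the closest nodes) per key index,
// giving the 256-entry map that Kademlia asks for.
type RoutingTable struct {
    buckets [256][]Node
}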

We had to keep track of two key things, however.

  1. The data structure should be saved to disk, i.e. it should be persistent. We do not want to lose the data that the node has collected in the event of a crash.
  2. The data structure should be concurrent: there could be multiple nodes trying to connect to the DHT, and they should be served with minimum latency.

Transaction Logging

A transaction log is an append-only text file that records every operation that has been applied to the data structure. The point of keeping this file is that by playing back the instructions in the log, you can arrive at the last seen state of the data structure. It is a paradigm used by major databases like MongoDB, Redis and many more, and a very standard way of keeping a database persistent.

The exact schema of the log for Hydra has not been decided yet. It will probably look something like this:

push index1,node_value
add index1,index2,node_value
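A rough sketch of how such a log could be appended to and replayed on startup is below; the entry layout and the helper names appendEntry and replay are assumptions, since the schema is not final.

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// appendEntry writes one operation to the append-only log and flushes it to disk.
// The "op index,value" layout mirrors the tentative schema above.
func appendEntry(f *os.File, op string, index int, value string) error {
    if _, err := fmt.Fprintf(f, "%s %d,%s\n", op, index, value); err != nil {
        return err
    }
    return f.Sync()
}

// replay reads the log from the beginning and re-applies every operation,
// rebuilding the last seen state of the data structure.
func replay(path string, apply func(op string, fields []string)) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        parts := strings.SplitN(scanner.Text(), " ", 2)
        if len(parts) != 2 {
            continue // skip malformed lines
        }
        apply(parts[0], strings.Split(parts[1], ","))
    }
    return scanner.Err()
}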

Periodic Snapshotting

We have one worrying scenario with the above approach: what happens when the transaction log grows uncontrollably? Replaying it would get slower and slower, and it would eventually eat up the disk. To solve this, we periodically clear the transaction log by saving the actual DHT to disk. Two pointers, start and end, are maintained:

  1. start: the first log entry not yet covered by the most recent snapshot
  2. end: the end of the transaction log

During each snapshot operation, start is moved up to the end position.

Saving the DHT to disk has not been implemented yet. This post will be updated with technical details once it is.
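In the meantime, here is a minimal sketch of what the snapshot step could look like, reusing the RoutingTable from earlier; the file name, the JSON encoding and the log truncation are assumptions, not the final design.

import (
    "encoding/json"
    "io/ioutil"
    "os"
)

// snapshot writes the whole table to disk and then truncates the log:
// every entry before this point is now covered by the snapshot, which is
// the same as moving start up to end.
func (t *RoutingTable) snapshot(logPath string) error {
    data, err := json.Marshal(t.buckets)
    if err != nil {
        return err
    }
    if err := ioutil.WriteFile("dht.snapshot", data, 0644); err != nil {
        return err
    }
    return os.Truncate(logPath, 0)
}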

Concurrency in the DHT

We approached this as follows.

  1. Each of the DHT's 256 key indexes has a goroutine listening on it. This decouples the DHT into 256 independent parts, so two different indexes can be modified at the same time without interfering with each other.
  2. These goroutines are fired when we pass new node data over Go channels. Once inside the goroutine, the function checks whether the list it is responsible for is full. If it is, we check our dead-node cache to see whether any node in the list is already known to be dead. If so, we simply replace the dead node with the new node. If no dead nodes are found in the cache, we ping all the nodes in the list to find out whether any are dead. If one is, we replace it as before; if not, we discard the new node.
  3. The cache is a simple in-memory map data structure that is not persistent.
  4. The pings to all the nodes are done concurrently using goroutines, and the results are merged back once all the pings have completed. This is done using the go and select concurrency primitives of Go, as sketched below.
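A small sketch of that fan-out and merge step follows; ping stands in for the real network call, and the two-second timeout is an arbitrary choice.

import "time"

// findDeadNode pings every node in a bucket concurrently and returns the index
// of a node that failed to respond, or -1 if all of them answered in time.
func findDeadNode(bucket []Node, ping func(Node) bool) int {
    type result struct {
        index int
        alive bool
    }
    results := make(chan result, len(bucket))

    // Fan out: one goroutine per node.
    for i, n := range bucket {
        go func(i int, n Node) {
            results <- result{index: i, alive: ping(n)}
        }(i, n)
    }

    // Merge: collect the replies, but do not wait forever for a slow peer.
    dead := -1
    for range bucket {
        select {
        case r := <-results:
            if !r.alive {
                dead = r.index
            }
        case <-time.After(2 * time.Second):
            // A node that does not answer in time gives us no information.
        }
    }
    return dead
}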

This is why Go is a perfect language for building a system like this. It provides very helpful and easy-to-use concurrency primitives that make server/networking code really efficient.

Sample Code

Sample code for the concurrent DHT described above is given below. Keep in mind that this code would probably not be used in the production build; it is just for teaching purposes.
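The following is a simplified sketch along the lines described above; the bucket size k, the Node fields and the helper names are illustrative assumptions rather than hydra's actual code.

package main

import (
    "fmt"
    "sync"
    "time"
)

const k = 20 // maximum nodes per bucket; illustrative value

// Node is an illustrative contact entry, matching the earlier sketches.
type Node struct {
    ID      string
    Address string
}

// DHT splits the table into 256 independent buckets, each served by its own
// goroutine listening on a dedicated channel, so two different indexes can be
// updated at the same time.
type DHT struct {
    inputs    [256]chan Node
    buckets   [256][]Node
    deadCache sync.Map // node ID -> true for nodes known to be unresponsive
}

// NewDHT starts one listener goroutine per index. ping stands in for the real
// network check.
func NewDHT(ping func(Node) bool) *DHT {
    d := &DHT{}
    for i := range d.inputs {
        d.inputs[i] = make(chan Node)
        go d.listen(i, ping)
    }
    return d
}

// Add hands a new node to the goroutine responsible for the given index.
func (d *DHT) Add(index int, n Node) {
    d.inputs[index] <- n
}

// listen serialises every update to bucket i, so the slice needs no mutex.
func (d *DHT) listen(i int, ping func(Node) bool) {
    for n := range d.inputs[i] {
        bucket := d.buckets[i]
        if len(bucket) < k {
            d.buckets[i] = append(bucket, n) // room left: just insert
            continue
        }
        // Bucket full: first try to replace a node already cached as dead.
        if j := d.cachedDeadIndex(bucket); j >= 0 {
            bucket[j] = n
            continue
        }
        // No cached dead node: ping everyone (this could be fanned out
        // concurrently, as in the earlier sketch) and replace the first
        // node that fails. If all of them respond, the new node is discarded.
        for j, existing := range bucket {
            if !ping(existing) {
                d.deadCache.Store(existing.ID, true)
                bucket[j] = n
                break
            }
        }
    }
}

// cachedDeadIndex returns the position of a bucket entry already known to be
// dead, or -1 if the cache has none of them.
func (d *DHT) cachedDeadIndex(bucket []Node) int {
    for j, n := range bucket {
        if _, dead := d.deadCache.Load(n.ID); dead {
            return j
        }
    }
    return -1
}

func main() {
    alwaysAlive := func(Node) bool { return true }
    d := NewDHT(alwaysAlive)
    d.Add(3, Node{ID: "abc", Address: "10.0.0.1:4000"})
    time.Sleep(100 * time.Millisecond) // give the listener time to process
    fmt.Println("node queued for bucket 3")
}

Because each bucket is only ever touched by its own goroutine, the slices need no locks; only the dead-node cache is shared, which is why this sketch uses sync.Map for it.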

Conclusion

  1. Keeping the data structure persistent by using transaction logs and snapshotting.
  2. An implementation of how a concurrent data structure could work.
