CHAPTER 2
The Kademlia specification consists of several sub-algorithms:
The most complex part of the code is in the registration of new peers, because this involves some magic numbers based on the Kademlia authors’ research into the performance of other networks, such as Chord[9] and Pastry,[10] and the behavior of peers in those networks.
Terms used specifically in the Kademlia specification are described here.
Overlay network: An overlay network is one in which each node keeps a (usually partial) list of other nodes participating in the network.
Node: A node (also known as a contact) is a peer in the network.
Node ID: This is a 160-bit node identifier obtained from a SHA1 hash of some key, or is randomly generated.
k-Bucket: A collection of at most k nodes (or contacts), also simply called a bucket. Each node handles up to k contacts within a range of IDs. Initially, the ID range covers the entire spectrum, 0 <= id <= 2^160 - 1.
Key-Value: Peers store values based on 160-bit SHA1 hashed keys. Each stored entry consists of a key-value pair.
Router: The router manages the collection of k-buckets and also determines which nodes a key-value pair should be stored on.
Distance/Closeness: The distance between a host and the key is an XOR computation of the host’s ID with the key. Kademlia’s most significant feature is the use of this XOR computation to determine the distance/closeness between IDs.
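To make the XOR metric concrete, here is a minimal sketch in Python (the helper names are my own, not part of the spec) that derives a 160-bit ID from a SHA1 hash or at random and then picks, out of a set of node IDs, the one closest to a key by XOR distance:

import hashlib
import secrets

ID_BITS = 160

def id_from_key(key: bytes) -> int:
    # Derive a 160-bit ID from the SHA1 hash of some key.
    return int.from_bytes(hashlib.sha1(key).digest(), "big")

def random_id() -> int:
    # Generate a random 160-bit ID.
    return secrets.randbits(ID_BITS)

def distance(id_a: int, id_b: int) -> int:
    # Kademlia distance: the XOR of the two IDs, interpreted as an integer.
    return id_a ^ id_b

# The node "closest" to a key is the one whose ID minimizes the XOR distance.
key_id = id_from_key(b"some key")
node_ids = [random_id() for _ in range(20)]
closest = min(node_ids, key=lambda node_id: distance(node_id, key_id))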
Prefix: A prefix is the term used to describe the n most significant bits (MSB) of an ID.
Depth: The depth of a bucket is defined as the length of the prefix shared by all IDs in the bucket’s range. Because buckets are associated with ranges from 2^i to 2^(i+1) - 1, where 0 <= i < 160, one could say that the depth of a bucket is 160 - i. We’ll see later that this may not be the case.
Bucket Split: A bucket split potentially happens when a node’s k-bucket is full—meaning it has k contacts—and a new contact with a given 160-bit key wants to register within the bucket’s range for that key. At this point, an algorithm kicks in that:
The Kademlia protocol consists of four remote procedure calls (RPCs). All RPCs require that the sender provides a random RPC ID, which must be echoed by the recipient: “In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery; PINGS can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender’s network address.”
Anytime a peer is contacted with any of the four RPCs, it goes through the process of adding or updating the contact in its own list. The concept of “closeness” will be discussed in detail later.
“The PING RPC probes a node to see if it is online.” This is considered a “primitive” function, in that it just returns the random RPC ID that accompanied the Ping request.
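As a small illustration of the echo requirement, the following Python sketch (the class and field names are my own assumptions, not anything the spec prescribes) models a Ping exchange in which the recipient simply returns the sender’s random RPC ID:

import secrets
from dataclasses import dataclass

@dataclass
class PingRequest:
    rpc_id: int      # random 160-bit RPC ID chosen by the sender
    sender_id: int   # the sender's node ID

@dataclass
class PingResponse:
    rpc_id: int      # must echo the request's RPC ID

class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id

    def handle_ping(self, request: PingRequest) -> PingResponse:
        # A real node would also add or update the sender in its k-buckets here.
        return PingResponse(rpc_id=request.rpc_id)

# Sender side: issue a Ping and check that the echoed RPC ID matches.
recipient = Node(node_id=secrets.randbits(160))
request = PingRequest(rpc_id=secrets.randbits(160), sender_id=secrets.randbits(160))
response = recipient.handle_ping(request)
assert response.rpc_id == request.rpc_id   # the recipient is considered online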
STORE instructs a node to store a (key,value) pair for later retrieval. This is also considered a “primitive” function, as it again just returns the random RPC ID that accompanied the STORE request. “To store a (key,value) pair, a participant locates the k closest nodes to the key and sends them STORE RPCs.” The participant does this by inspecting its own k-buckets for the k nodes closest to the key.
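A minimal sketch of the sender’s side, assuming a send_store_rpc callable supplied by the transport layer (my own naming, not the spec’s):

K = 20  # the bucket size constant k; the exact value here is an assumption

def k_closest(known_ids, key_id, k=K):
    # Return up to k known node IDs ordered by XOR distance to the key.
    return sorted(known_ids, key=lambda node_id: node_id ^ key_id)[:k]

def store(key_id, value, known_ids, send_store_rpc):
    # Send a STORE RPC for (key_id, value) to each of the k closest known nodes.
    for node_id in k_closest(known_ids, key_id):
        send_store_rpc(node_id, key_id, value)

# Example use with a stub transport that only records which nodes were contacted.
sent_to = []
store(key_id=0b1011, value=b"payload", known_ids=[0b0001, 0b1000, 0b1111, 0b0110],
      send_store_rpc=lambda node_id, key, value: sent_to.append(node_id))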
“FIND_NODE takes a 160-bit ID as an argument. The recipient of the RPC returns (IP address, UDP port, Node ID) triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).”
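The recipient’s side can be sketched as follows, assuming each k-bucket is simply a list of (IP address, UDP port, Node ID) triples (again, my own simplification rather than a prescribed layout):

K = 20

def handle_find_node(buckets, target_id, k=K):
    # 'buckets' is a list of k-buckets; each bucket is a list of
    # (ip_address, udp_port, node_id) triples.
    contacts = [contact for bucket in buckets for contact in bucket]
    contacts.sort(key=lambda contact: contact[2] ^ target_id)
    # Return k items, or everything this node knows about if that is fewer than k.
    return contacts[:k]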
In an abstracted communication protocol, the recipient needs to return information about the protocol: the kind of protocol and whatever is required to contact a peer using that protocol. If multiple protocols are supported, we can consider two options:
Other considerations when supporting multiple protocols are:
None of these multiple-protocol issues is discussed in the spec; this is purely my own enhancement.
The FindNode protocol has two purposes:
“FIND_VALUE behaves like FIND_NODE—returning (IP address, UDP port, Node ID) triples—with one exception. If the RPC recipient has received a STORE RPC for the key, it just returns the stored value.”
If the FindValue RPC returns a list of other peers, it is up to the requester to continue searching for the desired value from that list. Also, note this technique for caching key-values:
“To find a (key,value) pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use FIND_VALUE rather than FIND_NODE RPCs. Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the (key,value) pair at the closest node it observed to the key that did not return the value.”
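A simplified, single-threaded sketch of this lookup-and-cache behavior is shown below; the real procedure queries several nodes in parallel, and find_value_rpc and store_rpc are assumed transport helpers of my own naming:

def find_value(key_id, start_nodes, find_value_rpc, store_rpc, k=20):
    by_distance = lambda node_id: node_id ^ key_id
    queried, to_query = set(), sorted(set(start_nodes), key=by_distance)
    while to_query:
        node = to_query.pop(0)
        queried.add(node)
        kind, payload = find_value_rpc(node, key_id)  # ("value", v) or ("nodes", [...])
        if kind == "value":
            # Cache the pair at the closest observed node that did not return the value.
            others = sorted((queried | set(to_query)) - {node}, key=by_distance)
            if others:
                store_rpc(others[0], key_id, payload)
            return payload
        # Otherwise merge the returned contacts and keep querying the closest ones.
        to_query = sorted((set(to_query) | set(payload)) - queried, key=by_distance)[:k]
    return None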
“Additionally, each node republishes (key,value) pairs as necessary to keep them alive, as described later in Section 2.5. This ensures persistence (as we show in our proof sketch) of the (key,value) pair with very high probability. For Kademlia’s current application (file sharing), we also require the original publisher of a (key,value) pair to republish it every 24 hours. Otherwise, (key,value) pairs expire 24 hours after publication, so as to limit stale index information in the system. For other applications, such as digital certificates or cryptographic hash to value mappings, longer expiration times may be appropriate.”
If we want to consider using Kademlia in a distributed ledger implementation, it seems necessary that key-values never expire; otherwise, the ledger data would lose integrity.
“Because of the unidirectional nature of the topology, future searches for the same key are likely to hit cached entries before querying the closest node. During times of high popularity for a certain key, the system might end up caching it at many nodes. To avoid “over-caching,” we make the expiration time of a (key,value) pair in any node’s database exponentially inversely proportional to the number of nodes between the current node and the node whose ID is closest to the key ID. While simple LRU eviction would result in a similar lifetime distribution, there is no natural way of choosing the cache size, since nodes have no a priori knowledge of how many values the system will store.”
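The paper does not give an exact formula for this expiration rule, but one possible reading is sketched below, with the base lifetime and decay constant being my own assumptions:

import math

BASE_LIFETIME_HOURS = 24   # assumed lifetime at the node closest to the key
DECAY = 5.0                # assumed decay constant; the paper does not specify one

def cache_lifetime_hours(nodes_between: int) -> float:
    # Lifetime decays exponentially with the number of known nodes lying between
    # the caching node and the node whose ID is closest to the key ID.
    return BASE_LIFETIME_HOURS * math.exp(-nodes_between / DECAY)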
“Buckets are generally kept fresh by the traffic of requests traveling through nodes. To handle pathological cases in which there are no lookups for a particular ID range, each node refreshes any bucket to which it has not performed a node lookup in the past hour. Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID.”
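A sketch of this refresh rule, assuming each bucket records the time of its last lookup and that a node_lookup function exists (both my own naming):

import secrets
import time

REFRESH_INTERVAL_SECONDS = 60 * 60   # one hour, per the rule quoted above

def refresh_stale_buckets(buckets, node_lookup, now=None):
    # 'buckets' is assumed to be a list of objects with .low, .high (the inclusive
    # ID range) and .last_lookup_time (seconds since the epoch).
    now = time.time() if now is None else now
    for bucket in buckets:
        if now - bucket.last_lookup_time >= REFRESH_INTERVAL_SECONDS:
            random_id = bucket.low + secrets.randbelow(bucket.high - bucket.low + 1)
            node_lookup(random_id)      # the lookup itself is what refreshes the bucket
            bucket.last_lookup_time = now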
“To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary.”
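Expressed as a sketch, with the method names being my own assumptions rather than anything the spec prescribes:

def join(u, w):
    u.add_contact(w)              # insert w into the appropriate k-bucket
    u.node_lookup(u.node_id)      # look up u's own node ID
    closest = u.bucket_index_of_closest_neighbor()
    for index in range(closest + 1, len(u.buckets)):
        u.refresh_bucket(index)   # random-ID lookup within that bucket's range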