CHAPTER 2
The Kademlia specification consists of several sub-algorithms:
The most complex part of the code is in the registration of new peers, because this involves some magic numbers based on the Kademlia authors’ research into the performance of other networks, such as Chord[9] and Pastry,[10] and the behavior of peers in those networks.
Terms used specifically in the Kademlia specification are described here.
Overlay network: An overlay network is one in which each node keeps a (usually partial) list of other nodes participating in the network.
Node: A node (also known as a contact) is a peer in the network.
Node ID: This is a 160-bit node identifier obtained from a SHA1 hash of some key, or is randomly generated.
k-Bucket: A collection of at most k nodes (or contacts), also simply called a bucket. Each node handles up to k contacts within a range of IDs. Initially, the ID range covers the entire spectrum, 0 <= id <= 2^160 - 1.
Key-Value: Peers store values based on 160-bit SHA1 hashed keys. Each stored entry consists of a key-value pair.
Router: The router manages the collection of k-buckets and also determines which nodes a key-value pair should be stored on.
Distance/Closeness: The distance between a host and the key is an XOR computation of the host’s ID with the key. Kademlia’s most significant feature is the use of this XOR computation to determine the distance/closeness between IDs.
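To make the XOR metric concrete, here is a minimal sketch in Python (the helper names are my own, not part of the spec) that derives a 160-bit ID from a SHA1 hash or at random and then picks, out of a set of node IDs, the one closest to a key by XOR distance:

import hashlib
import secrets

ID_BITS = 160

def id_from_key(key: bytes) -> int:
    # Derive a 160-bit ID from the SHA1 hash of some key.
    return int.from_bytes(hashlib.sha1(key).digest(), "big")

def random_id() -> int:
    # Generate a random 160-bit ID.
    return secrets.randbits(ID_BITS)

def distance(id_a: int, id_b: int) -> int:
    # Kademlia distance: the XOR of the two IDs, interpreted as an integer.
    return id_a ^ id_b

# The node "closest" to a key is the one whose ID minimizes the XOR distance.
key_id = id_from_key(b"some key")
node_ids = [random_id() for _ in range(20)]
closest = min(node_ids, key=lambda node_id: distance(node_id, key_id))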
Prefix: A prefix is the term used to describe the n most significant bits (MSB) of an ID.
Depth: The depth of a bucket is defined as the length of the prefix shared by all IDs in the bucket’s range. Because buckets are associated with ranges from 2^i to 2^(i+1) - 1, where 0 <= i < 160, one could say that the depth of a bucket is 160 - i. We’ll see later that this may not be the case.
Bucket Split: A bucket split potentially happens when a node’s k-bucket is full—meaning it has k contacts—and a new contact with a given 160-bit key wants to register within the bucket’s range for that key. At this point, an algorithm kicks in that:
The Kademlia protocol consists of four remote procedure calls (RPCs). All RPCs require that the sender provides a random RPC ID, which must be echoed by the recipient: “In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery; PINGS can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender’s network address.”
Anytime a peer is contacted with any of the four RPCs, it goes through the process of adding or updating the contact in its own list. The concept of “closeness” will be discussed in detail later.
“The PING RPC probes a node to see if it is online.” This is considered a “primitive” function, in that it just returns the random RPC ID that accompanied the Ping request.
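As a small illustration of the echo requirement, the following Python sketch (the class and field names are my own assumptions, not anything the spec prescribes) models a Ping exchange in which the recipient simply returns the sender’s random RPC ID:

import secrets
from dataclasses import dataclass

@dataclass
class PingRequest:
    rpc_id: int      # random 160-bit RPC ID chosen by the sender
    sender_id: int   # the sender's node ID

@dataclass
class PingResponse:
    rpc_id: int      # must echo the request's RPC ID

class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id

    def handle_ping(self, request: PingRequest) -> PingResponse:
        # A real node would also add or update the sender in its k-buckets here.
        return PingResponse(rpc_id=request.rpc_id)

# Sender side: issue a Ping and check that the echoed RPC ID matches.
recipient = Node(node_id=secrets.randbits(160))
request = PingRequest(rpc_id=secrets.randbits(160), sender_id=secrets.randbits(160))
response = recipient.handle_ping(request)
assert response.rpc_id == request.rpc_id   # the recipient is considered online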
STORE instructs a node to store a (key,value) pair for later retrieval. This is also considered a “primitive” function, as it again just returns the random RPC ID that accompanied the STORE request. “To store a (key,value) pair, a participant locates the k closest nodes to the key and sends them STORE RPCs.” The participant does this by inspecting its own k-buckets for the k nodes closest to the key.
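A minimal sketch of the sender’s side, assuming a send_store_rpc callable supplied by the transport layer (my own naming, not the spec’s):

K = 20  # the bucket size constant k; the exact value here is an assumption

def k_closest(known_ids, key_id, k=K):
    # Return up to k known node IDs ordered by XOR distance to the key.
    return sorted(known_ids, key=lambda node_id: node_id ^ key_id)[:k]

def store(key_id, value, known_ids, send_store_rpc):
    # Send a STORE RPC for (key_id, value) to each of the k closest known nodes.
    for node_id in k_closest(known_ids, key_id):
        send_store_rpc(node_id, key_id, value)

# Example use with a stub transport that only records which nodes were contacted.
sent_to = []
store(key_id=0b1011, value=b"payload", known_ids=[0b0001, 0b1000, 0b1111, 0b0110],
      send_store_rpc=lambda node_id, key, value: sent_to.append(node_id))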
“FIND_NODE takes a 160-bit ID as an argument. The recipient of the RPC returns (IP address, UDP port, Node ID) triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).”
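The recipient’s side can be sketched as follows, assuming each k-bucket is simply a list of (IP address, UDP port, Node ID) triples (again, my own simplification rather than a prescribed layout):

K = 20

def handle_find_node(buckets, target_id, k=K):
    # 'buckets' is a list of k-buckets; each bucket is a list of
    # (ip_address, udp_port, node_id) triples.
    contacts = [contact for bucket in buckets for contact in bucket]
    contacts.sort(key=lambda contact: contact[2] ^ target_id)
    # Return k items, or everything this node knows about if that is fewer than k.
    return contacts[:k]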
In an abstracted communication protocol, the recipient needs to return information about the protocol: the kind of protocol and whatever is required to contact a peer using that protocol. If multiple protocols are supported, we can consider two options:
Other considerations when supporting multiple protocols are:
None of these multiple-protocol issues is discussed in the spec; this is purely my own enhancement.
The FindNode protocol has two purposes:
“FIND_VALUE behaves like FIND_NODE—returning (IP address, UDP port, Node ID) triples—with one exception. If the RPC recipient has received a STORE RPC for the key, it just returns the stored value.”
If the FindValue RPC returns a list of other peers, it is up to the requester to continue searching for the desired value from that list. Also, note this technique for caching key-values:
“To find a (key,value) pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use FIND_VALUE rather than FIND_NODE RPCs. Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the (key,value) pair at the closest node it observed to the key that did not return the value.”
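A simplified, single-threaded sketch of this lookup-and-cache behavior is shown below; the real procedure queries several nodes in parallel, and find_value_rpc and store_rpc are assumed transport helpers of my own naming:

def find_value(key_id, start_nodes, find_value_rpc, store_rpc, k=20):
    by_distance = lambda node_id: node_id ^ key_id
    queried, to_query = set(), sorted(set(start_nodes), key=by_distance)
    while to_query:
        node = to_query.pop(0)
        queried.add(node)
        kind, payload = find_value_rpc(node, key_id)  # ("value", v) or ("nodes", [...])
        if kind == "value":
            # Cache the pair at the closest observed node that did not return the value.
            others = sorted((queried | set(to_query)) - {node}, key=by_distance)
            if others:
                store_rpc(others[0], key_id, payload)
            return payload
        # Otherwise merge the returned contacts and keep querying the closest ones.
        to_query = sorted((set(to_query) | set(payload)) - queried, key=by_distance)[:k]
    return None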
“Additionally, each node republishes (key,value) pairs as necessary to keep them alive, as described later in Section 2.5. This ensures persistence (as we show in our proof sketch) of the (key,value) pair with very high probability. For Kademlia’s current application (file sharing), we also require the original publisher of a (key,value) pair to republish it every 24 hours. Otherwise, (key,value) pairs expire 24 hours after publication, so as to limit stale index information in the system. For other applications, such as digital certificates or cryptographic hash to value mappings, longer expiration times may be appropriate.”
If we want to consider using Kademlia in a distributed ledger implementation, it seems necessary that key-values never expire; otherwise, the ledger data would lose integrity.
“Because of the unidirectional nature of the topology, future searches for the same key are likely to hit cached entries before querying the closest node. During times of high popularity for a certain key, the system might end up caching it at many nodes. To avoid “over-caching,” we make the expiration time of a (key,value) pair in any node’s database exponentially inversely proportional to the number of nodes between the current node and the node whose ID is closest to the key ID. While simple LRU eviction would result in a similar lifetime distribution, there is no natural way of choosing the cache size, since nodes have no a priori knowledge of how many values the system will store.”
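The paper does not give an exact formula for this expiration rule, but one possible reading is sketched below, with the base lifetime and decay constant being my own assumptions:

import math

BASE_LIFETIME_HOURS = 24   # assumed lifetime at the node closest to the key
DECAY = 5.0                # assumed decay constant; the paper does not specify one

def cache_lifetime_hours(nodes_between: int) -> float:
    # Lifetime decays exponentially with the number of known nodes lying between
    # the caching node and the node whose ID is closest to the key ID.
    return BASE_LIFETIME_HOURS * math.exp(-nodes_between / DECAY)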
“Buckets are generally kept fresh by the traffic of requests traveling through nodes. To handle pathological cases in which there are no lookups for a particular ID range, each node refreshes any bucket to which it has not performed a node lookup in the past hour. Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID.”
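A sketch of this refresh rule, assuming each bucket records the time of its last lookup and that a node_lookup function exists (both my own naming):

import secrets
import time

REFRESH_INTERVAL_SECONDS = 60 * 60   # one hour, per the rule quoted above

def refresh_stale_buckets(buckets, node_lookup, now=None):
    # 'buckets' is assumed to be a list of objects with .low, .high (the inclusive
    # ID range) and .last_lookup_time (seconds since the epoch).
    now = time.time() if now is None else now
    for bucket in buckets:
        if now - bucket.last_lookup_time >= REFRESH_INTERVAL_SECONDS:
            random_id = bucket.low + secrets.randbelow(bucket.high - bucket.low + 1)
            node_lookup(random_id)      # the lookup itself is what refreshes the bucket
            bucket.last_lookup_time = now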
“To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary.”
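Expressed as a sketch, with the method names being my own assumptions rather than anything the spec prescribes:

def join(u, w):
    u.add_contact(w)              # insert w into the appropriate k-bucket
    u.node_lookup(u.node_id)      # look up u's own node ID
    closest = u.bucket_index_of_closest_neighbor()
    for index in range(closest + 1, len(u.buckets)):
        u.refresh_bucket(index)   # random-ID lookup within that bucket's range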