CHAPTER 1
Kademlia, according to a paper[1] published in 2015 by Xing Shi Cai and Luc Devoyre, is “the de facto standard searching algorithm for P2P (peer-to-peer) networks on the Internet.” Kademlia is a protocol specification for decentralizing peer-to-peer network operations, efficiently storing and retrieving data across the network.
A peer-to-peer network has several positive aspects:
Kademlia was designed by Petar Maymounkov and David Mazières in 2002. Wikipedia says this about Kademlia:
“It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values (usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes and that node stores information on where to obtain the file or resource.”[2]
Kademlia is used in file-sharing networks. For example, BitTorrent 8 uses a distributed hash table (DHT) based on an implementation of the Kademlia algorithm. The Kad network 9 uses the Kademlia protocol, with eMule being an open-source Windows client.
The underlying technology of not just cryptocurrency but any blockchain, including those that implement smart contracts,[3] must include a P2P-distributed hash table, at least in regard to how blockchain technology is being discussed and applied. (Using a blockchain in a centralized scenario is sort of pointless, except perhaps for logging purposes.) Understanding how P2P protocols work is important, as blockchain is one of those revolutionary technologies that already has, and will continue to have, an impact on software application development.
Many people think that centralized data, except for performance reasons, is on its way out.[4] As that last link states: “The more the data management industry consolidates, the more opposing forces decentralize the market.” And peer-to-peer decentralizing has built in redundancy protecting from single-point data loss and access failures. Not that decentralizing doesn’t have its own problems—security will probably be the main one, if it isn’t already.
In recognition that there are some interesting and complicated cryptocurrency and blockchain technologies coming down the road, and that these need to be understood, protocols like Kademlia are a good starting point for looking at any P2P DHT implementation. As to why Kademlia specifically, the summary to the Kademlia specification says it best:
“With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine provable consistency and performance, latency-minimizing routing, and a symmetric, unidirectional topology. Kademlia furthermore introduces a concurrency parameter, α, that lets people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that node failures are inversely related to uptime.”
As Petar Maymounkov, one of the co-creators of Kademlia, says: “It is a Turkish word for a lucky man’ and, more importantly, is the name of a mountain peak in Bulgaria.”
In perusing GitHub repositories, I found many implementations that were incomplete or clearly buggy, often discoverable simply by inspecting the code. To complicate matters further, there appear to be two different versions of the Kademlia protocol. One is a very short specification that omits much of the discussion of routing tables and performance improvements, while the longer specification on the MIT website[5] is apparently the “correct” version. The shorter specification appears to have been removed from the Rice University website.
Two implementations stand out as reasonable references:
When reviewing an open-source implementation, it is recommended that you inspect the code to verify that it implements the optimizations described in the longer specification.
I have not looked carefully at implementations in languages other than C# and Python. There are implementations in many other languages, those being primarily written in Java, Javascript, and Go. Looking briefly at implementations in these other languages, it’s fairly easy to tell which version of the specification they implement, so again, beware that depending on which specification was used, you can have very different implementations. For example, an implementation in Java makes a very specific (yet seemingly arbitrary) rule about bucket splitting (we’ll get to that) that isn’t found in the spec.
The code implemented in this book requires:
A working knowledge of LINQ is also highly recommended.
I used several resources in the research for this book:
If you’re interested, some additional resources to look further into P2P systems, performance, distributed ledgers, and security can be found here:
The Kademlia specification discusses how peers communicate, find each other, and share and retrieve data. There are many concerns that still need to be addressed when creating a P2P network. Here is a list of some of the major concerns (the terminology and issues will become clearer later, so consider coming back to this section once you’ve worked through the material):
The goal in this book is to provide a detailed discussion of an implementation of the Kademlia specification. Specifically, we will:
Some unit tests set up specific conditions for testing code. Others use randomly generated IDs (node IDs and keys) and verify results through a different implementation of the same algorithm. To ensure repeatability of those tests, the Random class is seeded in debug mode with the same value, and is exposed as a public static field so that some unit tests can perform their tests with a range of seeds.
Code Listing 1: Random Seeding for Unit Testing
#if DEBUG |
Also, most of these unit tests are really system-level tests, or at least partial-system tests. With actual unit tests of specific code sections in a particular method, well, it gets inane rather quickly. So you’ll see a lot of setup stuff being done in the higher-level tests.
The unit tests presented in this book are important not just because they test the underlying implementation, but also because they demonstrate how to set up scenarios for testing the Kademlia protocol under specific conditions. Thoroughly understanding the unit tests is probably an even better way of understanding the Kademlia protocol than looking at the implementation!
The code presented here is incremental, meaning that as additional features are discussed and added, code is updated to reflect those new features and their unit tests. Not every refactoring is included in the code here; therefore, it is strongly recommended that if you’re interested in a particular method, you review the final implementation in the actual code base.
A GitHub repository for the code base can be found here.
This book contains numerous excerpts from the specification Kademlia: A Peer-to-peer Information System Based on the XOR Metric. These are clearly indicated in “quoted” italicized text.