The following is a short interview with Succinctly series author Marc Clifton, whose latest book, The Kademlia Protocol Succinctly, was published recently. You can download the book here.
What should people know about the subject of your book? Why is it important?
There is growing interest in decentralizing data stores. Motivations include "disaster proofing," by storing redundant, recoverable data across many nodes, and potentially improving security by spreading data across multiple nodes. Decentralization also helps with data access: it can improve performance and, again because the data is distributed, it reduces outages when a storage node goes down. In addition, while the traditional blockchain technology used in cryptocurrencies has required every node in the network to store all the data, a growing number of blockchain implementations overcome this limitation, and Kademlia is one such protocol used in those systems. The Kademlia protocol has emerged as the de facto standard for many peer-to-peer systems, and understanding the pros and cons of this protocol (and others) is important when deciding whether a decentralized data store is an appropriate solution.
When did you first become interested in this subject?
I have been interested in the underlying technologies of cryptocurrencies for a while now. Merkle trees, proof-of-work, proof-of-stake, consensus, smart contracts, and peer-to-peer distributed hash table (DHT) implementations are all pieces of the "knowledge puzzle." After some research, Kademlia was clearly the protocol to dive into. Researching it piqued my interest because what sounded simple to implement turned out to be much more complex. Doing a deep dive into the Kademlia protocol really helped me understand many of the issues in implementing a peer-to-peer DHT, as well as exactly what problem it solves and what problem(s) it does not solve.
By writing this e-book, did you learn anything new yourself?
Absolutely! Besides learning the nuts and bolts of the Kademlia protocol, I also learned how it improved on existing peer-to-peer DHTs to solve certain problems related to performance and storage. I believe I now have a deeper understanding of when it is correct and appropriate to use a peer-to-peer network, particularly with regard to data security and the vulnerabilities that the protocol itself introduces. Also, peer-to-peer networks can be difficult to test, especially because some aspects of the protocol are nondeterministic, such as randomly choosing a peer to initiate a refresh. Even the randomly chosen peer ID affects the "bucket" into which the peer is placed. While unit testing the protocol itself is straightforward, testing a peer-to-peer system requires setting up complex scenarios, simulating hundreds, if not thousands, of peers, and creating various load conditions that exercise the protocol's peer and data caching mechanisms. This is not an easy task!
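To make the point about random IDs and buckets concrete, here is a minimal Python sketch of one common formulation from the original Kademlia paper. It is an illustration, not the book's implementation, which may organize its buckets differently (for example, by splitting bucket ranges as they fill). In this formulation node IDs are 160-bit integers, the distance between two IDs is their bitwise XOR, and a contact belongs in bucket i when that distance falls in the range [2^i, 2^(i+1)). The names `ID_BITS`, `xor_distance`, and `bucket_index` are hypothetical, chosen only for this example. Because the top bits of two random IDs differ half the time, roughly half of all randomly generated peers land in the highest bucket, so which buckets a test exercises depends on the IDs that happen to be drawn.

```python
import random
from collections import Counter

ID_BITS = 160  # assumption: 160-bit node IDs, as in the original Kademlia paper


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Return i such that the XOR distance lies in [2**i, 2**(i+1)).

    This is the position of the highest set bit of the distance, i.e.,
    the first bit (from the top) where the two IDs differ.
    """
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not place itself in its own buckets")
    return distance.bit_length() - 1


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only so the demo is repeatable
    own_id = rng.getrandbits(ID_BITS)
    # Assign 10,000 random peer IDs to buckets relative to our own ID.
    counts = Counter(
        bucket_index(own_id, rng.getrandbits(ID_BITS)) for _ in range(10_000)
    )
    # Counts roughly halve at each lower bucket index.
    for index in sorted(counts, reverse=True)[:4]:
        print(f"bucket {index}: {counts[index]} peers")
```

Seeding the random generator, as the demo does, is one simple way to make bucket assignments repeatable when testing code that otherwise relies on random IDs.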
How will this subject change over the next few years?
I suspect that interest in the benefits of decentralized data stores will continue, but that it will be tempered by the ongoing need to ensure that data is secure and that access to data is appropriately controlled. A decentralized data store can reduce single-point-of-failure vulnerabilities and improve data security by separating the data into packets that are distributed across many peers. That said, I think we will need to see these protocols develop to include "secure layers," much as HTTPS is a secure layer on top of HTTP, regardless of whether a peer-to-peer network uses HTTPS as its transport layer. Of particular importance will be securing each and every node in the peer network: if a malicious attack gains access to a single peer, all the data across all the peers becomes vulnerable, and this may turn out to be a bigger problem than the ones we currently face with centralized stores. For example, S/Kademlia, which I do not discuss in the book, is an approach to secure key-based routing.
Do you see the subject as part of a larger trend in software development?
I think there is a larger trend that generally fits into the category of "decentralization." This includes not just how data is stored, but how information is communicated. The trend toward microservices is an example of decentralizing an application's code base: instead of a centralized, monolithic implementation, there are services that can be distributed across systems for processing data. Certainly, the whole "cloud" phenomenon is itself a step toward decentralization, where the data is separated from the application. I think it is a natural evolution to look at the benefits of decentralizing the individual "pieces" of data as well, and Kademlia is certainly one protocol that facilitates that. As more and more data is gathered from our phones, IoT devices, wearable gadgets, and social media use, mining all this "big data" so that it actually serves a useful purpose will undoubtedly require further developments in decentralization, redundancy, security, and performance. Technologies like blockchain and smart contracts are, as I see it, the tip of the iceberg in the trend toward protocols that facilitate managing data.
What other books or resources on this topic do you recommend?
I actually think Wikipedia is a great resource for starting one's journey into peer-to-peer networks and distributed hash tables. And certainly "googling" a particular topic will lead one to various blogs, technical articles, and other discussions. However, one can also succumb to information overload. While researching the Kademlia protocol, I went down many rabbit holes to understand certain concepts (particularly the concept of caching data in three separate stores). Regarding the Kademlia protocol itself, my limited research revealed that, at best, it might be discussed as a chapter in a more general book about DHTs and peer-to-peer networks. I believe that this Syncfusion e-book is the first book to focus entirely on the protocol, providing (hopefully) a robust discussion, implementation, and suite of unit tests. That said, I'm a particular fan of O'Reilly and would recommend perusing their books on peer-to-peer networking, blockchain, and Bitcoin.