Long-time Radix community members (some with us since before it was called Radix!) know that many years of hard work have brought us to where we are today. We unveiled our most recent milestone with the release of the Cerberus Whitepaper describing our breakthrough multi-shard consensus protocol that enables the unprecedented level of DLT scalability that has always been our goal.
Prior to Cerberus, many of you will remember (or have read about) the release of the Tempo whitepaper, our previous DLT consensus protocol design. The ideas contained in that whitepaper drew many of us to Radix as fans, contributors, and employees. As Radix continues to drive toward the open source release of the Radix public network, we are gaining a lot of new followers (and returning friends) who want to know what Tempo was, and why we moved to our new Cerberus consensus.
Meaningful technology development is nearly always a long, winding road. So in this blog we want to share our journey from our Founder Dan’s quest to solve the shortcomings of blockchain, to the development of Tempo, through the issues we uncovered in testing, and how we solved them with Cerberus. The collective challenges, dead ends, and new paths discovered along our road have all led to a solution that we firmly believe is ready to achieve Dan’s vision: to deliver on the promise of blockchain with a truly global-scale decentralized ledger.
The Origin Story
The vision for Tempo was born out of a struggle to get Bitcoin to scale. Dan realised that while decentralised finance was the future, current blockchain technology couldn’t bring it to the masses. The enticingly simply blockchain model that networks like Bitcoin were built on just couldn’t be extended to scale without compromising on decentralization. Ethereum, being based on fundamentally the same technology, didn’t fare much better. Directed acyclic graphics - or DAGs - seemed like a promising start, but also fell short in both scalability and centralisation: something else was needed.
Let’s wind back to a time before blockchain - distributed systems have been studied and used in production for decades, so why the desire for a drastically different approach? While distributed systems have a long history, the traditional approach to distributed systems focuses on their application in closed environments (where networks are smaller and do not have the potential for many attackers). When used in open settings with many nodes, the traditional approaches would either slow to a crawl or cease to function completely.
These approaches therefore weren’t well suited for our vision either. To fulfill the promise of a scalable, decentralised ledger, we had to go back to the drawing board. An ideal public ledger for the entire world to use would need to be:
- Massively scalable to accommodate millions of requests per second
- Decentralised to ensure the world’s infrastructure remains in control of the people
- Programmable to enable creation of decentralized, autonomous applications
All existing solutions fell short in at least one key area. To come up with an all-in-one solution, we began by reconsidering the fundamentals. By translating these ideal properties into technical requirements, we could define what such a platform might look like:
- Massive scale would require both a scalable data architecture and a scalable consensus process. Neither the existing blockchain approaches nor classical consensus mechanisms (as used for company databases) were scalable while being sufficiently tolerant to the kinds of faults expected in a public setting.
- Decentralisation would require that any number of nodes could join (or leave) the network at any time. While blockchains and DAGs could handle this, they weren’t scalable. Traditional consensus mechanisms could support neither goal.
- Programmability would require a new paradigm for developing secure and safely scalable distributed applications that would be supported by both the scalable data architecture and consensus mechanism.
As all existing solutions seemed inadequate for meeting even the first two of our ambitions - scale and decentralisation - we started with first principles: Distributed systems replicate state across a set of nodes. The value of a distributed system lies in its capacity to behave as a single unit to the outside observer. That is, it should not matter which nodes a client talks to and asks about some state; they should all give the same answer.
To accomplish this, the distributed nodes need to remain synchronised by talking to each other. The main bottleneck in most distributed systems is therefore inter-node communication. To scale to high demand (serving more and more requests) making individual nodes more powerful soon becomes inefficient (and centralised). Instead, more nodes are added that run in parallel. These new nodes need to talk to all other nodes to remain synchronised and keep acting as a cohesive whole.
That fundamentally limits scalability. Naturally, the way to make such systems scale is by finding a way to require less inter-node communication (talking). But how? One traditional approach is to make nodes only talk to some other nodes for every request, assigning different parts of the ledger to different nodes. Now nodes only need to bother with requests for the part of the ledger they’re responsible for. Less nodes to talk to for every request equals more scalability. This “splitting” is a form of sharding, or breaking the ledger into individual shards for different sets of nodes to deal with. Sharding in various forms had been used for decades, so it was a good starting point.
But we wanted more. To serve the world, our platform would need all the scalability it could get.
Another reason nodes are so eager to talk to each other is that pBFT-style consensus protocols require all nodes to agree to every request before moving on to the next. This incurs substantial overhead. Even with sharding, nodes need to talk to all other nodes relevant for every request. A potential but exotic solution to this problem lies in a simple realisation: what if nodes just accepted every request and only came to explicit agreement with another when a request is disputed? Such a lazy communication paradigm (also referred to as lazy consensus) would substantially reduce the communication required during normal operation of the ledger. Combining lazy consensus with sharding had the potential for unparalleled scalability.
The idea for a sharded ledger with lazy consensus was realised with Tempo. Tempo is both a distributed ledger and the underlying consensus protocol. The consensus part of Tempo is based on the notion of logical clocks. Every node has a counter (its logical clock) which increases with every new request it witnesses, never decreasing. When storing a request, the node attaches its current logical clock value to that event before propagating it further. The set of logical clock values assigned by each node forms a temporal proof.
Temporal proofs become useful in a conflict, when a node receives a request that conflicts with a request it had received earlier. To determine which request to discard, the node can then collect temporal proofs from other nodes that are related to both of its conflicting requests. When an intersection is found in a third temporal proof, this proof can be used to determine a partial ordering between the two requests - whichever came first wins. Since other nodes will find the same temporal proofs for the same requests, all nodes will come to the same conclusion on which request came first. Consensus is reached, all without the overhead of preemptively talking to all other nodes all the time.
Nodes are required to cryptographically sign their contributions to a temporal proof. However, that alone does not prevent faulty nodes from assigning the same logical clock value to different requests. For that, Tempo must somehow verify that nodes are being honest with their temporal proofs. To that end, nodes include a commitment to their recently witnessed requests with every contribution they make to a temporal proof. A commitment is a cryptographic hash that makes it easier for honest nodes to detect any wrongdoing. Every contribution includes a different (but predictable) small piece of that hash, further decreasing messaging overhead.
With consensus tuned for maximum scalability, we could turn to the other key ingredient for large-scale scalability: sharding. At the time Tempo was designed, most distributed ledgers did not use sharding at all. And the few that did took the approach of adjusting the number of shards dynamically according to throughput requirements. Dynamic sharding however causes significant consensus and synchronisation problems when nodes are eventually reassigned as the sharding configuration changes, nevermind the usability and programmability difficulties.
To avoid the problem of having to continuously reconfigure the entire sharding configuration, we chose to “preshard” the ledger into a massive number of shards which never changes. Every node was assigned a subset of these shards, enabling smooth adjustments tailored to every node’s individual capacity and requirements. Designing Tempo around this massive presharding, rather than trying to incrementally add sharding as in most approaches, provided an excellent foundation to achieve virtually unlimited parallelization of transaction processing.
On the 12th of June 2019 we decided to see just how fast we could push a network using Tempo. That day we publicly replayed the entire 10 years of Bitcoin’s transaction history onto the Radix ledger (as if submitting those transactions to Radix as quickly as possible), with full transaction and signature validation, on a network of over 1,000 Nodes distributed evenly throughout the world. This test network achieved over 1M transactions per second, demonstrating what parallelization across a massively presharded ledger is capable of.
For the first time since the creation of public, trustless networks, we demonstrated a technology that could truly support even the world’s most demanding transactional applications using. It was an incredible piece of engineering that would not have been possible without both the huge dedication of our own team, as well as working directly with Google Cloud engineers to make it all possible.The fully documented journey and testing is covered in much more detail in these four blog posts, which we encourage you to read if you have time!
- 10 years of Bitcoin history, replayed in under 30 minutes
- Part 1: A Primer on The Scalability Test and Radix
- Part 2: How We Actually Built The Scalability Test
- Scaling DLT to over 1M TPS on Google Cloud
For all the innovations Tempo brought to the table, a few fundamental questions remained. Tempo’s lazy consensus mechanism meant that a request could be challenged and overruled by another request at any time. That is, the initial Tempo model never guaranteed finality of any request. The only way to be sure that a request would persist was to know about all relevant temporal proofs. However, network faults and malice can lead to some parts of temporal proofs being inaccessible and therefore make it impossible to accurately assess finality.
Clients want some form of finality in their requests. Bitcoin-style consensus provides increasing (although never absolute) certainty in finality, but this approach was incompatible with Tempo. One possible solution to introduce finality was to introduce a timeout mechanism which would finalise requests in a node after it was left undisputed for a predetermined length of time. Unfortunately such a solution is vulnerable to the same flaws, albeit expressed in a different way. In certain network configurations, it was possible for some nodes to miss relevant conflicting requests before finalising their favourite. This would lead to nodes disagreeing with another but unable to rectify the conflict when it is eventually discovered since requests were already finalised on both ends.
Different nodes finalising conflicting requests breaks the model of a “cohesive whole” as required for a reliable distributed system. This was a breaking flaw. An unreliable network, rather than malicious actors, was enough to break Tempo’s guarantee of coherence. As nodes may for instance disagree where you sent your money to in perfectly benign (albeit unreliable) networks, Tempo consensus wasn’t fit for real world use.Back to fundamentals, again. Finality, the key component missing from the lazy approach to consensus of Tempo, requires nodes to proactively come to consensus on every request. How else might we reduce the communication complexity inherent to this “active” form of consensus?
The communication complexity is so high because all nodes need to be absolutely sure that they’re aligned with all other nodes. What if instead of pursuing a 0% probability of nodes disagreeing at the cost of high communication we could achieve a 0.0...1% probability of disagreement at a much lower cost?
The advantages of accepting an extremely low probability of failure are realised by randomised consensus (a technique currently being popularised by consensus algorithms such as Avalanche). The idea is that instead of talking to all nodes for consensus on each request, a node conducts a couple rounds of “interviews” with small random subsets of nodes. If this subset of nodes disagree with a node’s choice of request, the node will accept the majority choice. After a certain number of consecutive rounds without changing its choice of request, a node will assume all nodes to be in agreement and finalise that request. With the right configuration, randomised consensus can guarantee very low probabilities of disagreement.
The notion of randomised consensus has enjoyed a recent surge in popularity as other distributed ledgers looked to it to aid in their own scalability problems. However, as with the original vision for Tempo, the very premise underlying the scalability of randomised consensus comes with a key trade-off: letting nodes vote in an unpredictable manner makes fault detection difficult because it is much more difficult to prove misbehaviour algorithmically. Without sufficient fault detection, faulty nodes can make other nodes disagree with one another. And due to its “go with the majority” interview approach, faulty nodes can lead to nodes never coming to an agreement, or - worse - nodes finalising disagreements (like in Tempo).
When considering sharding, randomised consensus does not seem quite as attractive anymore. The savings in communication overhead afforded by randomised consensus scale exponentially with the number of nodes involved in consensus; but in a sharded ledger, only a subset of all nodes are involved in any single consensus decision. This diminishes the scalability benefits promised by randomised consensus so that it would be more “economical” (from a scalability standpoint) to simply use a pBFT-style consensus mechanism and then boost its performance with sharding.
What if we could take all the lessons we had learned about scaling consensus, ledgers and sharding and apply them to the latest advances in consensus research? The result could be a fresh approach to highly scalable ledgers, rooted in a rock-solid and well-researched foundation. And that’s exactly what we did. Introducing: Cerberus.
Onwards with Cerberus
Cerberus is the product of combined decades of research and experimentation in both exotic and pBFT-style approaches to consensus, ledger architectures, and distributed applications – combined with the insights we gained designing Tempo for large-scale permissionless networks. The essence of Cerberus is simple: take an extensible pBFT-style consensus algorithm and extend it to enable unlimited parallelism using our unique sharding concept.
The foundation for this next generation of consensus protocol had to be theoretically sound and, of course, performant. For sufficient base level performance, the protocol would have to drive down communication complexity in some way. The classic way to reduce communication is by nominating a leader for each request who will ensure that all nodes agree on that request before moving on. Most leader-based consensus protocols elect a new leader only when the incumbent becomes unresponsive or otherwise fails to complete requests. This special case treatment of leader changes leads to extra complexity and less predictable performance.
HotStuff is a recently published consensus protocol that builds on significant earlier work (tracing its history back to pBFT) but changes leaders with every request to achieve substantial advantages. This streamlined consensus process enables requests to be chained together, enabling a significant performance optimisation.Importantly, HotStuff’s streamlined approach also enables a key innovation of Cerberus: running multiple “instances” of consensus in parallel for scalability while retaining the ability to transparently operate on them.
As with the original Tempo, the underlying ledger is massively presharded. Each shard is assigned its own independent consensus instance enabling virtually unlimited parallelism.
To enable consensus across multiple shards/instances, Cerberus implements a novel “braiding” mechanism which ties multiple HotStuff-style consensus instances together on-demand to drive cross-shard requests. Such braids are tied and untied as needed, enabling transactions to be atomic across shards, which is critical for applications like decentralised finance (e.g. when sending money between shards). Cerberus thus avoids the problems of many sharding approaches wherein shards conduct consensus completely independently and may only coordinate across shards with multi-step messaging.
Cerberus’ unique approach to massive sharding of both the ledger and the consensus protocol enables unparalleled scalability while remaining flexible for developers and without sacrificing decentralization.