Radix Technical AMA with Founder Dan Hughes – 16th March 2021

Radix Technical AMA
Radix DLT
24th March 2021

Every two weeks, Radix DLT’s Founder Dan Hughes hosts a technical ask-me-anything (AMA) session on the main Radix Telegram Channel. This is an opportunity for the community to ask Dan questions on tech developments, project milestones, or anything else related to the cutting-edge innovations of Radix.

Thank you to everyone who submitted questions and joined our AMA. You can find the full transcript of the AMA session on March 16th, 2021 below.

I recently revisited the consensus roadmap on GitHub. I seem to remember it used to say PoS and/or DPoS for Sybil protection for RPN-2+. Now it only says DPoS?

For Olympia (the first mainnet) it’s DPoS, for later mainnets it’s PoS/DPoS.

Reason being that if you don’t wish to maintain a validator, you can still delegate stake to a validator in order to help secure the network.

Can you please confirm that stake weight, along with other factors including availability, responsiveness, and optimal latency, is considered when choosing a node for the validator set?

For example, suppose a node with 1 billion XRD staked is not adding value in making the network more efficient (the factors mentioned above are not met), while a node with 10 million XRD meets the required criteria. Is it safe to assume the latter will be considered for the validator set?

There are many metrics that can be used to help determine the quality of the validator set. Most of them are also used for slashing, allowing penalization of underperforming validators that are in the set.

Some of them though can’t be quantified until a validator is in the set.  For example, how do you know that a validator is not going to process stuff with a lot of latency before you give them stuff to process?

Even if there were some way to do that, if I am adversarial and plan to disrupt, I can just behave myself to pass any “tests” and then go latent once I’ve made it into the set.

Is there an impact on dApps created on RPN-1 when the transition to RPN-2 occurs? No refactor needed?

Having to refactor would be a terrible user experience, so we’ll make sure that backward compatibility is retained, save for the most critical cases (those affecting security, for example).

What’s the maximum/minimum egress/ingress bandwidth rate (outward and inward traffic between the VM/instance hosting the Radix node and the network) that a Radix node should have to be considered optimal in RPN-1? (Looking for an approximate number as per the internal tests.)

The answer to this question depends on how it’s being quantified.

First, you have the pure atom data that represents the actions. At 50+ TPS that’s not going to be a great deal. Most atoms will be pretty simple, likely < 1 KB, so 1/2 Mbit would be enough for that (50 atoms/s × 1 KB × 8 bits ≈ 0.4 Mbit/s).

But then there is the gossip to transport those atoms around the network, so you need at least 1/2 Mbit UPSTREAM too.

Then there is receiving the phase proposals from the leader, along with any QCs that it has produced, so perhaps another 1/2 Mbit UP/DOWN on top.

Finally, some bandwidth remaining for “housekeeping” between validators and allowing them to actually service client requests etc.

Ideally, you’re probably looking at a minimum of 5 Mbit up/down to be safe.
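
As a rough check on the figures above, the arithmetic can be sketched in Python. The 50 TPS rate and the roughly 1 KB atom size come from the answer; treating 1 KB as 1000 bytes, the way each allowance is assigned to upstream or downstream, and all names below are assumptions made for illustration, not measurements from Radix’s internal tests.

```python
# Back-of-envelope bandwidth budget for a validator (illustrative only).

def payload_mbit_per_s(tps: float, atom_bytes: int) -> float:
    """Raw atom payload rate in Mbit/s (1 Mbit = 1,000,000 bits)."""
    return tps * atom_bytes * 8 / 1_000_000

# Figures quoted in the answer: about 50 TPS, atoms likely under 1 KB.
atom_payload = payload_mbit_per_s(tps=50, atom_bytes=1000)

# Allowances quoted in the answer, in Mbit/s.
# Assigning them to up/down like this is an assumption.
ATOM_DATA = 0.5       # rounds the raw atom payload up to 1/2 Mbit
GOSSIP_UP = 0.5       # relaying atoms to peers
PROPOSALS_QCS = 0.5   # leader proposals and QCs, counted in both directions

downstream = ATOM_DATA + PROPOSALS_QCS
upstream = GOSSIP_UP + PROPOSALS_QCS

# The suggested minimum leaves the remainder for housekeeping and client requests.
SUGGESTED_MINIMUM = 5.0
headroom = SUGGESTED_MINIMUM - max(upstream, downstream)

print(f"atom payload: {atom_payload:.1f} Mbit/s")
print(f"down: {downstream:.1f}, up: {upstream:.1f}, headroom: {headroom:.1f} Mbit/s")
```

Under these assumptions the itemized traffic is about 1 Mbit in each direction, so the 5 Mbit suggestion leaves most of the link free for housekeeping and servicing clients.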

I have some questions about the 100 DPoS node operators…

Could a bad actor run enough of these nodes (while pretending to be different people) to 33% attack?

Could bad actors running these nodes collude to pull off a 33% attack?

What incentives do these node operators have to not attack?

What happens to the network if a group of bad actors tracks down the physical location of the 100 DPoS nodes and destroys them simultaneously?

What if all 100 are DDoS attacked simultaneously?

Sure, a bad actor could pretend to be 1 or many … what matters is the 33% and the cost to acquire.

Again, sure, they could collude, but there had better be some incentive for them to do so that is worth more than the 33% of stake they have between them. This has been covered extensively in here and other places.

Rewards mainly, but there is also the bare incentive of “I want/need this network to be alive and secure”. Think of Bitcoin full nodes: there is zero incentive to run one other than the latter, yet there are 1000s of them.

The last 2 questions are the same: if the 100 DPoS nodes go away, that’s the same as a 100% attack.

Would it still be worth pursuing a passive consensus mechanism? I know that you have experience with it. Or, with the atomic composability stuff, is that becoming impossible to accomplish?

Passive doesn’t work too well with atomicity, unfortunately. It operates on the assumption that everything is good until proven bad.  

If you need atomicity, you need to know that everything is good now so that validators can commit. Passive fault detection has some latency, so generally by the time you know there is a fault somewhere else that means it shouldn’t be committed, you already have and have made a mess.

That said, some of the Cassandra stuff I’m working on allows a really high degree of asynchronicity, specifically in the state execution phase, which is one of the passive properties … so you could say I’m somewhat still looking at it, but in a different way.

Radix is intended to be permissionless. Would it violate that policy if only a “legal entity”  (that is registered in some jurisdiction in the real world) could run a validator node?  The validator selection software could use a zero-knowledge proof to verify that such a registration exists, without placing any restriction on who could run a validator or violating their privacy. By agreeing to just that, the potential penalty for validator bad behavior could be far stronger than mere slashing.

This is not something that could or needs to be implemented immediately, since it requires many jurisdictions to provide an API for executing the zero-knowledge proof. But just knowing that such a concept was on the Radix roadmap would deter a lot of bad validator behavior.

Honestly, I’m strongly opposed to anything like that while still branding Radix main net as permissionless.

Someone somewhere needs to generate and verify that identity for the validator, ZKP or not, someone somewhere knows that it exists.

You then need to trust the producer of the ZKP too, and you’re beginning to fall back to centralized systems to drive it.

Also, who would be responsible for pursuing the “more aggressive” penalty? Radix? Foundation? Seems another centralization point.

Considering that slashing in its own right is likely enough to discourage all but the most destructive of adversaries, it all seems like a lot of complexity and compromise for not much gain.

This channel has repeatedly said that one can increase Radix’s TPS indefinitely by just adding more nodes. I think maybe this has cause and effect backward.

Isn’t it that first the software decides more shard sets are needed because not enough nodes can keep up with the workload? Then in the next epoch there will be more shard sets, so that those nodes that cannot otherwise keep up can lower their processing load by processing fewer shard sets. Each epoch there must be 100 validator nodes per shard set. That is, the maximum number of validator nodes needed must be “100 * number of shard sets”. (There could be fewer validator nodes than this, since a validator node can process several or even all of the shard sets.) Not fewer than 100 per shard set, because that compromises the security of a PoS network. Not more, since too many validators per shard set increases the message overhead (which increases as O(n^2)?). The number 100 is apparently widely used in PoS literature, but I would appreciate a reference to where it is shown to be a good compromise.

In particular, if the network grows to require 10 shard sets, then no more than 1000 validator nodes will be needed. There may be other non-validator nodes (on standby to become a validator next epoch). But only validator nodes will participate in consensus and share emission rewards. There will be transients until enough new validator nodes can be spun up to satisfy the demand for new processing power.  The transients should not be a problem, since the number of actual validator nodes should slowly grow from its minimum to its maximum (because many validators will have the capacity to service more than just one shard set).

There are 2 main considerations when reorganizing the validators into more shard groups.

The first is can the validators keep up with the load? This one’s really tricky because it’s completely abstract in nature and you can’t ever be sure that the validators are overloaded, or simply being purposely down-throttled for some end game.

Hence validator load will likely never play any part in triggering any kind of reorg of validator sets. Instead, the network will rely on the available incentives/penalties for validator operators to provide enough compute to keep up.

The second is to do with bounds and theory, which are absolute. Given a voting model, it is possible to calculate the point at which the trade-off between overhead and security pushes into diminishing returns. For example, running a validator set of 200 = 4x more work for 20% more security vs running 2 validator sets of 100 = 2x more work for 2x more security … you go for the latter.

Those can be baked in and define the optimal validator set size to provide optimal security and throughput. FYI the 100 would be a target, it doesn’t strictly have to be 100, some could have 103, some could have 95. Point is you can quantify in absolutes the security requirements and “early out” when met if needed.
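
The work side of that comparison can be reproduced with a small sketch, assuming every validator exchanges a vote with every other validator, so that the message count grows with the square of the set size (the O(n^2) the question mentions). The function name is hypothetical, and the security figures (20% vs 2x) are quoted from the answer rather than derived here.

```python
def vote_messages(set_size: int) -> int:
    """Messages per voting round under an assumed all-to-all model (n * n)."""
    return set_size * set_size

# Baseline: a single validator set of 100.
baseline = vote_messages(100)

# Option A: grow the single set to 200 validators.
one_set_of_200 = vote_messages(200) / baseline

# Option B: run two independent sets of 100 validators each.
two_sets_of_100 = 2 * vote_messages(100) / baseline

print(one_set_of_200)   # 4.0
print(two_sets_of_100)  # 2.0
```

Doubling a single set quadruples the voting work, while adding a second set only doubles it, which is the asymmetry the answer relies on.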

Oh, the 100 is really due to the following, and it’s exaggerated in DPoS so I’ll use that.

Imagine we have 10 validators and 1M XRD spread equally across them via DPoS, so 100k stake each. 3 of the validators are maybe in a similar region and the internet goes off, so 30% of vote power availability is gone. Then consider that one of the remaining validators is running Windows 10 and the infamous “I’m gunna update at 3am and not warn you” update happens … and that validator doesn’t come back on. 40% is gone, so the supermajority of 67%+ can’t be met and you have a liveness break.

Now imagine we have 100 validators with 10k each. 30 of them are in the same region and the internet goes off, so 70% of vote power is available … the Windows 10 update happens on that same validator, leaving 69% available … no liveness break.

Same stake, same “security” level, but a lot less risk with more validators, as the probability of multiple Windows 10 updates failing at the same moment is lower.
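
The two scenarios can be checked with a short sketch. It assumes equally staked validators and treats the “67%+” supermajority as strictly more than 2/3 of total vote power; the function names are hypothetical and this is not how the Radix node software computes liveness.

```python
from fractions import Fraction

# Assumed threshold: progress needs strictly more than 2/3 of vote power online.
SUPERMAJORITY = Fraction(2, 3)

def available_vote_power(validators: int, offline: int) -> Fraction:
    """Share of vote power still online when stake is spread equally."""
    return Fraction(validators - offline, validators)

def is_live(validators: int, offline: int) -> bool:
    """True if the online validators can still form a supermajority."""
    return available_vote_power(validators, offline) > SUPERMAJORITY

# 10 validators: 3 lose internet, 1 fails a Windows update.
# 100 validators: 30 lose internet, 1 fails a Windows update.
for validators, offline in [(10, 4), (100, 31)]:
    power = available_vote_power(validators, offline)
    print(f"{validators} validators, {offline} offline: "
          f"{float(power):.0%} available, live={is_live(validators, offline)}")
```

The regional outage costs the same 30% in both cases. What differs is the weight of the single unlucky validator: 10% of vote power in the first case, 1% in the second.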

That covers all the questions Dan had time to answer this session. If you are keen to get more insights from Dan’s AMAs, the full history of questions & answers can be easily found by searching the #AMA tag in the Radix Telegram Channel. If you would like to submit a question for the next session, just post a message with your question in the main Telegram channel, using the #AMA tag!