What is Sharding?
Put simply, sharding is breaking a single database into smaller, more manageable chunks, and distributing those chunks across multiple servers, in order to spread the load and maintain a high throughput.
There are two main ways of cutting up a database – vertical partitioning and horizontal partitioning.
Vertical partitioning happens when different tables from the same database are stored in different instances. Each table is a distinct shard. A simple example of this would be keeping all the transactions from North America in one table on one instance, and all the transactions from Europe in another table on a different instance.
Horizontal partitioning splits the database table into separate sets of rows, and these rows are stored in different database instances. Each set of rows is a shard. A simple example of this would be “Server 1” taking rows 1 to 10,000 and “Server 2” taking rows 10,001 to 20,000, and so on.
Horizontal partitioning has two advantages over vertical partitioning:
- You do not need to plan how to split your data up in advance;
- Your database is much less “lumpy” – you can dynamically scale how much of the whole database any one server needs to service.
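The row-range example above can be sketched in a few lines of Python. This is only an illustration: the function name `shard_for_row` and the fixed shard size of 10,000 rows are assumptions taken from the "Server 1" and "Server 2" example, not a description of any particular database.

```python
ROWS_PER_SHARD = 10_000  # assumption: matches the "rows 1 to 10,000" example


def shard_for_row(row_id: int, rows_per_shard: int = ROWS_PER_SHARD) -> int:
    """Return the 1-based server number that holds the given 1-based row id."""
    if row_id < 1:
        raise ValueError("row ids start at 1")
    # Rows 1..10,000 map to server 1, rows 10,001..20,000 to server 2, and so on.
    return (row_id - 1) // rows_per_shard + 1


for row in (1, 10_000, 10_001, 20_000):
    print(f"row {row} lives on Server {shard_for_row(row)}")
```

Because the mapping is computed from the row id, adding a "Server 3" needs no up-front plan for how the data is divided; the next range of rows simply lands on the next server.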
Why is Sharding Important?
If a server cannot handle the requested number of operations, you can split its data set into parts that can be stored on different hosts.
For example, if we split a single table into two partitions, then each host will need to handle half of the operations of the original host. As the operation requests increase, the database can be split into more and more parts.
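The arithmetic behind this can be made explicit. The sketch below assumes that operations are spread evenly across partitions, which real workloads rarely manage perfectly; the function name `ops_per_host` and the figure of 8,000 operations per second are illustrative assumptions.

```python
def ops_per_host(total_ops_per_second: float, partitions: int) -> float:
    """Operations each host must handle, assuming an even spread of load."""
    if partitions < 1:
        raise ValueError("there must be at least one partition")
    return total_ops_per_second / partitions


total = 8_000  # hypothetical total load in operations per second
for partitions in (1, 2, 4, 8):
    print(f"{partitions} partition(s): {ops_per_host(total, partitions):.0f} ops/s per host")
```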
Sharding is a well-established scaling solution for systems with large data sets and/or high throughput operations.
What is Distributed Ledger Technology (DLT)?
A Distributed Ledger is a database that is replicated, shared and synchronized across multiple sites, countries, and institutions. A DLT contains a consensus algorithm: a mechanism for the parties to agree on what the most up-to-date version of the ledger should be, and for resolving faults, errors or conflicts.
DLT networks are peer-to-peer, meaning there is neither a centralized administrator function nor a centralized data store. Public blockchains such as Bitcoin and Ethereum are, perhaps, the best-known examples of DLTs.
DLTs require independent machines, or nodes, to record, synchronize and share transactions in their respective ledgers.
Distributed ledgers can be permissioned or permissionless depending on whether or not nodes require permission to change the ledger. Ledgers are public or private depending on whether anyone (as opposed to just nodes in the network) can access the ledger.
What are DLTs used for?
DLTs can be used for creating the following:
- digital currencies;
- digitally unique objects;
- digital objects with intrinsic value (like tickets).
We can extrapolate many related use-cases from these, including cryptocurrency, notary services, prediction markets, smart contracts, audit trails, regulatory reporting, streamlining clearing and settlement, digital identity, and so forth. And that’s just the beginning.
Basically, distributed ledgers can be used in some form wherever securing digital relationships via a system of record is required, and control of a central authority is not desired. In other words, DLTs are useful where there is a need for a trustless, peer-to-peer network for exchange of incentives and value.
How does sharding a DLT introduce the potential for fraud?
To understand a distributed ledger is to grasp the notion of “spend”: that which is spent in one place in the network is gained somewhere else. Sharding a public DLT is difficult because you must be able to ensure that a spend has happened only once on the entire network. This is already tricky without sharding, and having multiple shards makes this even more difficult.
If a transaction that should happen only once is recorded across an entire network as happening more than once, then you run into a situation known as a double spend.
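To make the idea concrete, here is a minimal sketch of a single, unsharded ledger that refuses to record the same spend twice. The `Ledger` class, the coin identifiers and the recipient names are all hypothetical; a real DLT would also need signatures, consensus and much more.

```python
class Ledger:
    """A toy, single-node ledger that records each coin being spent at most once."""

    def __init__(self) -> None:
        self.spent = set()   # ids of coins that have already been spent
        self.history = []    # accepted (coin_id, recipient) records

    def spend(self, coin_id: str, recipient: str) -> bool:
        """Record the spend and return True, or return False for a double spend."""
        if coin_id in self.spent:
            return False  # this coin was already spent: reject the double spend
        self.spent.add(coin_id)
        self.history.append((coin_id, recipient))
        return True


ledger = Ledger()
first = ledger.spend("coin-1", "alice")   # first spend of coin-1
second = ledger.spend("coin-1", "bob")    # attempt to spend coin-1 again
print(first, second)
```

The check works here only because every spend passes through one shared record of what has been spent.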
Consider this: by splitting up a DLT into multiple shards, it is possible for a bad actor to take over a small part of the network with fewer computing resources. For example, if you divide a DLT into two shards, with half of your nodes working on one shard and half on another, an attacker needs a majority of only one shard's hash power to attack that shard, which is just over 25% of the hash power of the whole network. With enough nodes reflecting this false spend, a double spend is now possible.
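The 25% figure generalizes to any number of shards. The following sketch assumes hash power is divided evenly between shards and that an attacker needs to exceed half of a single shard's hash power; the function name `attack_threshold` is an illustrative assumption.

```python
def attack_threshold(shard_count: int, majority: float = 0.5) -> float:
    """Fraction of total network hash power an attacker must exceed to control
    a majority of one shard, assuming hash power is split evenly across shards."""
    if shard_count < 1:
        raise ValueError("there must be at least one shard")
    return majority / shard_count


for shards in (1, 2, 4, 10):
    print(f"{shards} shard(s): attacker must exceed {attack_threshold(shards):.1%} of total hash power")
```

Under these assumptions, every doubling of the shard count halves the share of the network an attacker has to control.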
When a double spend is possible on a DLT, it allows a form of fraud. In late May of 2018, a double spend attack was launched against the Bitcoin Gold network: in this attack, a bad actor was able to acquire over 50% of the network’s total hashpower. By doing so, they were able to modify and omit transactions of their own coins from blocks. Subsequently, they could reverse transactions that were previously confirmed.
Understanding how a double spend works reminds us how sharding a DLT introduces this potential for fraud: ledgers are a transaction record, and they exist to make sure that each spend happens only once. In a distributed ledger, the shards that form the ledger are not all in the same place, and thus are not going to be immediately consistent.
These problems illustrate why sharding a distributed ledger is difficult. According to the CAP theorem of distributed databases, a distributed data store (or ledger, in our case) cannot simultaneously provide more than two of the following three guarantees: consistency, availability, and partition tolerance.
Put more simply, you need to make sure a spend has happened only once on the entire network, and having a lot of shards makes this difficult.
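A deliberately naive sketch shows how this goes wrong. Here each shard keeps its own record of spent coins and never consults the other, so the same coin is accepted on both. The shard names, the coin id and the helper functions are hypothetical, and this is not how any real DLT routes transactions.

```python
from collections import Counter

# Each shard tracks only the spends it has seen itself.
shards = {"shard-a": set(), "shard-b": set()}


def spend_on_shard(shard_name: str, coin_id: str) -> bool:
    """Accept the spend unless this particular shard has already seen the coin."""
    spent = shards[shard_name]
    if coin_id in spent:
        return False
    spent.add(coin_id)
    return True


def double_spent_coins() -> list:
    """Network-wide view: coins recorded as spent on more than one shard."""
    counts = Counter(coin for spent in shards.values() for coin in spent)
    return sorted(coin for coin, seen in counts.items() if seen > 1)


accepted_a = spend_on_shard("shard-a", "coin-1")
accepted_b = spend_on_shard("shard-b", "coin-1")  # same coin, different shard
print(accepted_a, accepted_b, double_spent_coins())
```

Both spends pass their local check, and the conflict is visible only from a view of the whole network, which is exactly the view that sharding makes expensive to maintain.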
In the next article, we’ll review how a 51% attack presents specific problems when sharding a blockchain.
Stay tuned for more on these specific problems, especially sharding, some attack tools and vectors, and how they are solved in Radix DLT.