2024-10-16 21:15:00
vitalik.eth.limo
2024 Oct 17
Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc and Georgios
Konstantopoulos
At the beginning, Ethereum had two scaling strategies in its roadmap.
One (eg. see this
early paper from 2015) was “sharding”: instead of
verifying and storing all of the transactions in the chain, each node
would only need to verify and store a small fraction of the
transactions. This is how any other peer-to-peer network (eg.
BitTorrent) works too, so surely we could make blockchains work the same
way. Another was layer 2 protocols: networks that would
sit on top of Ethereum in a way that allows them to fully benefit from
its security, while keeping most data and computation off the main
chain. “Layer 2 protocols” meant state
channels in 2015, Plasma in 2017, and
then rollups
in 2019. Rollups are more powerful than state channels or Plasma, but
they require a large amount of on-chain data bandwidth. Fortunately, by
2019 sharding research had solved the
problem of verifying “data availability” at scale. As a result, the
two paths converged, and we got the rollup-centric
roadmap which continues to be Ethereum’s scaling strategy today.
The Surge, 2023 roadmap edition.
The rollup-centric roadmap proposes a simple division of labor: the
Ethereum L1 focuses on being a robust and decentralized base layer,
while L2s take on the task of helping the ecosystem scale. This is a
pattern that recurs everywhere in society: the court system (L1) is not
there to be ultra-fast and efficient, it’s there to protect contracts
and property rights, and it’s up to entrepreneurs (L2) to build on top
of that sturdy
base layer
and take humanity to (metaphorical and literal) Mars.
This year, the rollup-centric roadmap has seen important successes:
Ethereum L1 data bandwidth has increased greatly with EIP-4844 blobs, and multiple EVM
rollups are now at stage 1. A very heterogeneous
and pluralistic implementation of sharding, where each L2 acts as a
“shard” with its own internal rules and logic, is now reality. But as we
have seen, taking this path has some unique challenges of its own. And
so now our task is to bring the rollup-centric roadmap to
completion, and solve these problems, while preserving the robustness
and decentralization that makes the Ethereum L1 special.
The Surge: key goals
- 100,000+ TPS on L1+L2
- Preserve decentralization and robustness of L1
- At least some L2s fully inherit Ethereum’s core properties
(trustless, open, censorship resistant)
- Maximum interoperability between L2s. Ethereum should feel like one
ecosystem, not 34 different blockchains.
Aside: the scalability trilemma
The scalability trilemma was an idea introduced
in 2017, which argued that there is a tension between three
properties of a blockchain: decentralization (more
specifically: low cost to run a node), scalability
(more specifically: high number of transactions processed), and
security (more specifically: an attacker needing to
corrupt a large portion of the nodes in the whole network to make even a
single transaction fail).
Notably, the trilemma is not a theorem, and the post
introducing the trilemma did not come with a mathematical
proof. It did give a heuristic mathematical argument: if a
decentralization-friendly node (eg. consumer laptop) can verify N
transactions per second, and you have a chain that processes k*N
transactions per second, then either (i) each transaction is only seen
by 1/k of nodes, which implies an attacker only needs to corrupt a few
nodes to push a bad transaction through, or (ii) your nodes are going to
be beefy and your chain not decentralized. The purpose of the
post was never to show that breaking the trilemma is impossible; rather,
it was to show that breaking the trilemma is hard – it requires
somehow thinking outside of the box that the argument implies.
For many years, it has been common for some high-performance chains
to claim that they solve the trilemma without doing anything clever at a
fundamental architecture level, typically by using software engineering
tricks to optimize the node. This is always misleading, and running a
node in such chains always ends up far more difficult than in Ethereum.
This
post gets into some of the many subtleties of why this is the case (and
hence, why L1 client software engineering alone cannot scale Ethereum
itself).
However, the combination of data availability sampling and
SNARKs does solve the trilemma: it allows a client to verify
that some quantity of data is available, and some number of steps of
computation were carried out correctly, while downloading only a small
portion of that data and running a much smaller amount of computation.
SNARKs are trustless. Data availability sampling has a nuanced few-of-N
trust model, but it preserves the fundamental property that
non-scalable chains have, which is that even a 51% attack cannot
force bad blocks to get accepted by the network.
Another way to solve the trilemma is Plasma architectures, which use
clever techniques to push the responsibility to watch for data
availability to the user in an incentive-compatible way. Back in
2017-2019, when all we had to scale computation was fraud proofs, Plasma
was very limited in what it could safely do, but the mainstreaming of
SNARKs makes Plasma architectures far
more viable for a wider array of use cases than before.
Further progress in data availability sampling
What problem are we solving?
As of 2024 March 13, when the Dencun upgrade
went live, the Ethereum blockchain has three ~125 kB “blobs” per
12-second slot, or ~375 kB per slot of data
availability bandwidth. Assuming transaction data is published onchain
directly, an ERC20 transfer is ~180 bytes, and so the maximum TPS of
rollups on Ethereum is:
375000 / 12 / 180 = 173.6 TPS
If we add Ethereum’s calldata (theoretical max: 30 million gas per
slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes
607 TPS. With PeerDAS, the plan is to increase the blob
count target to 8-16, which would give us 463-926 TPS
from blob data alone.
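The arithmetic above is easy to re-run. Here is a minimal sketch that uses only the inputs quoted in the text (125 kB blobs, 12-second slots, 180 bytes per ERC20 transfer); the function and variable names are mine and purely illustrative.

```python
SLOT_SECONDS = 12
ERC20_TRANSFER_BYTES = 180
BLOB_BYTES = 125_000  # "~125 kB" per blob, as quoted in the text


def rollup_tps(bytes_per_slot: float) -> float:
    """Transfers per second if every byte in the slot carries rollup data."""
    return bytes_per_slot / SLOT_SECONDS / ERC20_TRANSFER_BYTES


# Three blobs per slot (post-Dencun).
blob_tps_today = rollup_tps(3 * BLOB_BYTES)

# Theoretical calldata ceiling: 30 million gas at 16 gas per byte.
calldata_bytes_max = 30_000_000 / 16

# Blob count target of 8 to 16 under PeerDAS.
peerdas_low = rollup_tps(8 * BLOB_BYTES)
peerdas_high = rollup_tps(16 * BLOB_BYTES)

# Assumption on my part: the 607 TPS figure is matched when the calldata
# term uses 15 million gas per slot rather than the 30 million maximum.
blobs_plus_half_calldata = rollup_tps(3 * BLOB_BYTES + 15_000_000 / 16)

print(f"3 blobs:             {blob_tps_today:.1f} TPS")
print(f"max calldata bytes:  {calldata_bytes_max:,.0f}")
print(f"8-16 blobs:          {peerdas_low:.0f}-{peerdas_high:.0f} TPS")
print(f"3 blobs + 15M gas:   {blobs_plus_half_calldata:.1f} TPS")
```

Changing `ERC20_TRANSFER_BYTES` is the quickest way to see how much of the headroom comes from bandwidth and how much from bytes per transaction.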
This is a major increase over the Ethereum L1, but it is not enough.
We want much more scalability. Our medium-term target is 16 MB
per slot, which if combined with improvements in rollup data
compression would give us ~58,000 TPS.
What is it and how does it work?
PeerDAS is a relatively simple implementation of “1D sampling”. Each
blob in Ethereum is a polynomial of degree less than 4096 over a 255-bit prime field.
We broadcast “shares” of the polynomial, where each share consists of 16
evaluations at 16 adjacent coordinates taken from a total set of 8192
coordinates. Any 4096 of the 8192 evaluations (with current proposed
parameters: any 64 of the 128 possible samples) can recover the
blob.
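The "any half of the evaluations recovers the blob" property is ordinary polynomial interpolation. Below is a toy sketch under loudly stated assumptions: a small prime (65537) stands in for the real field, 8 coefficients stand in for 4096, and the evaluation coordinates are simply 0..15 rather than whatever coordinates the protocol actually uses. It is not the production encoding, only the underlying idea.

```python
import random

P = 65537   # toy prime field (assumption; the real field is far larger)
K = 8       # number of coefficients (4096 in the text)
N = 2 * K   # number of evaluation points (8192 in the text)


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are ordered from low degree to high."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def interpolate_at(points, x):
    """Lagrange-interpolate through `points` and evaluate the result at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


rng = random.Random(0)
coeffs = [rng.randrange(P) for _ in range(K)]
evaluations = [evaluate(coeffs, x) for x in range(N)]

# Throw away half of the evaluations, chosen arbitrarily.
kept = rng.sample(range(N), K)
points = [(x, evaluations[x]) for x in kept]

# The surviving half is enough to rebuild every evaluation.
recovered = [interpolate_at(points, x) for x in range(N)]
print("recovered all evaluations:", recovered == evaluations)
```

Lagrange interpolation here is quadratic in the number of points, which is fine for a toy but is exactly why reconstruction cost matters once blobs get larger.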
PeerDAS works by having each client listen on a small number of
subnets, where the i’th subnet broadcasts the i’th sample of any blob,
and additionally requests the blobs on other subnets that it needs from
its peers in the global p2p network (who would be listening to different
subnets). A more conservative version, SubnetDAS,
uses only the subnet mechanism, without the additional layer of
asking peers. A current proposal is for nodes participating in proof of
stake to use SubnetDAS, and for other nodes (ie. “clients”) to use
PeerDAS.
Theoretically, we can scale 1D sampling pretty far: if we increase
the blob count maximum to 256 (so, the target to 128), then we would get
to our 16 MB target while data availability sampling would only cost
each node 16 samples * 128 blobs * 512 bytes per sample per blob = 1 MB
of data bandwidth per slot. This is just barely within our
tolerance: it’s doable, but it would mean bandwidth-constrained clients
cannot sample. We could optimize this somewhat by decreasing blob count
and increasing blob size, but this would make reconstruction more
expensive.
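To make the per-node cost concrete, here is the same multiplication as a small sketch. The 32 bytes per evaluation is my derivation (512 bytes per sample divided by 16 evaluations), and the constant names are illustrative.

```python
EVALUATIONS_PER_BLOB = 4096
BYTES_PER_EVALUATION = 32      # derived: 512 bytes per sample / 16 evaluations
EVALUATIONS_PER_SAMPLE = 16
SAMPLES_PER_NODE = 16
TARGET_BLOBS = 128

bytes_per_sample = EVALUATIONS_PER_SAMPLE * BYTES_PER_EVALUATION
node_bytes_per_slot = SAMPLES_PER_NODE * TARGET_BLOBS * bytes_per_sample

blob_bytes = EVALUATIONS_PER_BLOB * BYTES_PER_EVALUATION
payload_bytes_per_slot = TARGET_BLOBS * blob_bytes

print(f"per-node sampling: {node_bytes_per_slot / 2**20:.0f} MiB per slot")
print(f"total blob data:   {payload_bytes_per_slot / 2**20:.0f} MiB per slot")
print(f"node downloads 1/{payload_bytes_per_slot // node_bytes_per_slot} of the data")
```

Under these numbers each node still downloads one sixteenth of the payload, which is the inefficiency that motivates sampling across blobs as well as within them.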
And so ultimately we want to go further, and do 2D
sampling, which works by random sampling not just
within blobs, but also between blobs. The linear
properties of KZG commitments are used to “extend” the set of blobs in a
block with a list of new “virtual blobs” that redundantly encode the
same information.
2D sampling. Source:
a16z crypto
Crucially, computing the extension of the commitments does not
require having the blobs, so the scheme is fundamentally friendly to
distributed block construction. The node actually constructing the block
would only need to have the blob KZG commitments, and can themselves rely
on DAS to verify the availability of the blobs. 1D DAS is also
inherently friendly to distributed block construction.
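The claim that the commitment extension needs only the commitments follows from linearity. The sketch below is not KZG: it uses a toy, insecure linear commitment (a dot product with public random values modulo a prime) purely to show the algebra. The field, the row coordinates 0..2K-1 and all names are assumptions for illustration.

```python
import random

P = 2**31 - 1  # toy prime modulus (assumption)


def lagrange_coeffs(k, x):
    """Coefficients c_j such that f(x) = sum_j c_j * f(j) when deg f < k."""
    coeffs = []
    for j in range(k):
        num, den = 1, 1
        for m in range(k):
            if m != j:
                num = num * (x - m) % P
                den = den * (j - m) % P
        coeffs.append(num * pow(den, -1, P) % P)
    return coeffs


def commit(blob, basis):
    """Toy linear commitment: NOT secure, only linear like KZG is."""
    return sum(b * g for b, g in zip(blob, basis)) % P


rng = random.Random(1)
K, WIDTH = 4, 6
basis = [rng.randrange(1, P) for _ in range(WIDTH)]
blobs = [[rng.randrange(P) for _ in range(WIDTH)] for _ in range(K)]
commitments = [commit(blob, basis) for blob in blobs]

matches = []
for row in range(K, 2 * K):
    c = lagrange_coeffs(K, row)
    # Block builder's path: combine commitments only, never touching blobs.
    extended_commitment = sum(ci * cm for ci, cm in zip(c, commitments)) % P
    # Full-data path: extend every column, then commit to the virtual blob.
    virtual_blob = [
        sum(c[j] * blobs[j][col] for j in range(K)) % P for col in range(WIDTH)
    ]
    matches.append(commit(virtual_blob, basis) == extended_commitment)

print("commitment-only extension matches full extension:", all(matches))
```

Because both paths agree, whoever assembles the block can publish commitments to the virtual blobs without ever holding the underlying data.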
What are some links to existing research?
What is left to do, and what are the tradeoffs?
The immediate next step is to finish the implementation and rollout
of PeerDAS. From there, it’s a progressive grind to keep increasing the
blob count on PeerDAS while carefully watching the network and improving
the software to ensure safety. At the same time, we want more academic
work on formalizing PeerDAS and other versions of DAS and their
interactions with issues such as fork choice rule safety.
Further into the future, we need much more work figuring out the
ideal version of 2D DAS and proving its safety properties. We also want
to eventually migrate away from KZG to a quantum-resistant,
trusted-setup-free alternative. Currently, we do not know of candidates
that are friendly to distributed block building. Even the expensive
“brute force” technique of using recursive STARKs to generate proofs of
validity for reconstructing rows and columns does not suffice, because
while technically a STARK is O(log(n) * log(log(n))) hashes in
size (with STIR), in
practice a STARK is almost as big as a whole blob.
The realistic paths I see for the long term are:
- Implement ideal 2D DAS
- Stick with 1D DAS, sacrificing sampling bandwidth
efficiency and accepting a lower data cap for the sake of simplicity and
robustness
- (Hard pivot) abandon DA, and fully embrace Plasma
as a primary layer 2 architecture we are focusing on
We can view these along a tradeoff spectrum:
Note that this choice exists even if we decide to scale
execution on L1 directly. This is because if L1 is to process
lots of TPS, L1 blocks will become very big, and clients will want an
efficient way to verify that they are correct, so we would have to use
the same technology that powers rollups (ZK-EVM and DAS) at L1.
How does it interact with other parts of the roadmap?
The need for 2D DAS is somewhat lessened, or at least delayed, if
data compression (see below) is implemented, and it’s lessened even
further if Plasma is widely used. DAS also poses a challenge to
distributed block building protocols and mechanisms: while DAS is
theoretically friendly to distributed reconstruction, this needs to be
combined in practice with inclusion
list proposals and their surrounding fork choice mechanics.
Data compression
What problem are we solving?
Each transaction in a rollup takes a significant amount of data space
onchain: an ERC20 transfer takes about 180 bytes. Even with ideal data
availability sampling, this puts a cap on scalability of layer 2
protocols. With 16 MB per slot, we get:
16000000 / 12 / 180 = 7407 TPS
What if in addition to tackling the numerator, we can also tackle the
denominator, and make each transaction in a rollup take fewer bytes
onchain?
What is it and how does it work?
The best explanation in my opinion is this
diagram from two years ago:
The simplest gains are just zero-byte compression: replacing each
long sequence of zero bytes with two bytes representing how many zero
bytes there are. To go further, we take advantage of the specific
properties of transactions:
- Signature aggregation – we switch from ECDSA
signatures to BLS signatures, which have the property that many
signatures can be combined together into a single signature that attests
for the validity of all of the original signatures. This is not
considered for L1 because the computational costs of verification, even
with aggregation, are higher, but in a data-scarce environment like L2s,
they arguably make sense. The aggregation feature of ERC-4337 presents one
path for implementing this.
- Replacing addresses with pointers – if an address
was used before, we can replace the 20-byte address with a 4-byte
pointer to a location in history. This is needed to achieve the biggest
gains, though it takes effort to implement, because it requires (at
least a portion of) the blockchain’s history to effectively become part
of the state.
- Custom serialization for transaction values – most
transaction values have very few digits, eg. 0.25 ETH is represented as
250,000,000,000,000,000 wei. Gas max-basefees and priority fees work
similarly. We can thus represent most currency values very compactly
with a custom decimal floating point format, or even a dictionary of
especially common values.
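Two of these ideas are simple enough to sketch directly. The encodings below are my own illustrative choices, not a format any rollup is known to use: zero runs become a `0x00` marker followed by a one-byte run length, and values become a (mantissa, decimal exponent) pair.

```python
def compress_zeros(data: bytes) -> bytes:
    """Replace each run of zero bytes with (0x00, run length up to 255)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress_zeros(data: bytes) -> bytes:
    """Inverse of compress_zeros."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += b"\x00" * data[i + 1]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def encode_value(wei: int) -> tuple[int, int]:
    """Decimal floating point: wei == mantissa * 10**exponent."""
    exponent = 0
    while wei and wei % 10 == 0:
        wei //= 10
        exponent += 1
    return wei, exponent


# ABI-style ERC20 transfer calldata: selector, padded address, padded amount.
selector = bytes.fromhex("a9059cbb")         # transfer(address,uint256)
recipient = bytes(12) + bytes(range(1, 21))  # placeholder 20-byte address
amount_wei = 250_000_000_000_000_000         # 0.25 of an 18-decimal token
calldata = selector + recipient + amount_wei.to_bytes(32, "big")

compressed = compress_zeros(calldata)
print(len(calldata), "->", len(compressed), "bytes after zero-byte compression")
print("value as (mantissa, exponent):", encode_value(amount_wei))
```

Note that an isolated zero byte costs two bytes under this scheme, so it only pays off when the input is dominated by long runs, which padded ABI data is.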
What are some links to existing research?
What is left to do, and what are the tradeoffs?
The main thing left to do is to actually implement the above schemes.
The main tradeoffs are:
- Switching to BLS signatures takes significant effort, and reduces
compatibility with trusted hardware chips that can increase security. A
ZK-SNARK wrapper around other signature schemes could be used to replace
this.
- Dynamic compression (eg. replacing addresses with pointers)
complicates client code.
- Posting state diffs to chain instead of transactions reduces
auditability, and makes a lot of software (eg. block explorers) not
work.
How does it interact with other parts of the roadmap?
Adoption of ERC-4337, and eventually the enshrinement of parts of it
in L2 EVMs, can greatly hasten the deployment of aggregation techniques.
Enshrinement of parts of ERC-4337 on L1 can hasten its deployment on
L2s.
Generalized Plasma
What problem are we solving?
Even with 16 MB blobs and data compression, 58,000 TPS is not
necessarily enough to fully take over consumer payments, decentralized
social or other high-bandwidth sectors, and this becomes especially true
if we start taking privacy into account, which could drop
scalability by 3-8x. For high-volume, low-value applications, one option
today is a validium,
which keeps data off-chain and has an interesting security model where
the operator cannot steal users’ funds, but they can disappear
and temporarily or permanently freeze all users’ funds. But we
can do better.
What is it and how does it work?
Plasma is a scaling solution that involves an operator publishing
blocks offchain, and putting the Merkle roots of those blocks onchain
(as opposed to rollups, where the full block is put onchain). For each
block, the operator sends to each user a Merkle branch proving what
happened, or did not happen, to that user’s assets. Users can withdraw
their assets by providing a Merkle branch. Importantly, this branch does
not have to be rooted in the latest state – for this reason,
even if data availability fails, the user can still recover their assets
by withdrawing the latest state they have that is available. If a user
submits an invalid branch (eg. exiting an asset that they already sent
to someone else, or the operator themselves creating an asset out of
thin air), an onchain challenge mechanism can adjudicate who the asset
rightfully belongs to.
A diagram of a Plasma Cash chain. Transactions spending coin i are put
into the i’th position in the tree. In this example, assuming all previous trees are
valid, we know that Eve currently owns coin 1, David owns coin 4 and
George owns coin 6.
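The Merkle branch mechanics are the same as in any Merkle tree. Here is a minimal sketch, assuming SHA-256, a tree with eight coin slots, and an "empty" leaf for coins that were not spent in the block; the transaction strings and names are hypothetical.

```python
import hashlib


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def leaf(tx: bytes) -> bytes:
    return hashlib.sha256(tx).digest()


def build_tree(leaves):
    """Return all layers, from the leaves up to the single root."""
    layers = [leaves]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def branch(layers, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    proof = []
    for layer in layers[:-1]:
        proof.append(layer[index ^ 1])
        index //= 2
    return proof


def verify(leaf_hash, index, proof, root):
    node = leaf_hash
    for sibling in proof:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root


EMPTY = leaf(b"")  # marks "coin i was not spent in this block"
leaves = [EMPTY] * 8
leaves[4] = leaf(b"coin 4: Charlie -> David")  # hypothetical transaction

layers = build_tree(leaves)
root = layers[-1][0]  # the only thing the operator puts onchain

inclusion_ok = verify(leaves[4], 4, branch(layers, 4), root)
forged_ok = verify(leaf(b"coin 4: Charlie -> Mallory"), 4, branch(layers, 4), root)
non_spend_ok = verify(EMPTY, 1, branch(layers, 1), root)
print(inclusion_ok, forged_ok, non_spend_ok)
```

The third check is the "did not happen" case: a branch to an empty leaf at position 1 shows that coin 1 did not move in this block.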
Early versions of Plasma were only able to handle the payments use
case, and were not able to effectively generalize further. If we require
each root to be verified with a SNARK, however, Plasma becomes much more
powerful. Each challenge game can be simplified significantly, because
we take away most possible paths for the operator to cheat. New paths
also open up to allow Plasma techniques to be extended to a much more
general class of assets. Finally, in the case where the operator does
not cheat, users can withdraw their funds instantly, without needing to
wait for a one-week challenge period.
One way (not the only way) to make an EVM plasma chain:
use a ZK-SNARK to construct a parallel UTXO tree that reflects the
balance changes made by the EVM, and defines a unique mapping of what is
“the same coin” at different points in history. A Plasma construction
can then be built on top of that.
One key insight is that the Plasma system does not need to be
perfect. Even if you can only protect a subset of assets (eg. even just
coins that have not moved in the past week), you’ve already greatly
improved on the status quo of ultra-scalable EVM, which is a
validium.
Another class of constructions is hybrid plasma/rollups,
such as Intmax. These constructions
put a very small amount of data per user onchain (eg. 5 bytes), and by
doing so, get properties that are somewhere between plasma and rollups:
in the Intmax case, you get a very high level of scalability and
privacy, though even in the 16 MB world capacity is theoretically capped
to roughly 16,000,000 / 12 / 5 = 266,667 TPS.
What are some links to existing research?
What is left to do, and what are the tradeoffs?
The main remaining task is to bring Plasma systems to production. As
mentioned above, “plasma vs validium” is not a binary: any
validium can have its safety properties improved at least a little bit
by adding Plasma features into the exit mechanism. The research part is
in getting optimal properties (in terms of trust requirements, and
worst-case L1 gas cost, and vulnerability to DoS) for an EVM, as well as
alternative application specific constructions. Additionally, the
greater conceptual complexity of Plasma relative to rollups needs to be
addressed directly, both through research and through construction of
better generalized frameworks.
The main tradeoff in using Plasma designs is that they depend more on
operators and are harder to make “based”,
though hybrid plasma/rollup designs can often avoid this weakness.
How does it interact with other parts of the roadmap?
The more effective Plasma solutions can be, the less pressure there
is for the L1 to have a high-performance data availability
functionality. Moving activity to L2 also reduces MEV pressure on
L1.
Maturing L2 proof systems
What problem are we solving?
Today, most rollups are not yet actually trustless; there is a
security council that has the ability to override the behavior of the
(optimistic or validity) proof
system. In some cases, the proof system is not even live at all, or
if it is, it only has an “advisory” functionality. The furthest ahead are
(i) a few application-specific rollups, such as Fuel, which are
trustless, and (ii) as of the time of this writing, Optimism and
Arbitrum, two full-EVM rollups that have achieved a
partial-trustlessness milestone known as “stage 1”. The reason why
rollups have not gone further is concern about bugs in the code. We need
trustless rollups, and so we need to tackle this problem head on.
What is it and how does it work?
First, let us recap the “stage” system, originally introduced in this
post. There are more detailed requirements, but the summary is:
- Stage 0: it must be possible for a user to run a
node and sync the chain. It’s ok if validation is fully
trusted/centralized.
- Stage 1: there must be a (trustless) proof
system that ensures that only valid transactions get accepted.
It’s allowed for there to be a security council that can
override the proof system, but only with a 75%
threshold vote. Additionally, a quorum-blocking portion of the
council (so, 26%+) must be outside the main company building the rollup.
An upgrade mechanism with weaker features (eg. a DAO) is allowed, but it
must have a delay long enough that if it approves a malicious upgrade,
users can exit their funds before it comes online.
- Stage 2: there must be a (trustless) proof
system that ensures that only valid transactions get accepted.
Security councils are only allowed to intervene in the event of
provable bugs in the code, eg. if two redundant proof systems
disagree with each other or if one proof system accepts two different
post-state roots for the same block (or accepts nothing for a
sufficiently long period of time eg. a week). An upgrade mechanism is
allowed, but it must have a very long delay.
The goal is to reach Stage 2. The main challenge in reaching
stage 2 is getting enough confidence that the proof system actually is
trustworthy enough. There are two major ways to do this:
- Formal verification: we can use modern mathematical
and computational techniques to prove that an (optimistic or validity)
proof system only accepts blocks that pass the EVM specification. These
techniques have existed for decades, but recent advancements such as Lean 4 have
made them much more practical, and advancements in AI-assisted proving
could potentially accelerate this trend further.
- Multi-provers: make multiple proof systems, and put
funds into a 2-of-3 (or larger) multisig between those proof systems and
a security council (and/or other gadget with trust assumptions, eg.
TEEs). If the proof systems agree, the security council has no power; if
they disagree, the security council can only choose between one of them,
it can’t unilaterally impose its own answer.
Stylized diagram of a multi-prover, combining one
optimistic proof system, one validity proof system and a security
council.
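The 2-of-3 rule is small enough to write down. This is a sketch of the decision logic only, with state roots modeled as strings and `None` meaning "this party has not accepted anything"; the function name and signature are illustrative, not taken from any deployed rollup.

```python
from typing import Optional


def settle(
    optimistic: Optional[str],
    validity: Optional[str],
    council: Optional[str],
) -> Optional[str]:
    """Return the state root backed by at least two of the three parties."""
    # Both proof systems agree: the council's opinion is irrelevant.
    if optimistic is not None and optimistic == validity:
        return optimistic
    # They disagree (or one is silent): the council may only side with
    # a root that one of the proof systems actually produced.
    if council is not None and council in (optimistic, validity):
        return council
    # No two parties agree, so nothing is finalized.
    return None


print(settle("0xaa", "0xaa", "0xbb"))  # provers agree, council overruled
print(settle("0xaa", "0xbb", "0xbb"))  # council breaks the tie
print(settle("0xaa", "0xbb", "0xcc"))  # council cannot impose its own root
```

The property worth noticing is in the last case: a council that proposes a root neither proof system produced gets nothing finalized.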
What are some links to existing research?
What is left to do, and what are the tradeoffs?
For formal verification, a lot. We need to create a formally
verified version of an entire SNARK prover of an EVM. This is an
incredibly complex project, though it is one that we have already started. There is
one trick that significantly simplifies the task: we can make a formally
verified SNARK prover of a minimal VM, eg. RISC-V or Cairo, and then write
an implementation of the EVM in that minimal VM (and formally prove its
equivalence to some other EVM specification).
For multi-provers, there are two main remaining pieces. First, we
need to get enough confidence in at least two different proof systems,
both that they are reasonably safe individually and that if they break,
they would break for different and unrelated reasons (and so they would
not break at the same time). Second, we need to get a very high level of
assurance in the underlying logic that merges the proof systems. This is
a much smaller piece of code. There are ways to make it
extremely small – just store funds in a Safe multisig contract whose
signers are contracts representing individual proof systems – but this
has the tradeoff of high onchain gas costs. Some balance between
efficiency and safety will need to be found.
How does it interact with other parts of the roadmap?
Moving activity to L2 reduces MEV pressure on L1.
Cross-L2 interoperability improvements
What problem are we solving?
One major challenge with the L2 ecosystem today is that it is
difficult for users to navigate. Furthermore, the easiest ways of doing
so often re-introduce trust assumptions: centralized bridges, RPC
clients, and so forth. If we are serious about the idea that L2s are
part of Ethereum, we need to make using the L2 ecosystem feel like using
a unified Ethereum ecosystem.
An example of pathologically bad (and even dangerous: I
personally lost $100 to a chain-selection mistake here) cross-L2 UX –
though this is not Polymarket’s fault, cross-L2 interoperability should
be the responsibility of wallets and the Ethereum standards (ERC)
community. In a well-functioning Ethereum ecosystem, sending coins from
L1 to L2, or from one L2 to another, should feel just like sending coins
within the same L1.
What is it and how does it work?
There are many categories of cross-L2 interoperability improvements.
In general, the way to come up with these is to notice that in theory,
a
rollup-centric Ethereum is the same thing as L1 execution sharding,
and then ask where the current Ethereum L2-verse falls short of that
ideal in practice. Here are a few:
- Chain-specific addresses: the chain (L1, Optimism,
Arbitrum…) should be part of the address. Once this is implemented,
cross-L2 sending flows can be implemented by just putting the address
into the “send” field, at which point the wallet can figure out how to
do the send (including using bridging protocols) in the background.
- Chain-specific payment requests: it should be easy
and standardized to make a message of the form “send me X tokens of type
Y on chain Z”. This has two primary use cases: (i) payments, whether
person-to-person or person-to-merchant-service, and (ii) dapps
requesting funds, eg. the Polymarket example above.
- Cross-chain swaps and gas payment: there should be
a standardized open protocol for expressing cross-chain operations such
as “I am sending 1 ETH on Optimism to whoever sends me 0.9999 ETH on
Arbitrum”, and “I am sending 0.0001 ETH on Optimism to whoever includes
this transaction on Arbitrum”. ERC-7683 is one
attempt at the former, and RIP-7755
is one attempt at the latter, though both are also more general than
just these specific use cases.
- Light clients: users should be able to actually
verify the chains that they are interacting with, and not just trust RPC
providers. A16z crypto’s Helios does this for Ethereum
itself, but we need to extend this trustlessness to L2s. ERC-3668 (CCIP-read)
is one strategy for doing this.
How a light client can update its view of the Ethereum
header chain. Once you have the header chain, you can use Merkle proofs
to validate any state object. And once you have the right L1 state
objects, you can use Merkle proofs (and possibly signatures, if you want
to check preconfirmations) to validate any state object on L2. Helios
does the former already. Extending to the latter is a standardization
challenge.
- Keystore wallets: today, if you want to update the
keys that control your smart contract wallet, you have to do it on all N
chains on which that wallet exists. Keystore wallets are a technique
that allows the keys to exist in one place (either on L1, or later
potentially on an L2), and then be read from any L2 that has a copy of
the wallet. This means that updates only need to happen once. To be
efficient, keystore wallets require L2s to have a standardized way to
costlessly read L1; two proposals for this are L1SLOAD
and REMOTESTATICCALL.
A stylized diagram of how keystore wallets work.
- More radical “shared token bridge” ideas:
imagine a world where all L2s are validity proof rollups, that commit to
Ethereum every slot. Even in this world, moving assets from one L2 to
another L2 “natively” would require withdrawing and depositing, which
requires paying a substantial amount of L1 gas. One way to solve this is
to create a shared minimal rollup, whose only function would be to
maintain the balances of how many tokens of which type are owned by
which L2, and allow those balances to be updated en masse by a series of
cross-L2 send operations initiated by any of the L2s. This would allow
cross-L2 transfers to happen without needing to pay L1 gas per transfer,
and without needing liquidity-provider-based techniques like
ERC-7683.
- Synchronous composability: allow synchronous
calls to happen either between a specific L2 and L1, or between multiple
L2s. This could be helpful in improving financial efficiency of defi
protocols. The former could be done without any cross-L2 coordination;
the latter would require shared
sequencing. Based
rollups are automatically friendly to all of these
techniques.
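To make the shared token bridge idea more tangible, here is a toy model of the ledger such a minimal rollup would maintain: balances keyed by (L2, token), updated by batches of sends that a single L2 initiates. The class, method names and the third chain name are hypothetical, and the sketch ignores proofs, fees and authentication entirely.

```python
from collections import defaultdict


class SharedBridgeLedger:
    """Tracks how many tokens of each type each L2 holds (toy model)."""

    def __init__(self):
        self.balances = defaultdict(int)  # (l2, token) -> amount

    def deposit(self, l2: str, token: str, amount: int) -> None:
        self.balances[(l2, token)] += amount

    def apply_batch(self, source_l2: str, sends) -> None:
        """Apply (token, destination_l2, amount) sends atomically."""
        # Check the whole batch first so a failure leaves balances untouched.
        debits = defaultdict(int)
        for token, _dest, amount in sends:
            if amount <= 0:
                raise ValueError("send amounts must be positive")
            debits[token] += amount
        for token, total in debits.items():
            if self.balances[(source_l2, token)] < total:
                raise ValueError(f"{source_l2} lacks {token} for this batch")
        for token, dest, amount in sends:
            self.balances[(source_l2, token)] -= amount
            self.balances[(dest, token)] += amount


ledger = SharedBridgeLedger()
ledger.deposit("optimism", "ETH", 100)

# One batch from Optimism covers many user-level transfers at once.
ledger.apply_batch("optimism", [("ETH", "arbitrum", 30), ("ETH", "l2-c", 20)])

print(dict(ledger.balances))
```

The point of batching is that L1 sees one update per batch rather than one withdrawal and one deposit per user transfer.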
What are some links to existing research?
What is left to do, and what are the tradeoffs?
Many of the examples above face standard dilemmas of when to
standardize and what layers to standardize. If you standardize too
early, you risk entrenching an inferior solution. If you standardize too
late, you risk creating needless fragmentation. In some cases, there is
both a short-term solution that has weaker properties but is easier to
implement, and a long-term solution that is “ultimately right” but will
take quite a few years to get there.
One way in which this section is unique is that these tasks are not
just technical problems: they are also (perhaps even primarily!) social
problems. They require L2s and wallets and L1 to cooperate. Our ability
to handle this problem successfully is a test of our ability to stick
together as a community.
How does it interact with other parts of the roadmap?
Most of these proposals are “higher-layer” constructions, and so do
not greatly affect L1 considerations. One exception is shared
sequencing, which has heavy impacts on MEV.
Scaling execution on L1
What problem are we solving?
If L2s become very scalable and successful but L1 remains capable of
processing only a very low volume of transactions, there are many risks
to Ethereum that might arise:
- The economic situation of ETH the asset becomes more risky, which in
turn affects long-run security of the network.
- Many L2s benefit from being closely tied to a highly developed
financial ecosystem on L1, and if this ecosystem greatly weakens, the
incentive to become an L2 (instead of being an independent L1)
weakens.
- It will take a long time before L2s have exactly the same security
assurances as L1.
- If an L2 fails (eg. due to a malicious or disappearing operator),
users would still need to go through L1 in order to recover their
assets. Hence, L1 needs to be powerful enough to be able to at least
occasionally actually handle a highly complex and chaotic wind-down of
an L2.
For these reasons, it is valuable to continue scaling L1 itself, and
making sure that it can continue to accommodate a growing number of
uses.
What is it and how does it work?
The easiest way to scale is to simply increase the gas
limit. However, this risks centralizing the L1, and thus
weakening the other important property that makes the Ethereum L1 so
powerful: its credibility as a robust base layer. There is an ongoing
debate about what degree of simple gas limit increase is sustainable,
and this also changes based on which other technologies get implemented
to make larger blocks easier to verify (eg. history expiry,
statelessness, L1 EVM validity proofs). Another important thing to keep
improving is simply the efficiency of Ethereum client software, which is
far more optimized today than it was five years ago. An
effective L1 gas limit increase strategy would involve accelerating
these verification technologies.
Another scaling strategy involves identifying specific features and
types of computation that can be made cheaper without harming the
decentralization of the network or its security properties. Examples of
this include:
- EOF – a new EVM bytecode
format that is more friendly to static analysis, allowing for faster
implementations. EOF bytecode could be given lower gas costs to take
these efficiencies into account.
- Multidimensional gas pricing – establishing separate basefees and limits for
computation, data and storage can increase the Ethereum L1’s
average capacity without increasing its maximum
capacity (and hence creating new security risks).
- Reduce gas costs of specific opcodes and
precompiles – historically, we have had several rounds of increasing gas costs for certain
operations that were underpriced in order to avoid denial of
service attacks. What we have had less of, and could do much more, is
reducing gas costs for operations that are overpriced.
For example, addition is much cheaper than multiplication, but the costs
of the ADD and MUL opcodes are currently the same. We could make ADD
cheaper, and even simpler opcodes such as PUSH even cheaper. EOF as a
whole is more
- EVM-MAX and SIMD:
EVM-MAX (“modular arithmetic extensions”) is a proposal to allow more
efficient native big-number modular math as a separate module of the
EVM. Values computed by EVM-MAX computations would only be accessible by
other EVM-MAX opcodes, unless deliberately exported; this allows greater
room to store these values in optimized
formats. SIMD (“single instruction multiple data”) is a proposal to
allow efficiently executing the same instruction on an array of values.
The two together can create a powerful coprocessor
alongside the EVM that could be used to much more efficiently implement
cryptographic operations. This would be especially useful for privacy
protocols, and for L2 proof systems, so it would help both L1 and L2
scaling.
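To make the EVM-MAX and SIMD item more concrete, here is a minimal Python sketch of the idea, not the proposed opcodes themselves. The class name `ModArena` and its methods `load`, `export`, `mulmod` and `simd_mulmod` are hypothetical names chosen for illustration. Values live in private slots and are only visible outside the module through an explicit export, and one call applies the same modular multiplication across a run of slots.

```python
class ModArena:
    """Hypothetical stand-in for an EVM-MAX-style module (illustration only).

    Values are kept in private slots, reduced modulo a fixed modulus, and
    are only visible to the outside through export().
    """

    def __init__(self, modulus: int, num_slots: int) -> None:
        if modulus < 2:
            raise ValueError("modulus must be at least 2")
        self._modulus = modulus
        self._slots = [0] * num_slots

    def load(self, slot: int, value: int) -> None:
        # Bring an ordinary integer into the module, reducing it once.
        self._slots[slot] = value % self._modulus

    def export(self, slot: int) -> int:
        # The only way to read a value back out of the module.
        return self._slots[slot]

    def mulmod(self, out: int, a: int, b: int) -> None:
        # Scalar operation: slots[out] = slots[a] * slots[b] mod modulus.
        self._slots[out] = (self._slots[a] * self._slots[b]) % self._modulus

    def simd_mulmod(self, out: int, a: int, b: int, count: int) -> None:
        # SIMD-style operation: the same mulmod applied element-wise to
        # `count` consecutive slots starting at a, b and out.
        size = len(self._slots)
        if count < 0 or min(out, a, b) < 0 or max(out, a, b) + count > size:
            raise IndexError("slot range out of bounds")
        results = [
            (self._slots[a + i] * self._slots[b + i]) % self._modulus
            for i in range(count)
        ]
        self._slots[out:out + count] = results


arena = ModArena(modulus=13, num_slots=9)
for i, value in enumerate([5, 7, 12, 4, 9, 12]):
    arena.load(i, value)

# One call multiplies slots 0..2 by slots 3..5 and writes slots 6..8.
arena.simd_mulmod(out=6, a=0, b=3, count=3)
products = [arena.export(i) for i in range(6, 9)]
print(products)  # [7, 11, 1]
```

Because callers never see the slot contents directly, an actual implementation would be free to keep them in whatever internal representation makes the arithmetic fastest. The plain integers used here are only the simplest choice for a sketch.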
These improvements will be discussed in more detail in a future post
on the Splurge.
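The claim in the multidimensional gas pricing item above, that separate limits raise average capacity without raising maximum capacity, can be illustrated with a toy block builder. This is a sketch under simplifying assumptions: there are only two resources, every unit of either resource costs one gas, only the limits are modeled (not the separate basefees), and the names `Tx`, `fill_single_limit` and `fill_per_resource` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    compute: int  # units of computation the transaction consumes
    data: int     # units of data the transaction consumes


def fill_single_limit(txs, gas_limit):
    """Greedy block building with one shared limit over compute + data."""
    included, used = [], 0
    for tx in txs:
        cost = tx.compute + tx.data
        if used + cost <= gas_limit:
            included.append(tx)
            used += cost
    return included


def fill_per_resource(txs, compute_limit, data_limit):
    """Greedy block building with an independent limit per resource."""
    included, compute_used, data_used = [], 0, 0
    for tx in txs:
        fits_compute = compute_used + tx.compute <= compute_limit
        fits_data = data_used + tx.data <= data_limit
        if fits_compute and fits_data:
            included.append(tx)
            compute_used += tx.compute
            data_used += tx.data
    return included


def usage(block):
    """Total (compute, data) consumed by a block."""
    return sum(tx.compute for tx in block), sum(tx.data for tx in block)


LIMIT = 30  # toy number, not a real gas limit

# Typical demand: transactions that use both resources equally.
mixed = [Tx(compute=3, data=3) for _ in range(20)]

single = fill_single_limit(mixed, LIMIT)
multi = fill_per_resource(mixed, LIMIT, LIMIT)

print(len(single), usage(single))  # 5 (15, 15)
print(len(multi), usage(multi))    # 10 (30, 30)
```

Under the single shared limit, a block made entirely of computation can already consume 30 units of computation, so nodes must be able to handle that worst case anyway. Giving each resource its own limit of 30 leaves that per-resource worst case unchanged, while a block of mixed transactions now fits twice as many of them.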
Finally, a third strategy is native rollups (or
“enshrined rollups”): essentially, creating many copies of the EVM that
run in parallel, leading to a model that is equivalent to what rollups
can provide, but much more natively integrated into the protocol.
What are some links to existing research?
What is left to do, and what are the tradeoffs?
There are three strategies for L1 scaling, which can be pursued
individually or in parallel:
- Improve technology (eg. client code, stateless clients, history
expiry) to make the L1 easier to verify, and then raise the gas limit
- Make specific operations cheaper, increasing average capacity without
increasing worst-case risks
- Native rollups (ie. “create N parallel copies of the EVM”, though
potentially giving developers a lot of flexibility in the parameters of
the copies they deploy)
It’s worth understanding that these are different techniques that
have different tradeoffs. For example, native rollups have many of the
same weaknesses in composability as regular rollups: you cannot send a
single transaction that synchronously performs operations across many of
them, like you can with contracts on the same L1 (or L2). Raising the
gas limit takes away from other benefits that can be achieved by making
the L1 easier to verify, such as increasing the portion of users that
run verifying nodes, and increasing the number of solo stakers. Making specific
operations in the EVM cheaper, depending on how it’s done, can increase
total EVM complexity.
A big question that any L1 scaling roadmap needs to answer is:
what is the ultimate vision for what belongs on L1 and what
belongs on L2? Clearly, it’s absurd for everything to
go on L1: the potential use cases go into the hundreds of thousands of
transactions per second, and that would make the L1 completely unviable
to verify (unless we go the native rollup route). But we do need
some guiding principle, so that we can make sure that we are
not creating a situation where we increase the gas limit 10x, heavily
damage the Ethereum L1’s decentralization, and find that we’ve only
gotten to a world where instead of 99% of activity being on L2, 90% of
activity is on L2, and so the result otherwise looks almost the same,
except for an irreversible loss of much of what makes Ethereum L1
special.
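The 99% versus 90% figure follows from simple arithmetic, sketched below. The assumption, which is only an illustration, is that total demand stays fixed, the L1 is filled to capacity, and everything else lands on L2. The function name `l2_share` and the unit numbers are hypothetical.

```python
def l2_share(total_activity: float, l1_capacity: float) -> float:
    """Fraction of activity on L2 when L1 is full and the rest goes to L2."""
    l1_used = min(total_activity, l1_capacity)
    return (total_activity - l1_used) / total_activity


TOTAL = 100        # arbitrary units of total demand (assumed fixed)
L1_TODAY = 1       # L1 carries 1 unit, i.e. 1% of activity
L1_AFTER_10X = 10  # the same L1 after a 10x gas limit increase

before = l2_share(TOTAL, L1_TODAY)
after = l2_share(TOTAL, L1_AFTER_10X)
print(f"{before:.0%} -> {after:.0%}")  # 99% -> 90%
```

A tenfold increase in L1 capacity moves the L2 share by only nine percentage points in this toy model, which is why the text asks for a guiding principle before paying the decentralization cost.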
One proposed view of a “division of labor” between L1 and
L2s, source.
How does it interact with other parts of the roadmap?
Bringing more users onto L1 implies improving not just scale, but
also other aspects of L1. It means that more MEV will remain on L1 (as
opposed to becoming a problem just for L2s), and so there will be an even
more pressing need to handle it explicitly. It greatly increases the value
of having fast slot times on L1. And it’s also heavily dependent on
verification of L1 (“the Verge”) going well.