Pubdata Post 4844
Motivation
EIP-4844, commonly known as Proto-Danksharding, is an upgrade to the Ethereum protocol that introduces a new data availability solution embedded in Layer 1. More information about it can be found on Ethereum's website on Danksharding. With Proto-Danksharding we can utilize the new blob data availability for cheaper storage of pubdata when we commit batches, resulting in more transactions per batch and cheaper batches/transactions. We want to ensure we have the flexibility at the contract level to process pubdata via calldata as well as pubdata via blobs. A quick callout here: while 4844 introduces blobs as a new DA layer, it is only the first step toward full Danksharding. With full Danksharding, Ethereum will be able to handle a total of 64 blobs per block, unlike 4844 which supports just 6 per block.
💡 Given the nature of 4844 development from a Solidity viewpoint, we’ve had to create a temporary contract `BlobVersionedHash.yul` which acts in place of the eventual `BLOBHASH` opcode.
Technical Approach
The approach spans both L2 system contracts and L1 ZKsync contracts (namely `Executor.sol`). When a batch is sealed on L2 we will chunk it into blob-sized pieces (4096 elements * 31 bytes each, as required by our circuits), take the hash of each chunk, and send them to L1 via system logs. Within `Executor.sol`, when we are dealing with blob-based commitments, we verify that the blob contains the correct data via the point evaluation precompile. If the batch utilizes calldata instead, the processing remains the same as in pre-4844 ZKsync. Regardless of whether pubdata is sent via calldata or blobs, the batch’s commitment changes as we include new data within the auxiliary output.
Given that this is the first step toward a longer-term solution, and that the restrictions of Proto-Danksharding will be lifted with full Danksharding, we impose the following constraints:
- we now support 6 blobs per batch
- only 1 batch will be committed in a given transaction
- we now send 6 system logs (one for each potential blob commitment)
Backward-compatibility
While some of the parameter formatting changes, we maintain the same function signature for `commitBatches` and still allow for pubdata to be submitted via calldata:
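For orientation, here is a rough sketch of the unchanged entry point. It is loosely based on the `IExecutor` interface in the era-contracts repository, with struct fields abridged, so treat the details as illustrative rather than authoritative:

```solidity
pragma solidity ^0.8.20;

// Sketch of the unchanged entry point; struct fields are abridged and names
// may differ slightly between contract versions.
interface IExecutorSketch {
    struct StoredBatchInfo {
        uint64 batchNumber;
        bytes32 batchHash;
        // ... other fields elided ...
    }

    struct CommitBatchInfo {
        // ... batch number, timestamp, state root, system logs, etc. elided ...
        // Pre-4844: the raw pubdata byte array.
        // Post-4844: `header byte || pubdata || blob commitment` (calldata mode)
        //            or `header byte || 144 bytes per blob` (blob mode).
        bytes pubdataCommitments;
    }

    // Same signature as before 4844; only the pubdataCommitments encoding changes.
    function commitBatches(
        StoredBatchInfo calldata _lastCommittedBatchData,
        CommitBatchInfo[] calldata _newBatchesData
    ) external;
}
```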
Implementation
Bootloader Memory
With the increase in the amount of pubdata due to blobs, changes can be made to the bootloader memory to facilitate more L2 to L1 logs, compressed bytecodes, and pubdata. Copying the comment around pubdata slot calculation from our code:

The overall bootloader max memory has been increased to `63800000`.
L2 System Contracts
We introduce a new system contract `PubdataChunkPublisher` that takes the full pubdata, creates chunks that are each 126,976 bytes in length (4096 elements per blob, each of which holds 31 bytes), and commits them in the form of 6 system logs. We have the following keys for system logs:

In addition to the blob commitments, the hash of the total pubdata is still sent and is used if a batch is committed with pubdata as calldata rather than as blob data. As stated earlier, even when we only have enough pubdata for a single blob, 6 system logs are sent; the hash value in the rest of the logs in this case will be `bytes32(0)`.

One important thing is that we don’t try to reason about the data here; that is done in the L1Messenger and Compressor contracts. The main purpose of this contract is to commit to blobs and have those commitments travel to L1 via system logs.
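For illustration, a minimal sketch of the chunk-and-hash step follows. The constant names and the zero-padding of a partially filled final chunk are assumptions, and the real `PubdataChunkPublisher` emits the hashes as system logs rather than returning them:

```solidity
pragma solidity ^0.8.20;

// Illustrative sketch only: chunk the full pubdata into blob-sized pieces and
// hash each chunk, producing the 6 blob linear hashes.
contract PubdataChunkPublisherSketch {
    uint256 internal constant BLOB_SIZE_BYTES = 4096 * 31; // 126,976 bytes per blob
    uint256 internal constant MAX_NUMBER_OF_BLOBS = 6;

    function chunkPubdataToBlobHashes(bytes calldata _pubdata)
        external
        pure
        returns (bytes32[MAX_NUMBER_OF_BLOBS] memory blobLinearHashes)
    {
        require(_pubdata.length <= MAX_NUMBER_OF_BLOBS * BLOB_SIZE_BYTES, "pubdata too long");

        uint256 numBlobs = (_pubdata.length + BLOB_SIZE_BYTES - 1) / BLOB_SIZE_BYTES;
        for (uint256 i = 0; i < numBlobs; ++i) {
            // Copy the chunk into a zero-padded, blob-sized buffer and hash it, so the
            // hash covers the full blob preimage (padding behavior is an assumption).
            bytes memory chunk = new bytes(BLOB_SIZE_BYTES);
            uint256 start = i * BLOB_SIZE_BYTES;
            uint256 len = _pubdata.length - start;
            if (len > BLOB_SIZE_BYTES) len = BLOB_SIZE_BYTES;
            for (uint256 j = 0; j < len; ++j) {
                chunk[j] = _pubdata[start + j];
            }
            blobLinearHashes[i] = keccak256(chunk);
        }
        // Unused entries stay bytes32(0), signalling "no data in this blob".
    }
}
```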
L1 Executor Facet
While the function signature for `commitBatches` and the structure of `CommitBatchInfo` stay the same, the format of `CommitBatchInfo.pubdataCommitments` changes. Before 4844, this field held a byte array of pubdata; now it can hold either the total pubdata as before, or a list of concatenated info for KZG blob commitments. To differentiate between the two, a header byte is prepended to the byte array. At the moment we only support 2 values:
We reject all other values in the first byte.
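As a sketch, these two header values map onto a pubdata-source enum along the following lines (the `PubdataSource` name follows the era-contracts convention; the snippet is illustrative):

```solidity
pragma solidity ^0.8.20;

// Sketch: the first byte of pubdataCommitments selects how pubdata was sent.
enum PubdataSource {
    Calldata, // 0: header is followed by `pubdata || blob commitment (32 bytes)`
    Blob      // 1: header is followed by one 144-byte (opening point, claimed value,
              //    commitment, proof) entry per blob
}

// Anything else in the first byte is rejected, e.g.:
// uint8 source = uint8(bytes1(_pubdataCommitments[0]));
// require(source <= uint8(PubdataSource.Blob), "invalid pubdata source");
```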
Calldata Based Pubdata Processing
When using calldata, we want to operate on `pubdataCommitments[1:pubdataCommitments.length - 32]`, as this is the full pubdata that was committed to via system logs. The reason we don’t operate on the last 32 bytes is that we also include what the blob commitment for this data would be, as a way to make our witness generation more generic. Only a single blob commitment is needed for this, as the max size of calldata is the same as the size of a single blob. When processing the system logs in this context, we check the hash of the supplied pubdata (without the 1-byte header for the pubdata source) against the value in the corresponding system log with key `TOTAL_L2_TO_L1_PUBDATA_KEY`. We still require logs for the 6 blob commitments; even if these logs contain values, we substitute them with `bytes32(0)` when constructing the batch commitment.
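Sketched in Solidity, the calldata path boils down to the following check; the library and function names are illustrative, not the exact `Executor.sol` internals:

```solidity
pragma solidity ^0.8.20;

// Illustrative sketch only: the calldata-path check, mirroring the description above.
library CalldataPubdataSketch {
    /// @param _pubdataCommitments the full CommitBatchInfo.pubdataCommitments field,
    ///        i.e. `source byte || pubdata || blob commitment (32 bytes)`
    /// @param _totalPubdataHashFromLog value of the TOTAL_L2_TO_L1_PUBDATA_KEY system log
    function verifyCalldataPubdata(
        bytes calldata _pubdataCommitments,
        bytes32 _totalPubdataHashFromLog
    ) internal pure returns (bytes32 blobCommitment) {
        // Everything between the 1-byte header and the trailing 32 bytes is the pubdata itself.
        bytes calldata pubdata = _pubdataCommitments[1:_pubdataCommitments.length - 32];
        require(keccak256(pubdata) == _totalPubdataHashFromLog, "wrong pubdata hash");

        // The trailing 32 bytes are the commitment this data would have as a single blob,
        // kept only to make witness generation more generic.
        blobCommitment = bytes32(_pubdataCommitments[_pubdataCommitments.length - 32:]);
    }
}
```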
Blob Based Pubdata Processing
The format of `pubdataCommitments` changes when we send pubdata as blobs: it now contains the data we need to verify the blob contents via the newly introduced point evaluation precompile. The data in `pubdataCommitments[1:]` is the concatenation of `opening point (16 bytes) || claimed value (32 bytes) || commitment (48 bytes) || proof (48 bytes)` for each blob attached to the transaction, lowering our calldata from N → 144 bytes per blob. More on how this is used later on.
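A sketch of how these fixed-width entries can be split apart, following the `16 || 32 || 48 || 48` layout above (names are illustrative):

```solidity
pragma solidity ^0.8.20;

// Illustrative only: walk the concatenated 144-byte entries, one entry per blob.
library BlobCommitmentParserSketch {
    uint256 internal constant PUBDATA_COMMITMENT_SIZE = 144; // 16 + 32 + 48 + 48

    struct BlobProofData {
        uint128 openingPoint;  // 16-byte opening point (z)
        bytes32 claimedValue;  // 32-byte claimed evaluation value (y)
        bytes commitment;      // 48-byte KZG commitment
        bytes proof;           // 48-byte KZG opening proof
    }

    /// @param _commitments pubdataCommitments with the 1-byte header already removed
    function parse(bytes calldata _commitments) internal pure returns (BlobProofData[] memory out) {
        require(_commitments.length % PUBDATA_COMMITMENT_SIZE == 0, "bad length");
        uint256 n = _commitments.length / PUBDATA_COMMITMENT_SIZE;
        out = new BlobProofData[](n);
        for (uint256 i = 0; i < n; ++i) {
            uint256 o = i * PUBDATA_COMMITMENT_SIZE;
            out[i].openingPoint = uint128(bytes16(_commitments[o:o + 16]));
            out[i].claimedValue = bytes32(_commitments[o + 16:o + 48]);
            out[i].commitment = _commitments[o + 48:o + 96];
            out[i].proof = _commitments[o + 96:o + 144];
        }
    }
}
```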
Utilizing blobs causes us to process logs in a slightly different way. Similar to how it's done when pubdata is sent via calldata, we require a system log with the key `TOTAL_L2_TO_L1_PUBDATA_KEY` (although its value is ignored), and we extract the 6 blob hashes from the `BLOB_ONE_HASH_KEY` through `BLOB_SIX_HASH_KEY` system logs to be used in the batch commitment.
While calldata verification is simple, comparing the hash of the supplied calldata versus the value in the system log, we need to take a few extra steps when verifying that the blobs attached to the transaction contain the correct data. After processing the logs and getting the 6 blob linear hashes, we will have all the data we need to call the point evaluation precompile. Recall that the contents of `pubdataCommitments` have the opening point (in its 16-byte form), claimed value, the commitment, and the proof of this claimed value. The last piece of information we need is the blob’s versioned hash (obtained via the `BLOBHASH` opcode).
There are checks within `_verifyBlobInformation` that ensure that we have the correct blob linear hashes and that, if we aren’t expecting a second or more blobs, the linear hash should be equal to `bytes32(0)`. This is how we signal to our circuits that we didn’t publish any information in the second blob.
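A compressed sketch of those checks (the real `_verifyBlobInformation` does more, including the point evaluation call below; names here are illustrative):

```solidity
pragma solidity ^0.8.20;

// Illustrative only: enforce the "used blobs have hashes, unused blobs are zero" rule.
library BlobHashChecksSketch {
    uint256 internal constant MAX_NUMBER_OF_BLOBS = 6;

    /// @param _blobLinearHashes the 6 hashes extracted from the BLOB_*_HASH_KEY system logs
    /// @param _numBlobsWithData how many blobs actually carry pubdata for this batch
    function checkBlobHashes(
        bytes32[MAX_NUMBER_OF_BLOBS] memory _blobLinearHashes,
        uint256 _numBlobsWithData
    ) internal pure {
        for (uint256 i = 0; i < MAX_NUMBER_OF_BLOBS; ++i) {
            if (i < _numBlobsWithData) {
                require(_blobLinearHashes[i] != bytes32(0), "expected blob hash");
            } else {
                // Signals to the circuits that nothing was published in this blob.
                require(_blobLinearHashes[i] == bytes32(0), "unexpected blob hash");
            }
        }
    }
}
```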
Verifying the commitment via the point evaluation precompile goes as follows (note that we assume the header byte for pubdataSource has already been removed by this point):
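A minimal sketch of that call, assuming the standard EIP-4844 point evaluation precompile at address `0x0A` and its 192-byte input layout `versioned hash || opening point || claimed value || commitment || proof`; the real `Executor.sol` code is structured differently:

```solidity
pragma solidity ^0.8.20;

// Illustrative only: verify one blob's (opening point, claimed value) pair against
// its versioned hash using the EIP-4844 point evaluation precompile.
library PointEvaluationSketch {
    address internal constant POINT_EVALUATION_PRECOMPILE = address(uint160(0x0A));
    // Scalar field modulus of BLS12-381; the precompile returns it on success.
    uint256 internal constant BLS_MODULUS =
        52435875175126190479447740508185965837690552500527637822603658699938581184513;

    function verifyBlob(
        bytes32 _versionedHash,     // from the BLOBHASH opcode (blobhash(i) since Solidity 0.8.24)
        uint128 _openingPoint,      // 16-byte opening point taken from pubdataCommitments
        bytes32 _claimedValue,      // 32-byte claimed value taken from pubdataCommitments
        bytes calldata _commitment, // 48-byte KZG commitment
        bytes calldata _proof       // 48-byte KZG proof
    ) internal view {
        require(_commitment.length == 48 && _proof.length == 48, "bad lengths");

        // EIP-4844 input layout: versioned_hash || z || y || commitment || proof (192 bytes).
        bytes memory input = abi.encodePacked(
            _versionedHash,
            uint256(_openingPoint), // left-pad the 16-byte opening point to a field element
            _claimedValue,
            _commitment,
            _proof
        );

        (bool ok, bytes memory output) = POINT_EVALUATION_PRECOMPILE.staticcall(input);
        require(ok && output.length == 64, "point evaluation failed");

        // On success the precompile returns FIELD_ELEMENTS_PER_BLOB || BLS_MODULUS.
        (, uint256 modulus) = abi.decode(output, (uint256, uint256));
        require(modulus == BLS_MODULUS, "unexpected precompile output");
    }
}
```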
Where correctness is validated by checking that the latter 32 bytes of output from the point evaluation call are equal to `BLS_MODULUS`.
Batch Commitment and Proof of Equivalence
With the contents of the blob being verified, we need to add this information to the batch commitment so that it can further be part of the verification of the overall batch by our proof system. Our batch commitment is the hash of a few different values: passthrough data (holding our new state root and the next enumeration index to be used), meta parameters (flag for whether zk porter is available, bootloader bytecode hash, and default account bytecode hash), and auxiliary output. The auxiliary output changes with 4844, adding new blob-related fields and the corresponding encoding:

- 6 `bytes32` fields for linear hashes: these are the hashes of the blobs' preimages
- 6 `bytes32` fields for 4844 output commitment hashes: these are `(versioned hash || opening point || evaluation value)`; the format of the opening point here is expected to be the 16-byte value passed by calldata
- an additional 20 `bytes32(0)` encoded at the end, because with the inclusion of vm 1.5.0 our circuits support a total of 16 blobs, which will be used once the total number of blobs supported by Ethereum increases
where `blobAuxOutputWord` contains `_blobCommitments` and `_blobHashes` of all the blobs, and 20 * `bytes32(0)`, since the circuits require 16 blobs.
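As a sketch, the 32-word block could be assembled as follows; the exact ordering inside the auxiliary output is defined by `Executor.sol` and the circuits, so this only illustrates the 6 + 6 + 20 layout:

```solidity
pragma solidity ^0.8.20;

// Illustrative only: pack 6 linear hashes, 6 output commitment hashes and 20 zero
// words into the 32-word blob section of the batch auxiliary output.
library BlobAuxOutputSketch {
    uint256 internal constant MAX_NUMBER_OF_BLOBS = 6;
    uint256 internal constant TOTAL_BLOBS_IN_COMMITMENT = 16; // what the circuits support

    function blobAuxOutputWords(
        bytes32[MAX_NUMBER_OF_BLOBS] memory _blobHashes,      // linear hashes of the blob preimages
        bytes32[MAX_NUMBER_OF_BLOBS] memory _blobCommitments  // hashes over (versioned hash || opening point || evaluation value)
    ) internal pure returns (bytes32[] memory words) {
        words = new bytes32[](2 * TOTAL_BLOBS_IN_COMMITMENT); // 32 words in total
        for (uint256 i = 0; i < MAX_NUMBER_OF_BLOBS; ++i) {
            words[i] = _blobHashes[i];
            words[MAX_NUMBER_OF_BLOBS + i] = _blobCommitments[i];
        }
        // words[12..31] stay bytes32(0): 20 padding words for the 10 blobs the
        // circuits can handle but Ethereum does not yet provide.
    }
}
```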
There are different scenarios that change the values posted here:
- We submit pubdata via calldata
- We utilize 1 or more blobs
When we use calldata, the values for the linear hashes and output commitments should all be `bytes32(0)`. If we are using blobs but only have a single blob, `_blobHashes[0]` and `_blobCommitments[0]` should correspond to that blob, while the rest will be `bytes32(0)`. If we use more blobs, the linear hashes and output commitments should be present for the corresponding blobs.
Our circuits will then handle the proof of equivalence, following a method similar to the moderate approach, verifying that the total pubdata can be repackaged as the blobs we submitted and that the commitments in fact evaluate to the given value at the computed opening point.
Pubdata Contents and Blobs
Given how data representation changes on the consensus layer (where blobs live) versus the execution layer (where calldata is found), there is some preprocessing that takes place to make it compatible. When calldata is used for pubdata, we keep it as is and no additional processing is required to transform it. Recalling the section above, when pubdata is sent via calldata it has the format `source byte (1 byte) || pubdata || blob commitment (32 bytes)`, so we must first trim it of the source byte and blob commitment before decoding it. A more detailed guide on the format can be found in our documentation. Using blobs requires a few more steps:
Now we can apply the encoding formula, with some of the data from the blob commit transaction, to move from encoded blobs back into decodable ZKsync pubdata:
The last step depends on the strategy taken; the two approaches are:

- Remove all trailing zeroes after concatenation
- Parse the data and ignore the extra zeroes at the end

The second option is a bit messier, so we go with the first: we decode the pubdata and, when we get to the last state diff, if the number of bytes is less than specified, we know that the remaining data are zeroes. The needed functions can be found within the zkevm_circuits code.
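A sketch of the first strategy as a standalone helper (in practice this step runs off-chain during witness generation; the helper name is illustrative):

```solidity
pragma solidity ^0.8.20;

// Illustrative only: option 1 from above, drop all trailing zero bytes after
// concatenating the recovered blob contents back into one byte array.
library PubdataTrimSketch {
    function stripTrailingZeroes(bytes memory _data) internal pure returns (bytes memory out) {
        uint256 end = _data.length;
        while (end > 0 && _data[end - 1] == 0) {
            end--;
        }
        out = new bytes(end);
        for (uint256 i = 0; i < end; ++i) {
            out[i] = _data[i];
        }
    }
}
```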