EVM Legacy Assembly Translator


There are two Solidity IRs used in our pipeline: Yul and EVM legacy assembly. The latter is used for older versions of Solidity, more precisely <=0.7.

EVM legacy assembly is very challenging to translate to LLVM IR because it obscures the program's control flow and relies heavily on dynamic jumps. Static analysis of the EVM assembly turns most jumps into static ones, but some jumps cannot be resolved statically. For example, internal function pointers can be written to memory or storage, and later loaded and called. Recursion is another case we have skipped for now: every recursive call allocates another stack frame, which prevents the static analyzer from resolving the jumps.
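As a rough illustration of the difference, the Python sketch below walks a straight-line instruction sequence and reports whether the final jump target is a known tag. The `PUSH_TAG` instruction name, the `UNKNOWN` marker, and the `resolve_jump` helper are simplifications invented for this example; they are not the translator's actual data model.

```python
# Hypothetical mini-model: a stack slot holds either a known tag (int) or UNKNOWN.
UNKNOWN = None


def resolve_jump(instructions):
    """Return the tag the final JUMP goes to, or UNKNOWN if it is dynamic."""
    stack = []
    for name, value in instructions:
        if name == "PUSH_TAG":
            stack.append(value)        # jump destination known at compile time
        elif name == "PUSH":
            stack.append(UNKNOWN)      # plain constants are not tracked as tags here
        elif name in ("MLOAD", "SLOAD"):
            stack.pop()                # consume the address or storage slot
            stack.append(UNKNOWN)      # the loaded value is only known at runtime
        elif name == "JUMP":
            return stack.pop()
        else:
            raise NotImplementedError(name)
    raise ValueError("sequence does not end with JUMP")


# A jump to a pushed tag is static.
static_target = resolve_jump([("PUSH_TAG", 5), ("JUMP", None)])

# A function pointer loaded from storage makes the jump dynamic.
dynamic_target = resolve_jump([("PUSH", 0), ("SLOAD", None), ("JUMP", None)])
```

In this model the second sequence yields `UNKNOWN`, which corresponds to the function-pointer case described above.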

Both issues are being worked on in our fork of the Solidity compiler, where we are changing the codegen to remove the dynamic jumps and add the necessary metadata.

Below you can see a minimal example of a Solidity contract and its EVM legacy assembly translated to LLVM IR, which is eventually compiled to EraVM assembly.

Source Code

contract Example {
  function main() public pure returns (uint256 result) {
    result = 42;
  }
}

EVM Legacy Assembly

Produced by the upstream Solidity compiler v0.7.6.

| Line | Instruction  | Value/Tag |
| ---- | ------------ | --------- |
| 000  | PUSH         | 80        |
| 001  | PUSH         | 40        |
| 002  | MSTORE       |           |
| 003  | CALLVALUE    |           |
| 004  | DUP1         |           |
| 005  | ISZERO       |           |
| 006  | PUSH         | [tag] 1   |
| 007  | JUMPI        |           |
| 008  | PUSH         | 0         |
| 009  | DUP1         |           |
| 010  | REVERT       |           |
| 011  | Tag 1        |           |
| 012  | JUMPDEST     |           |
| 013  | POP          |           |
| 014  | PUSH         | 4         |
| 015  | CALLDATASIZE |           |
| 016  | LT           |           |
| 017  | PUSH         | [tag] 2   |
| 018  | JUMPI        |           |
| 019  | PUSH         | 0         |
| 020  | CALLDATALOAD |           |
| 021  | PUSH         | E0        |
| 022  | SHR          |           |
| 023  | DUP1         |           |
| 024  | PUSH         | 5A8AC02D  |
| 025  | EQ           |           |
| 026  | PUSH         | [tag] 3   |
| 027  | JUMPI        |           |
| 028  | Tag 2        |           |
| 029  | JUMPDEST     |           |
| 030  | PUSH         | 0         |
| 031  | DUP1         |           |
| 032  | REVERT       |           |
| 033  | Tag 3        |           |
| 034  | JUMPDEST     |           |
| 035  | PUSH         | [tag] 4   |
| 036  | PUSH         | [tag] 5   |
| 037  | JUMP         | [in]      |
| 038  | Tag 4        |           |
| 039  | JUMPDEST     |           |
| 040  | PUSH         | 40        |
| 041  | DUP1         |           |
| 042  | MLOAD        |           |
| 043  | SWAP2        |           |
| 044  | DUP3         |           |
| 045  | MSTORE       |           |
| 046  | MLOAD        |           |
| 047  | SWAP1        |           |
| 048  | DUP2         |           |
| 049  | SWAP1        |           |
| 050  | SUB          |           |
| 051  | PUSH         | 20        |
| 052  | ADD          |           |
| 053  | SWAP1        |           |
| 054  | RETURN       |           |
| 055  | Tag 5        |           |
| 056  | JUMPDEST     |           |
| 057  | PUSH         | 2A        |
| 058  | SWAP1        |           |
| 059  | JUMP         | [out]     |
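To make the listing easier to follow, here is a hedged Python reading of the entry checks and the selector dispatch in lines 003 to 027. The function names `selector` and `dispatch` and the string results are made up for this sketch; only the constants `E0` and `5A8AC02D` and the order of the checks come from the listing.

```python
def selector(calldata: bytes) -> int:
    """Model lines 019-022: load the first 32-byte word and shift right by 0xE0 bits."""
    word = calldata[:32].ljust(32, b"\x00")   # CALLDATALOAD zero-pads past the end
    return int.from_bytes(word, "big") >> 0xE0


def dispatch(calldata: bytes, callvalue: int) -> str:
    """Return which part of the listing execution would reach."""
    if callvalue != 0:
        return "revert"                       # lines 003-010: non-payable check
    if len(calldata) < 4:
        return "revert"                       # lines 014-018: jump to tag 2
    if selector(calldata) == 0x5A8AC02D:
        return "tag 3"                        # lines 023-027: selector matched
    return "revert"                           # fall through to tag 2


result = dispatch(bytes.fromhex("5a8ac02d"), 0)
```

Shifting a 256-bit word right by `0xE0` (224) bits leaves its top four bytes, which is why the comparison is made against a 4-byte constant.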

EthIR

EthIR (Ethereal IR) is a special IR used by our translator to represent EVM legacy assembly and prepare it for translation to LLVM IR. The IR serves several purposes:

  1. Tracking the stack state to extract jump destinations.

  2. Duplicating blocks that are reachable with different stack states.

  3. Restoring the complete control-flow graph of the contract using the data above.

  4. Resolving dependencies and static data chunks.
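The first three purposes can be sketched as a worklist over (block, stack state) pairs. The Python below is a minimal model under stated assumptions, not the translator's implementation: the `PUSH_TAG` instruction name, the tuple-based stack elements, and blocks 0, 1 and 2 are hypothetical, while the body of block 5 mirrors tag 5 from the listing above.

```python
from collections import deque


def run_block(instructions, stack):
    """Symbolically execute one block; return (successor tags, resulting stack)."""
    stack = list(stack)
    for name, arg in instructions:
        if name == "PUSH_TAG":
            stack.append(("tag", arg))
        elif name == "PUSH":
            stack.append(("const", arg))
        elif name == "SWAP1":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif name == "POP":
            stack.pop()
        elif name == "JUMP":
            kind, target = stack.pop()
            if kind != "tag":
                raise ValueError("dynamic jump: destination is not a known tag")
            return [target], tuple(stack)
        elif name == "STOP":
            return [], tuple(stack)
        else:
            raise NotImplementedError(name)
    return [], tuple(stack)


def recover_cfg(blocks, entry):
    """Visit every (tag, stack state) pair reachable from the entry block."""
    seen, edges = set(), []
    queue = deque([(entry, ())])
    while queue:
        key = queue.popleft()
        if key in seen:
            continue
        seen.add(key)
        tag, stack = key
        successors, out_stack = run_block(blocks[tag], stack)
        for succ in successors:
            edges.append((key, (succ, out_stack)))
            queue.append((succ, out_stack))
    return seen, edges


blocks = {
    0: [("PUSH_TAG", 1), ("PUSH_TAG", 5), ("JUMP", None)],                 # call, return to 1
    1: [("POP", None), ("PUSH_TAG", 2), ("PUSH_TAG", 5), ("JUMP", None)],  # call, return to 2
    2: [("POP", None), ("STOP", None)],
    5: [("PUSH", "2A"), ("SWAP1", None), ("JUMP", None)],                  # tag 5 in the listing
}
nodes, edges = recover_cfg(blocks, entry=0)
copies_of_5 = sorted(stack for tag, stack in nodes if tag == 5)
```

Block 5 is reached with two different return tags on the stack, so it appears twice in `nodes`. Each copy has exactly one statically known successor, which is how duplication turns the `JUMP [out]` into a static jump in this model.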

Data format:

  1. V_<name> - value returned by an instruction <name>.

  2. T_<tag> - tag of a block <tag>.

  3. 40 - hexadecimal constant.

  4. tests/solidity/simple/default.sol:Test - contract definition.

Stack format: [ V_CALLVALUE ] (current values) - [ V_CALLVALUE ] (popped values) + [ V_ISZERO ] (pushed values)
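A small Python sketch of how such a line could be rendered from the three groups, without the parenthesized annotations. The `format_stack` helper is hypothetical, and the ` | ` separator for multiple elements is an assumption of this sketch, since the format line above only shows single-element groups.

```python
def format_group(values):
    """Render one bracketed group; the ' | ' separator is an assumption."""
    return "[ " + " | ".join(values) + " ]"


def format_stack(current, popped, pushed):
    """Render the stack annotation: current values - popped values + pushed values."""
    return f"{format_group(current)} - {format_group(popped)} + {format_group(pushed)}"


# The ISZERO step from the format line above.
line = format_stack(["V_CALLVALUE"], ["V_CALLVALUE"], ["V_ISZERO"])
```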

Unoptimized LLVM IR

In LLVM IR, the necessary stack space is allocated at the beginning of the function.

Every stack operation interacts with a statically known stack pointer with an offset from EthIR.
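The idea of statically known offsets can be modelled as follows. This Python sketch tracks the stack height at translation time and assigns each instruction the slots it reads and writes; the maximum height is the space that would be allocated up front. The `EFFECTS` table, the `assign_slots` helper, and the slot numbering are assumptions made for illustration, not the translator's actual scheme.

```python
# (values popped, values pushed) for the few instructions used in this sketch.
EFFECTS = {"PUSH": (0, 1), "CALLVALUE": (0, 1), "ISZERO": (1, 1), "MSTORE": (2, 0)}


def assign_slots(instructions):
    """Return ([(name, slots read, slots written)], stack slots to allocate)."""
    height = 0
    max_height = 0
    plan = []
    for name in instructions:
        if name == "DUP1":
            # DUP1 reads the top slot without popping it and writes a new top.
            reads, writes = [height - 1], [height]
            height += 1
        else:
            pops, pushes = EFFECTS[name]
            base = height - pops
            reads = list(range(base, height))
            writes = list(range(base, base + pushes))
            height = base + pushes
        max_height = max(max_height, height)
        plan.append((name, reads, writes))
    return plan, max_height


# Lines 000-005 of the EVM legacy assembly listing.
plan, slots = assign_slots(["PUSH", "PUSH", "MSTORE", "CALLVALUE", "DUP1", "ISZERO"])
```

Because every slot index is computed during translation, no runtime stack pointer is needed in this model, and each read or write maps to a fixed offset.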

Optimized LLVM IR

LLVM optimizes away the redundant stack operations, resulting in the optimized LLVM IR below.

EraVM Assembly

The optimized LLVM IR is translated into the EraVM assembly below, with a code size comparable to that of the Yul pipeline.

For comparison, the Yul pipeline of solc v0.8.21 produces the following EraVM assembly:
