EVM Deep Dive Part 2

Lead

In Part 1, we explored how the EVM determines which bytecode to run for the contract function being called, and along the way learned about the call stack, calldata, function signatures, and EVM opcodes.

In Part 2, we’ll start a journey through memory to get a complete picture of contract memory and how it works in the EVM.

This series is a translation of noxx’s articles (https://noxx.substack.com/) and dives into the fundamentals of the EVM.

Memory journey

We’re still using the sample code from Part 1, shown in Remix.

aTGSKTARTXQaAOZzBKJ8h3MYqnrXFQC4kYpI4SCM.png

In Part 1 we studied the part of the compiled contract’s bytecode that handles function selection. In this article, we’ll focus on the first 5 bytes of the bytecode.

exGTh0gS0p2UbCUHYlpYwlt1fcQsD4rRj1ujUKrJ.png

These 5 bytes initialize the “free memory pointer”. To fully understand what these bytes do, you first need to understand the data structure that governs contract memory.

1. Memory data structure

Contract memory is a simple byte array. Data can be written in chunks of 32 bytes (256 bits) or 1 byte (8 bits), but it can only be read in fixed-size chunks of 32 bytes (256 bits). The following image illustrates this structure and the read/write capabilities of contract memory.

p6zYRR5OOHwNm6nlUROoIeLryqLU3gqHuYJIqGD4.png

This behavior is determined by the 3 opcodes that operate on memory.

  • MSTORE (x, y): Stores the 32-byte (256-bit) value “y” starting at memory location “x”.
  • MLOAD (x): Loads 32 bytes (256 bits) onto the stack, starting at memory location “x”.
  • MSTORE8 (x, y): Stores the 1-byte (8-bit) value “y” (the least significant byte of the 32-byte stack value) at memory location “x”.

You can think of the memory location simply as the array index at which writing or reading starts. To write or read more than 1 byte of data, you simply continue from the next array index.
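As a rough illustration of the three opcodes, the following Python sketch models memory as a zero-filled byte array. The `Memory` class and its method names are invented for this example and are not part of any EVM library, and gas accounting is left out entirely.

```python
class Memory:
    """Toy model of EVM memory: a zero-initialised, byte-addressable array."""

    def __init__(self):
        self.data = bytearray()

    def _expand(self, end):
        # Grow to the next multiple of 32 bytes; new bytes are zero.
        if end > len(self.data):
            new_size = (end + 31) // 32 * 32
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, offset, value):
        # MSTORE: write a 32-byte big-endian word starting at `offset`.
        self._expand(offset + 32)
        self.data[offset:offset + 32] = value.to_bytes(32, "big")

    def mstore8(self, offset, value):
        # MSTORE8: write only the least significant byte of `value`.
        self._expand(offset + 1)
        self.data[offset] = value & 0xFF

    def mload(self, offset):
        # MLOAD: read 32 bytes starting at `offset` as one big-endian integer.
        self._expand(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")


mem = Memory()
mem.mstore(0, 0x11)       # the value ends up in byte 31, the last byte of the word
mem.mstore8(32, 0x22)     # a single byte written at location 32
print(mem.data.hex())
print(hex(mem.mload(0)))
```

Note how `mstore` right-aligns the value inside its 32-byte word, while `mstore8` touches exactly one byte.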

2. EVM Playground

EVM Playground helps solidify our understanding of how these 3 opcodes work and how memory locations behave. Click Run, then use the arrows in the upper-right corner to step through and see how the stack and memory change. (Comments above the opcodes describe what each section does.)

Ou0h5xV2Ot89GdQPyuoOId6BkvrNSl0yUjIUp5SY.png

k78bIeFRt1H2yx9DSqmxkK835RLFO17p5jD0FizH.png

You may notice something strange: we only added 1 byte, so why are there so many zeros?

3. Memory expansion

When a contract writes to memory, it must pay gas for the number of bytes written. If it writes to a region of memory that has not been written to before, an additional memory expansion cost is incurred the first time that region is used.

When writing to previously untouched memory, memory is expanded in 32-byte (256-bit) increments. The memory expansion cost scales linearly for the first 724 bytes and quadratically after that. This follows from Equation 326 of the Ethereum Yellow Paper, which gives the gas cost of memory expansion:

xcTn4bUZUtINPfUg3KQch22ivdxwRE6OPmPvVQUe.png

Here G_memory is the gas paid for each additional word when expanding memory, and a is the highest memory location written in the contract call, measured in 32-byte words. Using 1024 bytes of memory as an example, a = 32.
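To make the cost curve concrete, here is a small Python sketch of the Yellow Paper memory cost function, C_mem(a) = G_memory * a + floor(a^2 / 512), with G_memory = 3. The function names are made up for this example, and the sketch only covers the memory component of gas, not the cost of the opcodes themselves.

```python
G_MEMORY = 3  # gas per 32-byte word of memory (Yellow Paper fee schedule)


def words(num_bytes):
    """Number of 32-byte words needed to cover `num_bytes` bytes."""
    return (num_bytes + 31) // 32


def memory_cost(a):
    """Total memory cost for `a` words: a linear term plus a quadratic term."""
    return G_MEMORY * a + a * a // 512


def expansion_cost(old_bytes, new_bytes):
    """Extra gas charged when memory grows from old_bytes to new_bytes."""
    return max(0, memory_cost(words(new_bytes)) - memory_cost(words(old_bytes)))


print(memory_cost(words(1024)))   # a = 32 words, as in the example above
print(expansion_cost(32, 64))     # growing memory by one word
```

For 1024 bytes this gives 3 * 32 + 32 * 32 // 512 = 98 gas. The quadratic term a^2 / 512 first reaches 1 at roughly 22.6 words, which is about 724 bytes, so below that size the cost is effectively linear.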

Before writing 1 byte at position 32, our memory is 32 bytes. At this point we start writing to untouched memory, and as a result memory grows by 32 bytes to 64 bytes. All memory locations are initially defined as 0, which is why we see the byte 22 followed by 31 zero bytes, 2200000000000000000000000000000000000000000000000000000000000000, appended to our memory.

4. Memory is a byte array

The second thing we might notice while debugging happens when we run MLOAD from memory location 33 (0x21). The following value is returned to the stack.

3300000000000000000000000000000000000000000000000000000000000000

Memory reads can start from a location that is not a multiple of 32.

Memory is a byte array, which means that reads (and writes) can start from any memory location. We are not limited to multiples of 32. Memory is linear and can be addressed at the byte level. In Solidity, memory variables can only be created inside functions. They are either newly instantiated complex types such as arrays and structs (for example, created with new int[...]) or copies of a variable referenced from storage.
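The unaligned read can be reproduced with a few lines of Python. This is only a sketch: the memory contents are assumed from the values shown above (0x22 at location 32 and 0x33 at location 33), and the `mload` helper is a simplification that pads with zeros rather than expanding memory as the real opcode would.

```python
memory = bytearray(64)   # two 32-byte words, all zero
memory[32] = 0x22        # assumed: the byte written at location 32
memory[33] = 0x33        # assumed: the byte written at location 33


def mload(mem, offset):
    """Return the 32 bytes starting at `offset`, zero-padded past the end."""
    word = bytes(mem[offset:offset + 32])
    return word.ljust(32, b"\x00")


print(mload(memory, 33).hex())   # starts mid-word, at the 0x33 byte
print(mload(memory, 32).hex())   # aligned read of the second word
```

Reading from location 33 yields a word that begins with 33, while the aligned read from location 32 begins with 2233.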

Now that we have some understanding of the data structure, let’s look at the free memory pointer.

5. Free memory pointer

The free memory pointer is simply a pointer to the start of free memory. It lets the smart contract keep track of which memory locations have been written to and which have not. This prevents the contract from overwriting memory that has already been allocated to another variable. When a variable is written to memory, the contract first reads the free memory pointer to determine where the data should be stored. It then updates the free memory pointer to account for the amount of data being written. A simple addition of these two values gives the new start of free memory.

Location of the free memory pointer + byte size of the data = position of the new free memory pointer
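The update rule is essentially a bump allocator. The Python sketch below shows the idea with a hypothetical `Allocator` class; the starting location and the sizes are arbitrary values chosen for illustration, not taken from the contract.

```python
class Allocator:
    """Minimal bump allocator mirroring the free memory pointer rule."""

    def __init__(self, start):
        self.free_pointer = start

    def allocate(self, size):
        location = self.free_pointer          # where the new data will live
        self.free_pointer = location + size   # old pointer + byte size of the data
        return location


alloc = Allocator(start=128)   # hypothetical starting location
x = alloc.allocate(96)         # hypothetical 96-byte variable
y = alloc.allocate(32)         # hypothetical 32-byte variable
print(x, y, alloc.free_pointer)
```

Because every allocation starts where the previous one ended, two variables can never overlap as long as all writes go through the pointer.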

6. Bytecode

As we mentioned earlier, the free memory pointer is initialized at run time by these 5 bytes of bytecode.

YbAtHPGPnwhcET5JzcfsipNNmjop35nFJZXSi8C9.png

These opcodes declare that the free memory pointer is located at byte 0x40 in memory (64 in decimal) and has the value 0x80 (128 in decimal).

Solidity’s memory layout reserves four 32-byte slots:

  • 0x00 – 0x3f (64 bytes): Scratch space, used for hashing methods and usable between statements (i.e. within inline assembly).
  • 0x40 – 0x5f (32 bytes): Free memory pointer, i.e. the currently allocated memory size and the start of free memory; initialized to 0x80.
  • 0x60 – 0x7f (32 bytes): Zero slot, used as the initial value for dynamic memory arrays; it should never be written to.

As we can see, 0x40 is the predefined location of the free memory pointer. The value 0x80 is simply the first byte of memory that can be written to after the four reserved 32-byte slots.
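To see how 5 bytes turn into these instructions, the Python sketch below disassembles the byte string 6080604052. That byte string is an assumption about what the screenshot shows (it is the usual start of Solidity-compiled bytecode), and the tiny opcode table covers only the two opcodes needed here.

```python
# Only the opcodes needed for this example: name and number of immediate bytes.
OPCODES = {
    0x60: ("PUSH1", 1),
    0x52: ("MSTORE", 0),
}


def disassemble(code):
    """Turn raw bytecode into a list of readable instructions."""
    out = []
    pc = 0
    while pc < len(code):
        name, n_immediate = OPCODES[code[pc]]
        operand = code[pc + 1:pc + 1 + n_immediate]
        out.append(f"{name} 0x{operand.hex()}" if operand else name)
        pc += 1 + n_immediate
    return out


print(disassemble(bytes.fromhex("6080604052")))
```

The 5 bytes decode to three instructions: two PUSH1 instructions (2 bytes each) and one MSTORE (1 byte).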

7. Memory in the contract

To build on what we’ve learned so far, we’ll look at how memory and free memory pointers are updated in Solidity code.

We created the MemoryLane contract to demonstrate. Its memoryLane() function declares two memory arrays, “a” of type bytes32[5] and “b” of type bytes32[2], and assigns a uint256 value of 1 to b[0].

tizPznu6hs42ULiNOSs4JKE2ujB5vlvXx9QBZ82Y.png

To see in detail how the contract code executes in the EVM, you can copy it into the Remix IDE, then compile and deploy the contract. Call memoryLane() and enter debug mode to step through the opcodes (for instructions, see https://remix-ide.readthedocs.io/en/latest/tutorial_debug.html).

A simplified version of the opcodes has been extracted into EVM Playground. This link lets you view the specific opcodes and comments (https://noxx.substack.com/p/evm-deep-dives-the-path-to-shadowy-d6b#:~:text=version%20into%20an-,EVM%20Playground,-and%20will%20run).

Here the opcodes are divided into 6 parts and explained in turn. The JUMP opcodes and those unrelated to memory operations have been removed, and comments have been added to make it easier to see what is happening at each step.

1) Free memory pointer initialization (EVM Playground opcode lines 1-15)

jV99nWYt2ac7S8uCsTk9Oc1gTylkw49PV9Ob4pjh.png

First, 0x80 (128 in decimal) is pushed onto the stack; this is the value specified by the Solidity memory layout. At this point there is nothing in memory yet.

f7gUqgSOE0lZ01lWA5as5woRGuh4WXBFjtr3lFPM.png

Next, 0x40, the location of the free memory pointer, is pushed onto the stack. Finally, we call MSTORE, which pops the first item off the stack, 0x40, to determine where to write in memory, and takes the second item, 0x80, as the value to write. This leaves an empty stack, but memory is now partially filled.

Memory is displayed as hexadecimal characters, each of which represents 4 bits. For example, if there are 192 hexadecimal characters in memory, we have 96 bytes (1 byte = 8 bits = 2 hexadecimal characters). If we look back at Solidity’s memory layout, we see that the first 64 bytes are allocated as scratch space and the next 32 bytes are used for the free memory pointer.
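The same three steps can be traced with a toy stack machine in Python. The `push` and `mstore` helpers are hypothetical stand-ins for the real opcodes and ignore gas; the sketch only shows how the stack and memory change.

```python
stack = []
memory = bytearray()


def push(value):
    stack.append(value)


def mstore():
    offset = stack.pop()   # top of the stack: where to write
    value = stack.pop()    # next item: what to write
    end = offset + 32
    if end > len(memory):
        # Expand memory with zero bytes up to the next 32-byte boundary.
        memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))
    memory[offset:end] = value.to_bytes(32, "big")


push(0x80)   # value of the free memory pointer
push(0x40)   # location of the free memory pointer
mstore()

dump = memory.hex()
print(len(stack), len(memory), len(dump))
for offset in range(0, len(dump), 64):
    print(hex(offset // 2), dump[offset:offset + 64])
```

After the MSTORE the stack is empty and memory holds 96 bytes (192 hexadecimal characters): two zeroed scratch-space words followed by the word that ends in 80.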

hCIS7gCWglEyHaaHXq0vAhvwhrQsU5lCh1CYtnid.png

2) Memory allocation for variable “a” and free memory pointer update (EVM Playground lines 16-34)

5ZTnofEnhjIMMzyIrMy4LNQRob9s2lxbRYeK745L.png

For the remaining sections, we’ll jump to the end state of each one and give a concise overview of what happened.

First, the next free memory is allocated for the variable “a” (bytes32[5]) and the free memory pointer is updated. The compiler determines how much space is required from the array length and the default array element size. Elements in Solidity memory arrays always occupy multiples of 32 bytes (this is even true for bytes1[], but not for bytes and string). The memory to be allocated is therefore 5 * 32 bytes = 160 bytes, or 0xa0 in hexadecimal. We can see this value pushed onto the stack and added to the current free memory pointer 0x80 (128 in decimal) to get the new free memory pointer value. The result is 0x120 (288 in decimal, 128 + 160), and we can see that it has been written to the free memory pointer location. The stack keeps the memory location of the variable “a” (0x80) so that it can be referenced later when needed. 0xffff stands for a JUMP destination and can be ignored because it is unrelated to memory operations.

ZBugiyOOfd1EjsuwP59sC8BvIv5yUa4qFwsyY8zi.png

3) Memory initialization for variable “a” (EVM Playground lines 35-95)

RzUmZuu8cmBuWJ5JjNocPc7sJv3JDklQQCb6lYX1.png

3lMRoS3CN7XKfBiGIcs3WsWeY9oHW7JLt5zSb6rm.png

Now that the memory has been allocated and the free memory pointer updated, the memory space for the variable “a” needs to be initialized. Since the variable is only declared and not assigned, it is initialized to zero.

The EVM does this with the CALLDATACOPY opcode, which takes 3 arguments:

  • memoryOffset/destOffset (the memory location the data is copied to)
  • calldataOffset/offset (the byte offset in calldata to copy from)
  • size/length (the number of bytes to copy)

Expressed as: memory[destOffset:destOffset+length] = msg.data[offset:offset+length]

In this example, memoryOffset (destOffset) is the memory location of the variable “a” (0x80). calldataOffset (offset) is the size of the actual calldata, so the copy starts past the end of the calldata; since there is no calldata there to copy, the memory is initialized with zeros. Finally, the size passed in is 0xa0 (160 in decimal).
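The zero-filling trick can be sketched in Python as follows. The `calldatacopy` function is a simplified model written for this example, and the 4-byte calldata value is a made-up placeholder; the key assumption it illustrates is that bytes read past the end of calldata come back as zeros.

```python
def calldatacopy(memory, dest_offset, offset, length, calldata):
    """memory[dest:dest+length] = calldata[offset:offset+length], zero-padded."""
    if length == 0:
        return
    chunk = calldata[offset:offset + length]   # empty when offset >= len(calldata)
    chunk = chunk.ljust(length, b"\x00")       # out-of-range bytes read as zero
    end = dest_offset + length
    if end > len(memory):
        # Expand memory with zero bytes up to the next 32-byte boundary.
        memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))
    memory[dest_offset:end] = chunk


calldata = bytes.fromhex("deadbeef")   # placeholder calldata, 4 bytes
memory = bytearray(96)                 # scratch space + free memory pointer word
memory[0x40:0x60] = (0x120).to_bytes(32, "big")

# destOffset = location of "a", offset = calldata size, length = 0xa0
calldatacopy(memory, 0x80, len(calldata), 0xA0, calldata)
print(len(memory))
print(memory[0x80:0x120].hex())
```

Starting the copy at an offset equal to the calldata size means nothing real is copied, so all 160 bytes reserved for “a” end up as zeros and memory grows to 288 bytes.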

We can see that our memory has expanded to 288 bytes (this includes the zero slot) and that the stack again holds the memory location of the variable and the JUMP address.

Qhmkpe7TP67ckCOB1Ygtih6rdtXlxXqxrs96gZH1.png

4) Memory allocation for variable “b” and free memory pointer update

E0iQT4QywxqGKS353xON4wLS2MBOpByVfVsf1oHT.png

This is the same as the memory allocation and free memory pointer update for the variable “a”, only this time for “bytes32[2] memory b”. The free memory pointer is updated to 0x160 (352 in decimal), which equals the previous free memory pointer, 288, plus the size of the new variable, 64 bytes. The free memory pointer has been updated to 0x160 in memory, and the stack now also holds the memory location of the variable “b” (0x120).
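The arithmetic for both allocations can be checked with a short Python sketch. It assumes the initial free memory pointer of 0x80 and 32 bytes per array element, as described in this article; the variable names are only for illustration.

```python
WORD = 32             # each memory array element occupies 32 bytes
free_pointer = 0x80   # initial value of the free memory pointer
layout = {}

for name, length in [("a", 5), ("b", 2)]:   # bytes32[5] a, bytes32[2] b
    size = length * WORD
    layout[name] = free_pointer             # the variable starts at the current pointer
    free_pointer += size                    # the pointer moves past the new variable
    print(name, hex(layout[name]), hex(size), hex(free_pointer))
```

This gives “a” at 0x80 with size 0xa0 and “b” at 0x120 with size 0x40, leaving the free memory pointer at 0x160.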

skuf15PdG1qWZrSPZdu4eXCEcnXTtz2o9FiZvYCR.png

5) Memory initialization for variable “b”

tqOa8ppyLafn44Efb0peEYoDLd1nSNeKMSyQVi4E.png

PwRmXeunTAZBcZeuzppvjORB9E7smQTWtlOg9lvq.png

This is the same as the memory initialization of the variable “a”. Memory has now grown to 352 bytes, and the memory locations of the 2 variables are still held on the stack.

H8FgEDCRGoeNpUPnGDgZa3H2FKMI0KuB0tYnFog9.png

6) b[0] assignment

gZ3FXwTI0i0RkgeZiNLisL0xV4QV62DgTrigfJyV.png

zLPrduHdGG4iSBspMm5BQs7D3gYgIIWIxte8efFQ.png

Finally, we assign a value to index 0 of the array “b”. The code states that b[0] should be 1, so the value 0x01 is pushed onto the stack. A left shift follows, but the shift amount is 0, which means our value does not change. Next, the array index to be written to, 0x00, is pushed onto the stack and checked to be less than the array length 0x02. If it is not, execution jumps to a different part of the bytecode that handles this error state. The MUL (multiply) and ADD (add) opcodes are then used to determine where in memory the value needs to be written so that it corresponds to the correct array index.

0x20 (32 in decimal) * 0x00 (0 in decimal) = 0x00

Keep in mind that memory array elements are 32 bytes each, so this value is the offset of the array index from the start of the array. Since we are writing to index 0, there is no offset, that is, we write from 0x00.

0x00 + 0x120 = 0x120 (288 in decimal)

ADD is used to add this offset to the memory location of the variable “b”. With an offset of 0, the data is written directly at the allocated memory location. Finally, MSTORE stores the value 0x01 at memory location 0x120.
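The bounds check followed by MUL and ADD can be summarized in Python. This is a sketch: `element_location` is a hypothetical helper, and raising `IndexError` stands in for the jump to the error-handling part of the bytecode.

```python
def element_location(base, length, index):
    """Memory location of array[index], mirroring the bounds check, MUL and ADD."""
    if not index < length:
        # Stand-in for the jump to the bytecode that handles the error state.
        raise IndexError("array index out of bounds")
    offset = 0x20 * index   # MUL: element size * index
    return base + offset    # ADD: array location + offset


memory = bytearray(0x160)                # memory after both arrays are allocated
loc = element_location(0x120, 2, 0)      # b lives at 0x120 and has 2 elements
memory[loc:loc + 32] = (1).to_bytes(32, "big")   # MSTORE of the value 0x01

print(hex(loc))
print(memory[0x120:0x140].hex())
```

For index 0 the offset is 0, so the write lands at 0x120; index 1 would land at 0x140, and index 2 would fail the bounds check.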

The following illustration shows the system state at the end of function execution. All stack items have been popped. Note that in Remix a few items actually remain on the stack, a JUMP location and the function signature, but they are unrelated to memory operations and are therefore omitted in the EVM Playground.

The memory has been updated to include the b[0] = 1 assignment: in the penultimate row of our memory, a 0 has become 1. You can verify that the value is in the correct memory location: b[0] occupies 0x120 – 0x13f (byte offsets 288 – 319, i.e. the 289th through 320th bytes of memory).

Image

We now have a solid understanding of how contract memory works, which will serve us well the next time we write code. When you step through contract opcodes and see certain memory locations keep popping up (such as 0x40), you now know exactly what they mean.

In the next article in this series, Part 3, we’ll dive into how contract storage works, learn about slot packing, and demystify storage slots.

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/evm-deep-dive-part-2/