The Ethereum virtual machine is a bit different from most other virtual machines out there. In my previous post I already explained how it is used and described some of its features.
The Ethereum Virtual Machine (EVM) is a simple yet powerful 256-bit Turing Complete virtual machine that allows anyone to run EVM bytecode.
The go-ethereum project contains two EVM implementations: a simple, direct bytecode virtual machine and a more sophisticated JIT-VM. In this post I'll explain some of the differences between the two implementations, describe some of the features of the JIT-EVM, and explain why it can be so much faster than the bytecode EVM.
Go-ethereum bytecode virtual machine
The internals of the EVM are quite simple: it has a single run loop that attempts to execute the instruction at the current Program Counter (PC for short). Inside this loop the gas is calculated for each instruction, memory is expanded if necessary, and the instruction is executed if this preamble succeeds. This continues until the VM either finishes successfully or returns with an error by throwing an exception (e.g. out of gas).
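```go
for {
	op := contract(pc)
	// Compute the gas for this instruction (including any memory expansion)
	// and abort if there isn't enough left.
	if !sufficientGas(op) {
		return nil, fmt.Errorf("insufficient gas for op: %v", op)
	}
	switch op {
	case ...:
		// execute the instruction
	case RETURN:
		// return memory[offset : offset+size], offset and size taken from the stack
		return memory(stack(-1), stack(-2)), nil
	}
	pc++ // advance to the next instruction
}
```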
At the end of each execution cycle the program counter is incremented so that the next instruction is executed, and the loop continues until the program finishes.
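To make the loop concrete, here is a minimal, runnable Go sketch of such an interpreter. It is a toy, not go-ethereum's code: it supports only four real opcodes (STOP, ADD, MUL, PUSH1), uses 64-bit rather than 256-bit words, has no memory, and charges a made-up flat gas cost per instruction. The names run, top and gasPerOp are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Opcode values match the real EVM; the rest of this VM is a toy.
const (
	STOP  byte = 0x00
	ADD   byte = 0x01
	MUL   byte = 0x02
	PUSH1 byte = 0x60
)

// gasPerOp is a hypothetical flat cost; real EVM gas costs differ per opcode.
const gasPerOp = 3

var errOutOfGas = errors.New("out of gas")

// run interprets code until STOP (or the end of the code) and returns the
// value on top of the stack. Words are uint64 here, not 256-bit.
func run(code []byte, gas uint64) (uint64, error) {
	var stack []uint64
	for pc := 0; pc < len(code); pc++ { // pc++ advances to the next instruction
		op := code[pc]
		// Preamble: charge gas before executing the instruction.
		if gas < gasPerOp {
			return 0, fmt.Errorf("insufficient gas for op 0x%02x: %w", op, errOutOfGas)
		}
		gas -= gasPerOp
		switch op {
		case STOP:
			return top(stack), nil
		case PUSH1:
			var v uint64 // a missing immediate byte reads as zero
			if pc+1 < len(code) {
				v = uint64(code[pc+1])
			}
			stack = append(stack, v)
			pc++ // skip the immediate byte
		case ADD, MUL:
			if len(stack) < 2 {
				return 0, errors.New("stack underflow")
			}
			a, b := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			if op == ADD {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("invalid opcode 0x%02x", op)
		}
	}
	return top(stack), nil
}

// top returns the topmost stack item, or 0 for an empty stack.
func top(stack []uint64) uint64 {
	if len(stack) == 0 {
		return 0
	}
	return stack[len(stack)-1]
}

func main() {
	// (2 + 3) * 4, followed by STOP
	code := []byte{PUSH1, 2, PUSH1, 3, ADD, PUSH1, 4, MUL, STOP}
	v, err := run(code, 100)
	fmt.Println(v, err)
	_, err = run(code, 10)
	fmt.Println(err)
}
```

With a budget of 100 gas the program yields 20; with only 10 gas it fails with an out-of-gas error on the fourth instruction.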
The EVM has another way of changing the program counter, through so-called jump instructions (JUMP & JUMPI). Instead of letting the program counter be incremented (pc++), the EVM can also jump to arbitrary positions in the contract code. The EVM knows two jump instructions: a normal jump that reads as "jump to position X" and a conditional jump that reads as "jump to position X if condition Y is true". When such a jump occurs, it must always land on a jump destination. If the program lands on any instruction other than a jump destination, the program fails; in other words, for a jump to be valid, its target must always be a jump-destination instruction (JUMPDEST), which for the conditional jump only matters when the condition is true.
Before running any Ethereum program, the EVM iterates over the code, finds all possible jump destinations, and puts them in a map keyed by position so they can be looked up by program counter. Every time the EVM encounters a jump instruction, the validity of the jump is checked against this map.
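The jump-destination analysis can be sketched in Go as follows. This is an illustration under simplifying assumptions, not the go-ethereum implementation: it uses the real EVM opcode values for JUMPDEST (0x5b) and PUSH1 to PUSH32 (0x60 to 0x7f), skips push data so that a 0x5b byte inside push data is not mistaken for a destination, and stores the result in a plain Go map. The function names jumpDests and validJump are hypothetical.

```go
package main

import "fmt"

// Real EVM opcode values used by this sketch.
const (
	JUMPDEST byte = 0x5b
	PUSH1    byte = 0x60
	PUSH32   byte = 0x7f
)

// jumpDests scans the code once, before execution, and records every position
// holding a JUMPDEST. PUSH immediate data is skipped, so a 0x5b byte that is
// only push data is not recorded as a destination.
func jumpDests(code []byte) map[uint64]bool {
	dests := make(map[uint64]bool)
	for pc := 0; pc < len(code); pc++ {
		op := code[pc]
		if op == JUMPDEST {
			dests[uint64(pc)] = true
		} else if op >= PUSH1 && op <= PUSH32 {
			pc += int(op-PUSH1) + 1 // skip the 1..32 bytes of push data
		}
	}
	return dests
}

// validJump is the check performed on every JUMP (and on JUMPI when its
// condition holds): the target must be a known JUMPDEST. Missing keys read as false.
func validJump(dests map[uint64]bool, target uint64) bool {
	return dests[target]
}

func main() {
	// PUSH1 0x5b (push data, not a destination); JUMPDEST; STOP
	code := []byte{PUSH1, 0x5b, JUMPDEST, 0x00}
	dests := jumpDests(code)
	fmt.Println(validJump(dests, 2)) // position 2 is a real JUMPDEST
	fmt.Println(validJump(dests, 1)) // position 1 is only push data
}
```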
As you can see, executing code is relatively easy: it is simply interpreted by the bytecode VM. We could even conclude that, in its sheer simplicity, it is actually quite dumb.
Welcome JIT virtual machine
The JIT-EVM takes a different approach to executing EVM bytecode and is, by definition, initially slower than the bytecode virtual machine. Before the virtual machine can execute any code, it must first compile the bytecode into components that the JIT virtual machine can understand.
The initialization and execution procedure is carried out in 3 steps:
- we check whether there is a JIT program ready to run, using the hash of the code, H(C), as the program's identifier;
- if a program was found, we execute it and return the result;
- if no program was found, we execute the bytecode and compile a JIT program in the background.
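A minimal Go sketch of these three steps follows. Everything in it is hypothetical: program, compile, runBytecode and runJIT are placeholders, and SHA-256 from Go's standard library stands in for H, since Ethereum itself hashes code with Keccak-256, which the standard library does not provide. The WaitGroup exists only so the example can wait for the background compilation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// program is a placeholder for a compiled JIT program.
type program struct{ size int }

// jitCache maps H(C) to its compiled program. SHA-256 is only a stand-in for
// Ethereum's Keccak-256 code hash.
type jitCache struct {
	mu       sync.Mutex
	programs map[[32]byte]*program
	pending  map[[32]byte]bool // compilations already in flight
	wg       sync.WaitGroup
}

func newJITCache() *jitCache {
	return &jitCache{programs: map[[32]byte]*program{}, pending: map[[32]byte]bool{}}
}

// Placeholders for the real compiler and the two execution engines.
func compile(code []byte) *program     { return &program{size: len(code)} }
func runBytecode(code []byte) string   { return "bytecode" }
func runJIT(p *program) string         { return "jit" }

// execute looks up H(C); if a compiled program exists it runs it, otherwise
// it interprets the bytecode and starts compiling in the background.
func (c *jitCache) execute(code []byte) string {
	h := sha256.Sum256(code)
	c.mu.Lock()
	p, ok := c.programs[h]
	if !ok && !c.pending[h] {
		c.pending[h] = true
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			compiled := compile(code)
			c.mu.Lock()
			c.programs[h] = compiled
			delete(c.pending, h)
			c.mu.Unlock()
		}()
	}
	c.mu.Unlock()
	if ok {
		return runJIT(p)
	}
	return runBytecode(code)
}

func main() {
	cache := newJITCache()
	code := []byte{0x60, 0x01, 0x00}
	fmt.Println(cache.execute(code)) // nothing compiled yet: interpret
	cache.wg.Wait()                  // let the background compilation finish
	fmt.Println(cache.execute(code)) // subsequent call uses the compiled program
}
```

A running call is never moved over to the JIT halfway through; only subsequent calls with the same code benefit from the compiled program.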
Initially, I tried checking whether the JIT program had finished compiling and, if so, moving execution over to the JIT; this all happened at runtime, in the same loop, using Go's atomic package. Unfortunately, it turned out to be slower than letting the bytecode VM run and using the JIT program for every subsequent call once compilation had finished.
By compiling the bytecode into logical pieces, the JIT has the ability to analyze the code more precisely and optimize it where and when needed.
For example, one amazingly simple optimization I did was to compile multiple push operations into a single instruction. Take the CALL instruction: it requires 7 push instructions before it executes, i.e. gas, address, value, input offset, input size, return offset and return size. Instead of looping through these 7 instructions and executing them one by one, I optimized this by taking all 7 instructions and appending their 7 values to a single slice. Now, whenever the VM reaches the start of these 7 push instructions, it instead executes the single optimized instruction, which immediately appends the static slice to the VM stack. Of course, this only works for static values (i.e. push 0x10), but these appear in code quite a lot.
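The push fusion can be sketched in Go as a compile-time pass over a list of instructions. The instr type, the opcode names "push" and "pushN", and the functions fusePushes and runProg are hypothetical; the pass merges each run of two or more consecutive static pushes into one instruction that appends all of their values to the stack at once.

```go
package main

import "fmt"

// instr is a hypothetical JIT instruction: a plain opcode, a single static
// push ("push"), or a fused push of several static values ("pushN").
type instr struct {
	op     string
	values []uint64 // immediate value(s) for push and pushN
}

// fusePushes merges every run of two or more consecutive static pushes into
// a single "pushN" instruction carrying all of their values in one slice.
func fusePushes(prog []instr) []instr {
	var out []instr
	for i := 0; i < len(prog); {
		j := i
		for j < len(prog) && prog[j].op == "push" {
			j++
		}
		if j-i >= 2 {
			var vals []uint64
			for _, in := range prog[i:j] {
				vals = append(vals, in.values...)
			}
			out = append(out, instr{op: "pushN", values: vals})
			i = j
			continue
		}
		out = append(out, prog[i]) // a lone push or any other opcode is kept
		i++
	}
	return out
}

// runProg counts one dispatch per instruction and applies pushes to the
// stack; other opcodes are not executed in this sketch.
func runProg(prog []instr) (stack []uint64, dispatched int) {
	for _, in := range prog {
		dispatched++
		if in.op == "push" || in.op == "pushN" {
			stack = append(stack, in.values...) // pushN appends all values at once
		}
	}
	return stack, dispatched
}

func main() {
	// Seven static pushes feeding CALL, pushed in reverse so that gas ends up
	// on top of the stack. The values are arbitrary examples.
	prog := []instr{
		{op: "push", values: []uint64{32}},     // return size
		{op: "push", values: []uint64{0}},      // return offset
		{op: "push", values: []uint64{4}},      // input size
		{op: "push", values: []uint64{0}},      // input offset
		{op: "push", values: []uint64{0}},      // value
		{op: "push", values: []uint64{0x1234}}, // address
		{op: "push", values: []uint64{50000}},  // gas
		{op: "call"},
	}
	before, n1 := runProg(prog)
	after, n2 := runProg(fusePushes(prog))
	fmt.Println(n1, n2)        // dispatches before and after fusion
	fmt.Println(before, after) // the resulting stacks are identical
}
```

For a CALL preceded by seven static pushes, the pass turns eight dispatched instructions into two while leaving the resulting stack unchanged.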
I have also optimized the static jump instructions. Static jumps are jumps that always jump to the same position (i.e. push 0x1, jump) and never change under any circumstances. By determining which jumps are static, we can check in advance whether a jump is valid and lies within the contract's bounds and, if so, create a new instruction that replaces both the push and the jump instruction and is flagged as valid. This saves the VM from having to execute two instructions, and also from having to check whether the jump is valid with an expensive hash-map lookup for a valid jump position.
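The static-jump optimization can be sketched the same way. This hypothetical Go pass (compileStaticJumps, with its own instr type) only recognizes the pattern PUSH1 x; JUMP, using the real opcode values for PUSH1 (0x60), JUMP (0x56) and JUMPDEST (0x5b); wider pushes and the conditional JUMPI are left out for brevity. A pair is fused only when x lies inside the code and is a JUMPDEST found by the pre-execution analysis; every other jump is kept as is and still validated at run time.

```go
package main

import "fmt"

// Real EVM opcode values used by this sketch.
const (
	JUMP     byte = 0x56
	JUMPDEST byte = 0x5b
	PUSH1    byte = 0x60
	PUSH32   byte = 0x7f
)

// instr is a hypothetical compiled instruction. Push data is not copied in
// this sketch; only jumps are of interest here.
type instr struct {
	pc     int
	op     byte
	static bool // fused PUSH1+JUMP whose target was validated at compile time
	target int
}

// jumpDests records every JUMPDEST position, skipping PUSH data.
func jumpDests(code []byte) map[int]bool {
	dests := make(map[int]bool)
	for pc := 0; pc < len(code); pc++ {
		if op := code[pc]; op == JUMPDEST {
			dests[pc] = true
		} else if op >= PUSH1 && op <= PUSH32 {
			pc += int(op-PUSH1) + 1
		}
	}
	return dests
}

// compileStaticJumps fuses "PUSH1 x; JUMP" into one instruction when x is an
// in-bounds JUMPDEST. At run time that instruction just sets pc = target:
// one dispatch instead of two, and no map lookup.
func compileStaticJumps(code []byte) []instr {
	dests := jumpDests(code)
	var out []instr
	for pc := 0; pc < len(code); pc++ {
		op := code[pc]
		if op == PUSH1 && pc+2 < len(code) && code[pc+2] == JUMP {
			target := int(code[pc+1])
			if target < len(code) && dests[target] {
				out = append(out, instr{pc: pc, op: JUMP, static: true, target: target})
				pc += 2 // consumed PUSH1, its data byte and JUMP
				continue
			}
		}
		out = append(out, instr{pc: pc, op: op})
		if op >= PUSH1 && op <= PUSH32 {
			pc += int(op-PUSH1) + 1 // skip push data
		}
	}
	return out
}

func main() {
	code := []byte{PUSH1, 4, JUMP, 0x00, JUMPDEST, 0x00}
	for _, in := range compileStaticJumps(code) {
		fmt.Printf("pc=%d op=0x%02x static=%v target=%d\n", in.pc, in.op, in.static, in.target)
	}
}
```

For the bytes PUSH1 4; JUMP; STOP; JUMPDEST; STOP the first two instructions collapse into one validated jump to position 4, whereas PUSH1 3; JUMP; STOP is left alone because position 3 is not a JUMPDEST.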
Next steps
Full stack and memory analysis would also fit very well into this model, where large chunks of code could be compiled into individual instructions. I would also like to add symbolic execution and turn the JIT into a proper JIT-VM. I think this would be the next logical step once programs are large enough to take advantage of these optimizations.
Conclusion
Our JIT-VM is a lot smarter than the bytecode VM, but it is far from completely finished (if it ever will be). There are many more nifty tricks we could add with this structure, but they simply aren't realistic at the moment. The runtime is well within the bounds of being "reasonably" fast. Should the need arise to optimize the VM further, we have the tools to do so.
Cross-posted from: https://medium.com/@jeff.ethereum/go-ethereums-jit-evm-27ef88277520#.1ed9lj7dz