Introduction to Virtual Machines

Nov 2, 2022

Introduction to virtual machines as a model for computation.

Virtual Machines - Wikipedia

In computing, a virtual machine (VM) is an emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer.

Virtual Machines - Wikipedia

system virtual machines - provide substitute for a real machine
process virtual machines - allow executing of computer programs in a platform independent environment

We will be looking only at the second type of virtual machines and VM will mean that for us.

Virtual Machines

Implement a computational model
Just like real machines AKA computers
run on top of the real machine
you can have a hardware implementation of VM

Computational model

Regular Languages
Context Free Languages
Everything that is computable

Computational model

Regular Expressions
Pushdown automata
Turing machines

Turing machine

has infinite tape with cells
each cells can hold one of finite number of values
head that can read, write and move to the left and right
finite set of states, one designated as starting state
finite program that given (state, value) return the new (state, action, value)

Turing machine operation

starts with the initial state
Read the value at the head position
look up (state, value) in the table - if missing HALT
Set state to new state, write new value at the head position and move the head left or right
Go to 2

From theory to practice

Infinite tapes are expensive
Not great user experience to interact via checking the cells on the tape

Still computers and VMs are an implementations of Turing machines.

The tape is not infinite :/

less powerful
we have to reuse the tape

What is inside a machine

Processor - implements the machine running algorithm
- reads, writes to the tape, moves the head and changes the current state
Memory - the tape, consists of cells of variable sizes filled with 0 and 1
- almost always random accessible
Program - the state transition function/table
- resides in the memory - von Neumann architecture
Program Counter (PC) - the current state

All the fancy output like sound and graphics is writing at dedicated addresses in memory.

So to writing something on the screen, used to be write the bytes in 0xF000-0xFFFF and the hardware will just blit them on the screen. Playing a sound - set some address to the frequency of the tone.

How the machine works?

Reads the instruction at PC
Decodes the instruction at PC
Modifies the memory according to the instruction
Moves PC to the next instruction
Go to 1

Instructions

CISC - Complex Instruction Set Computer
- one instruction can do a lot of operations
RISC - Reduced Instruction Set Computer
Register Memory - most operations can use both - registers and memory addresses
Register Register - most operations take place only between registers and there are dedicated instructions that allow loading and storing data in memory

Instructions

x86 - CISC, Register Memory
- the CPU actually runs microcode that is RISC …
ARM - RISC, Register Register

Instructions

transfer data between registers and memory
add, substract, multiply, divide values
boolean operations on values
change the PC
call a subroutine

Subroutines

not required by Turing machines
but very useful, that are part of every machine
Stack for storing return addresses (at least)
Stack Pointer (SP)

“Real” vs Virtual Machine

the “real” machine is implemented in the hardware
the operations of the machine are mapped 1:1 to the hardware actions
these statements used to be true, not any more

{{% note %}} Out-of-order execution, branch prediction -> meltdown, spectre show that our “machine” code runs on a virtual machine. {{% /note %}}

Virtual machines

Stack based
Register based

Stack based VM

all operations work with arguments on the stack
- operands are popped from the stack
- the result is pushed on the stack

    read // from input to top of stack
    dup
loop:
    push 1
    sub
    dup
    push 0
    jmpeq end
    swap
    dup -1
    add
    swap
    jmp loop
end:
    pop
    print // pop result and print

the instructions are smaller and the code is compact
more code is required

Register based VM

operations can work with registers

    read 0
    move 1, 0
loop:
    sub 0, 0, #1
    cmp 0, #0
    jmp end
    add 1, 1, 0
    jmp loop
end:
    print 1

less code
the instructions have arguments and can be larger

Typing

VMs can be statically or dynamically typed

certain registers / memory locations / instructions can work only on certain argument types.

From VM to a interpreter

Implementing everything at the VM level would be too hard

interpreter = VM + runtime
VM - control logic and computations
runtime - language features and standard library

Course future

Implement a register based VM that works with numbers
Implement a runtime
1. Strings
2. Objects
3. Arrays
4. Functions
5. Garbage Collector
Make JSImpl generate code for the VM

Implementing a VM

register based
still needs a stack for argument passing and keeping local variables

Instruction set

choosing an instruction set is difficult
- performance
  - speed
  - energy consumption
- how easy it is to target that instruction set

Fixed size instructions vs variable length instructions:

fixed size is easier to decode, but imposes restrictions and can be wasteful

Instruction set

We’ll be doing variable length instructions

The instruction set is subject to change, so do not start the press (yet!).

Instruction set

Instructions will be between 1 and 25 bytes.

6 bits for OpCode - we have up to 64 opcodes
2 bits - n - length of the arguments
- 2^n bytes per argument
- up to 3 arguments per instruction

Instruction set

https://docs.google.com/spreadsheets/d/1Q90x60BF-7T0jqPngScsjlaGBXXHRx4tszwCTSRYUrA/edit?usp=sharing

Calling convention

push arguments on stack
call function
callee can access arguments from the stack with registers
- 0 - count of arguments
- -1 - first argument
- -n - n-th argument
callee calls ret reg, which removes all locals and arguments and leaves reg on top of the stack
caller pops the result from the stack

How a program looks?

function add_answer(x) {
    return x + 42;
}

func add
const 42 // goes to register 2
add 2, -1, 2
ret 2

Interpreter loop

read the next instruction
decode the opcode and arguments
- move PC to the next instruction
execute the operation
go to 1

while (1) {
    auto instruction = get_op(PC);
    switch (instruction & 0x3F) {
        case ADD: {
            const auto size = (instruction >> 6);
            const auto arg0 = get_arg(PC, size);
            const auto arg1 = get_arg(PC, size);
            const auto arg2 = get_arg(PC, size);
            m_Stack[arg0] = m_Stack[arg1] + m_Stack[arg2];
        }
        // ...
    }
}

int get_op(byte*& PC) {
    return *(PC++);
}
size_t get_arg(byte*& PC, int size) {
    switch (size) {
        case 0:
            return *(PC++);
        case 1:
            return *(((unsigned short*&)PC)++);
        case 2:
            return *(((unsigned int*&)PC)++);
        case 3:
            return *(((size_t*)PC&)++);
    }
}

Homework

implement a brainfuck interpreter
implement a gcd in spasm