When the evalfilter package is executed a user-supplied script is lexed, parsed, and transformed into a series of bytecode operations, then these bytecode operations are executed by a simple stack-based virtual machine.
Although we don't expect users to care about the implementation details here are some brief notes.
The opcodes we're discussing are found in code/code.go, and the interpreter for the virtual machine is contained in vm/vm.go.
- Bytecode
- Bytecode Overview
- Mathematical Operations
- Comparison Operations
- Control-Flow Operations
- Iteration Operations
- Misc Operations
- Function Calls
- Program Walkthrough
- Debugging
- Optimization
You may use the cmd/evalfilter
utility to view the bytecode representation of a program.
For example consider the following script:
if ( 1 == 0.5 * 2 ) {
return true;
}
print( "This is weird\n" );
return false;
To view the bytecode you would run evalfilter bytecode ./simple.txt
, which would produce a result similar to this:
Bytecode:
0000 OpPush 1 // Push 1 to stack
0003 OpConstant 0 // push constant onto stack: "0.5"
0006 OpPush 2 // Push 2 to stack
0009 OpMul
0010 OpEqual
0011 OpJumpIfFalse 16
0014 OpTrue
0015 OpReturn
0016 OpConstant 1 // push constant onto stack: "This is weird\n"
0019 OpConstant 2 // push constant onto stack: "print"
0022 OpCall 1 // call function with 1 arg(s)
0025 OpFalse
0026 OpReturn
Constant Pool:
0000 Type:FLOAT Value:"0.5"
0001 Type:STRING Value:"This is weird\n"
0002 Type:STRING Value:"print"
Our bytecode interpreter understands approximately 30 different instructions, which are broadly grouped into categories:
- Constant, and field operations.
- Mathematical operations.
- Comparison operations.
- Control-flow operations.
- Misc operations.
The virtual machine I've implemented needs two things to work:
- A list of instructions to execute.
- A list of constants.
- If the user defines functions, via
function name(args) { body }
- There will be a section of bytecode instructions for each such function defintion.
When it comes to constants it is worth nothing that constants are used in a lot of places, for example the program "print( 1.0 + 2.0 ); return true;
" contains three constants:
- The name of the function
print
. - The floating point value
1.0
. - The floating point value
2.0
.- See optimization for details of why this example uses floating-point numbers.
That program would be encoded like so:
Bytecode:
0000 OpConstant 0 // push constant onto stack: "1"
0003 OpConstant 1 // push constant onto stack: "2"
0006 OpAdd
0007 OpConstant 2 // push constant onto stack: "print"
0010 OpCall 1 // call function with 1 arg(s)
0013 OpTrue
0014 OpReturn
Constant Pool:
0000 Type:FLOAT Value:"1"
0001 Type:FLOAT Value:"2"
0002 Type:STRING Value:"print"
To provide more details about the output:
- The value to the left of the instruction is the position in the code.
- The control-flow instructions generate jumps to these indexes, so they're worth showing.
- The middle field is the instruction to be executed.
- Some instructions include an argument, but most do not.
- Some instructions contain helpful comments to the right.
- After the bytecode has been disassembled you'll see the list of constants.
- Each of which is identified by numeric ID.
In this overview we're focusing upon the instruction OpConstant
. The OpConstant
instruction has a single argument, which is the index of the constant to load and push on the stack.
When we start running the program the stack is empty.
- We run
OpConstant 0
- That loads the constant with ID
0
from the constant-area.- This is the floating-point number
1.0
.
- This is the floating-point number
- The constant is then pushed upon the stack.
- That loads the constant with ID
- We run
OpConstant 1
- That loads the constant with ID
0
from the constant-area.- This is the floating-point number
2.0
.
- This is the floating-point number
- The constant is then pushed upon the stack.
- That loads the constant with ID
- We then execute the
OpAdd
instruction.- This pops two values from the stack
- i.e. The
2.0
we just added. - Then the
1.0
we added.- The stack is emptied in reverse.
- i.e. The
- The two values are added, producing a result of
3.0
. - Then the value is placed back upon the stack.
- This pops two values from the stack
The mathematical-related opcodes all work in the same way, they each pop two operands from the stack, carry out the appropriate instruction, and then push the result back upon the stack.
We saw these described briefly earlier, but the full list of instructions is:
OpAdd
- Add two numbers.
OpSub
- Subtract a number from another
OpMul
- Multiply two numbers.
OpDiv
- Divide a number by another.
OpMod
- Calculate a modulus operation
OpPower
- Raise a number to the power of another.
There are two "maths-like" operations which we also allocate an opcode instruction to:
OpDec
- Decrease the contents of the given variable by one.
OpInc
- Increase the contents of the given variable by one.
These two opcodes take a pointer to a variable name, fetch the value, adjust it, and store it back. The major difference here is that these instructions do not return anything, since they don't push anything onto the stack they essentially work via side-effects.
Adding support for these operations only requires that the object.XXX
structure implement the object.Decrement
and object.Increment
interfaces.
The comparison operations are very similar to the mathematical operations, and work in the same way:
- They pop two values from the stack.
- They run the comparison operation:
- If the comparison succeeds they push
true
upon the stack. - Otherwise they push
false
.
- If the comparison succeeds they push
Comparison operations include:
OpEqual
- This pushes
true
upon the stack if the two values to be compared are equal. false
otherwise.
- This pushes
OpLess
- This pushes
true
upon the stack if the first argument is less than the second. false
otherwise.
- This pushes
The full list is:
OpLess
/<
OpLessEqual
/<=
OpGreater
/>
OpGreaterEqual
/>=
OpEqual
/==
OpNotEqual
/!=
OpMatches
/~=
OpNotMatches
/!~
OpArrayIn
/in
- This is an array-specific opcode which tests whether a value is contained within an array.
In addition to this there is an OpCase
instruction which is designed to handle case
statements. This is implemented to allow case-matches to match either:
- Static constants (e.g. "
1
", or "Subject
"). - The result of executing expressions.
- Regular Expressions.
There are two control-flow operations for adjusting the instruction-pointer within the bytecode interpreter:
OpJump
- Which takes the offset within the bytecode to jump to.
- Control flow immediately branches there.
OpJumpIfFalse
- A value is popped from the stack, if it is false then control moves to the offset specified as the argument.
- Otherwise we proceed to the next instruction as expected.
There is support for iterating over things, at the moment we have support for iterating over the contents of arrays, and the characters within strings.
This is implemented via a pair of opcodes:
OpIterationReset
- Reset state of the object being iterated over.
OpIterationNext
- Get the next thing from the object on the stack being iterated over.
There's a lot of magic in the compiler/vm to make this work, as both pieces need to know how the stack is setup. That's not so unusual but care will need to be taken if either is changed alone.
There are some miscellaneous instructions:
OpBang
- Calculate negation.
OpMinus
- Calculate unary minus.
OpSquareRoot
- Calculate a square root.
OpTrue
- Pushes a
true
value to the stack.
- Pushes a
OpFalse
- Pushes a
false
value to the stack.
- Pushes a
OpVoid
- Pushes a
void
value to the stack.
- Pushes a
OpReturn
- Pops a value off the stack and terminates processing.
- The value taken from the stack is the return-code.
- Pops a value off the stack and terminates processing.
OpLookup
- Much like loading a constant by reference this loads the value from the structure field with the given name.
OpLocal
- Declare that the associated variable-name is "local" in scope, rather than global.
- This is used to handle variables marked with
local
.
OpCall
- Pops the name of a function to call from the stack.
- Called with an argument noting how many arguments to pass to the function, and pops that many arguments from the stack to use in the function-call.
There are several built-in functions supplied with the interpreter, such as len()
, print()
, and similar. Your host application which embeds the library can install more easily too.
The prototype of all functions is:
func foo( args []object.Object ) object.Object { .. }
i.e. All functions take an array of objects, and return a single object. The objects allow recieving or returning arrays, strings, numbers, booleans, and errors.
NOTE: In addition to these functions, implemented in golang, a user can define their own functions within the language itself. On the calling-side there is no difference between a built-in function, and a user-defined function. However built-in functions always take precedence, if the user were to define a function named printf
it would never be invoked, the built-in function with that name would be invoked instead.
We've already seen how constants can be loaded from the constant area onto the stack, and that along with the OpCall
instruction is all we need to support calling functions.
The OpCall
instruction comes with a single operand, which is the number of arguments that should be supplied to the function, these arguments will be popped off the stack as you should have come to expect.
This is an excerpt from the program we saw at the top of this document, and shows a function being called:
000019 OpConstant 3 // load constant: &{This is weird\n}
000022 OpConstant 4 // load constant: &{print}
000025 OpCall 1 // call function with 1 arguments
000028 OpFalse
- The first operation loads the constant with ID 3 and pushes it onto the stack.
- This is the string "
This is weird\n
", as the comment indicates.
- This is the string "
- The second instruction loads the constant with ID 4 and pushes it onto the stack.
- This is the string "
print
", which is the name of the function we're going to invoke.
- This is the string "
- The third instruction is
OpCall 1
which means that the machine should call a function with one argument.
The end result of that is that the function call happens:
OpCall
pops the first value off the stack.- This is the function to invoke.
- i.e.
print
.
- The argument to
OpCall
is the number of arguments to supply to that function.- There is one argument in this example.
- So one value is popped off the stack.
- This will be the string
This is weird\n
.
- This will be the string
- Now that the arguments are handled the function is invoked.
- The return result from that call is then pushed onto the stack.
- Unless the function returns
Void
in which case the result is empty and ignored.
- Unless the function returns
We already demonstrated a simple program earlier, with the following bytecode:
Bytecode:
000000 OpConstant 0 // load constant: &{1}
000003 OpConstant 1 // load constant: &{0.5}
000006 OpConstant 2 // load constant: &{2}
000009 OpMul
000010 OpEqual
000011 OpJumpIfFalse 19
000014 OpTrue
000015 OpReturn
000016 OpJump 19
000019 OpConstant 3 // load constant: &{This is weird\n}
000022 OpConstant 4 // load constant: &{print}
000025 OpCall 1 // call function with 1 arguments
000028 OpFalse
000029 OpReturn
Now we'll walk through what happens:
- We load the value of the constant with ID 0 and push to the stack.
- The stack now looks like this: [1]
- We load the value of the constant with ID 1 and push to the stack.
- The stack now looks like this: [1 0.5]
- We load the value of the constant with ID 2 and push to the stack.
- The stack now looks like this: [1 0.5 2]
- We come across the
OpMul
instruction, which pops two numbers from the stack, multiples them, and adds the result back.- The stack now looks like this: [1 1]
- The second value there is the result of
0.5 * 2
.
- The second value there is the result of
- The stack now looks like this: [1 1]
- We see the
Equal
instruction, which pops two items from the stack, and compares them.1
and1
are equal so the result istrue
.- The stack now looks like this: [true]
- We see the
OpJumpIfFalse
instruction, which pops a value from the stack and jumps if that is false.- Since the value on the stack is
true
the jump is not taken. - The stack now looks like this: []
- Since the value on the stack is
- We now see the
OpTrue
instruction.- This pushes the value
true
onto the stack. - The stack now looks like this: [true]
- This pushes the value
- We see the
OpReturn
instruction.- This pops a value from the stack, and terminates execution.
- The stack now looks like this: []
Seeing the dump of bytecode, as shown above, is useful but it still can be hard to reason about the run-time behaviour.
For this reason it is possible to output each opcode before it is executed, as well as view the current state of the stack.
To show this debug-output simply invoke the run
sub-command with the -debug
flag:
$ evalfilter run -debug ./path/to/script
The virtual machine performs some simple optimizations when it is constructed.
The optimizations are naive, but are designed to simplify the bytecode which is intepreted. There are a few distinct steps which are taken, although precise details will vary over time.
-
Mathematical operations which only refer to integers will be collapsed
- i.e. The statement
if ( 1 + 2 == 3 ) { ...
will be converted toif ( true ) { ..
- Because the condition is provably always true.
- i.e. The statement
-
Jump statements (i.e. the opcode instructions
OpJump
andOpJumpIfFalse
) will be removed if appropriate.- In the case of a jump which is never taken
if ( false ) { ..
the code will be removed.- This code wouldn't be written by a user, but could be generated via the first optimization.
- In the case of a jump which is never taken
-
If a program contains no jump operations, and a OpReturn instruction is encounted the program will be truncated.
- For example the program
return true; print( "What?"); return false;
will be truncated to becomereturn true;
because nothing after that can execute.
- For example the program