I was noodling around in the code and discovered that, as far as I can tell, the bytecode interpreter decodes the binary instructions into Instruction values when a function is called, not when the bytecode is loaded. This means every call to a function re-decodes all of that function's instructions. On a benchmark program (fib implemented the dumb, naively recursive way) on Linux AMD64, perf shows about 15% of the time spent in ketos::bytecode::Instruction::decode(), more than any other function.
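To illustrate what I mean by decoding once, here's a minimal sketch. It is not ketos's actual code: the Instruction enum, the byte encoding (0 = Push followed by a one-byte operand, 1 = Add, 2 = Return), and the Function, decode, decode_all and call names are all made up for the example. The idea is that the raw bytes get decoded into a Vec<Instruction> once, at load time, and every call just walks that Vec.

```rust
// Hypothetical instruction set; NOT ketos's real opcode layout.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instruction {
    Push(u8),
    Add,
    Return,
}

// Decode one instruction starting at byte offset `pos`.
// Returns the instruction and the offset of the next one.
fn decode(code: &[u8], pos: usize) -> Result<(Instruction, usize), String> {
    match code.get(pos).copied() {
        Some(0) => code
            .get(pos + 1)
            .map(|&n| (Instruction::Push(n), pos + 2))
            .ok_or_else(|| "truncated Push operand".to_string()),
        Some(1) => Ok((Instruction::Add, pos + 1)),
        Some(2) => Ok((Instruction::Return, pos + 1)),
        Some(op) => Err(format!("unknown opcode {}", op)),
        None => Err("unexpected end of code".to_string()),
    }
}

// Decode an entire function body up front (e.g. when it is loaded).
fn decode_all(code: &[u8]) -> Result<Vec<Instruction>, String> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < code.len() {
        let (instr, next) = decode(code, pos)?;
        out.push(instr);
        pos = next;
    }
    Ok(out)
}

// Hypothetical compiled function that caches its decoded instructions.
struct Function {
    decoded: Vec<Instruction>,
}

impl Function {
    // Decoding cost is paid here, once per function, not once per call.
    fn load(code: &[u8]) -> Result<Function, String> {
        Ok(Function { decoded: decode_all(code)? })
    }
}

// Execute the pre-decoded instructions; no decoding happens per call.
fn call(f: &Function) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    for instr in &f.decoded {
        match *instr {
            Instruction::Push(n) => stack.push(i64::from(n)),
            Instruction::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
            }
            Instruction::Return => return stack.pop().expect("empty stack on Return"),
        }
    }
    stack.pop().unwrap_or(0)
}
```

For example, Function::load(&[0, 2, 0, 3, 1, 2]) decodes to Push(2), Push(3), Add, Return, and call on it returns 5 however many times it's invoked, without touching the raw bytes again. One catch I'd expect with the real interpreter: if jump instructions store byte offsets into the bytecode, they'd have to be remapped to indices into the decoded Vec when decoding up front.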
I haven't yet managed to make it decode the bytecode once and store the result, but once I do I'll post comparative benchmarks here.