py: Add a new dual-VM option to improve performance of sys.settrace#23
andrewleech wants to merge 9 commits into review/py-settrace-dual-vm
Signed-off-by: Damien George <damien@micropython.org>
For testing. Signed-off-by: Damien George <damien@micropython.org>
It can be `undefined` as well. Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Enabling `MICROPY_PY_SYS_SETTRACE` increases the output wasm file by +41k. Enabling `MICROPY_PY_SYS_SETTRACE_DUAL_VM` on top of that adds a further +17k, but performance is regained. In particular, `perf_bench/misc_aes.py` runs 200 times faster with dual-VM enabled compared to having it disabled with `sys.settrace` enabled. Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
So that, after enabling a settrace callback, the outer-most function call will call into the callback when dual-VM is enabled. Signed-off-by: Damien George <damien@micropython.org>
The previous fix to the isTTY check (`!process.stdin.isTTY` instead of `=== false`) correctly handles the `undefined` case when stdin is a pipe, but unconditionally reading stdin overwrites any file contents already loaded from argv. Guard the stdin read with `&& repl` so it only fires when no file arguments were passed on the command line. Signed-off-by: Andrew Leech <andrew.leech@planet-innovation.com> Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
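The webassembly CI was failing because commit f6f572b ("webassembly/api: Fix CLI.") changed the stdin check from `=== false` to `!process.stdin.isTTY`. The new check also passes when stdin is a pipe (where `isTTY` is `undefined`), so when the test runner passes a file on the command line, the unconditional stdin read overwrites the file contents already loaded from argv. Fixed by adding `&& repl` to the guard, so stdin is read only when no file arguments were given.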
Current-frame tracing: CPython compatibility and enhancement options

With dual-VM, […]

If we want to go beyond CPython's default in the future, there are a few options:

Option A: Auto-propagate frames on settrace. Walk the […]
Option B: VM switch at branch points. Piggyback on the standard VM's […]
Option C: Expose […]
Option D: No changes (current recommendation). Current behaviour is CPython-compatible. Revisit if a concrete debugger use case requires it.

These are documented in […]
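For reference, here is a minimal sketch of the CPython default that these options are compared against. It assumes CPython semantics only and says nothing about how the PR implements them. `sys.settrace` installs a global callback that fires when a new frame is entered, while the frame that called `sys.settrace` gets no line events unless its own `f_trace` is set. The function names `tracer`, `inner` and `outer` are illustrative.

```python
import sys

events = []


def tracer(frame, event, arg):
    # Record the event kind and the name of the function being traced.
    # Returning the tracer keeps local tracing active in that new frame.
    events.append((event, frame.f_code.co_name))
    return tracer


def inner():
    return 1


def outer():
    sys.settrace(tracer)   # installed while outer() is already running
    try:
        value = inner()    # a new frame is entered: 'call' event for inner
        value += 1         # still in outer(): no 'line' event is reported
    finally:
        sys.settrace(None)  # always remove the callback again
    return value


result = outer()
```

After this runs on CPython, `events` contains entries for `inner` but none for `outer`, because `outer` was already executing when the callback was installed.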
Summary
MicroPython supports the standard `sys.settrace()` facility, to trace bytecode execution. That feature is enabled through `MICROPY_PY_SYS_SETTRACE` and is disabled by default because it really slows the VM down, and increases memory use in many places due to the overhead of tracking the state of functions.

This PR adds a new option `MICROPY_PY_SYS_SETTRACE_DUAL_VM` that, when enabled, duplicates the VM, that is, the `mp_execute_bytecode()` function, into two copies:
- `mp_execute_bytecode_standard()`, which has settrace disabled.
- `mp_execute_bytecode_tracing()`, which has settrace enabled.

Then a wrapper function `mp_execute_bytecode()` calls into the appropriate VM depending on whether the user has enabled a settrace callback or not.

The result is:

This is useful for, e.g., webassembly, where PyScript wants to have `sys.settrace` enabled and can afford a dual-VM in order not to lose performance with settrace enabled.

This PR also includes some minor fixes for the webassembly port to be able to run the performance benchmarks on that port.
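The dispatch described in this summary can be modelled with a toy interpreter. The sketch below is not MicroPython's C implementation: the opcodes and the names `set_trace`, `execute_standard`, `execute_tracing` and `execute` are all made up for illustration. It only shows the shape of the idea: two copies of the loop, and a wrapper that picks one per call.

```python
from typing import Callable, List, Optional, Tuple

Instr = Tuple[str, int]
TraceFn = Callable[[int, Instr], None]

# Hypothetical global, standing in for the callback set via sys.settrace().
_trace_callback: Optional[TraceFn] = None


def set_trace(callback: Optional[TraceFn]) -> None:
    """Install or clear the trace callback."""
    global _trace_callback
    _trace_callback = callback


def _step(acc: int, instr: Instr) -> int:
    """Execute one toy instruction against the accumulator."""
    op, arg = instr
    if op == "add":
        return acc + arg
    if op == "mul":
        return acc * arg
    raise ValueError(f"unknown opcode: {op}")


def execute_standard(code: List[Instr]) -> int:
    """Fast copy of the loop: contains no tracing hooks at all."""
    acc = 0
    for instr in code:
        acc = _step(acc, instr)
    return acc


def execute_tracing(code: List[Instr]) -> int:
    """Slow copy of the loop: reports each instruction to the callback."""
    acc = 0
    for pc, instr in enumerate(code):
        if _trace_callback is not None:
            _trace_callback(pc, instr)
        acc = _step(acc, instr)
    return acc


def execute(code: List[Instr]) -> int:
    """Wrapper: choose the loop once per call, based on the callback."""
    if _trace_callback is None:
        return execute_standard(code)
    return execute_tracing(code)
```

The per-instruction check lives only in the tracing copy, so code that never installs a callback pays for a single test per call rather than one per instruction.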
Testing
Tested by running the standard test suite on the unix and webassembly ports, which also have dual-VM enabled in CI.
On the webassembly port, the `perf_bench/misc_aes.py` performance test runs 200 times faster (!) with dual-VM enabled, compared to just enabling settrace without the dual-VM. So it's a very big win there.

On PYBV10 the metrics are:
- `MICROPY_PY_SYS_SETTRACE` enabled
- `MICROPY_PY_SYS_SETTRACE_DUAL_VM` enabled as well

The performance measurements on PYBV10 in detail

Enabling `MICROPY_PY_SYS_SETTRACE`:

Tests that are heavy on the VM (`misc_aes.py` and `misc_pystone.py`) run at about half the speed with settrace enabled.

Enabling both `MICROPY_PY_SYS_SETTRACE` and `MICROPY_PY_SYS_SETTRACE_DUAL_VM` (comparing to settrace completely disabled):

Performance is almost all regained with dual-VM enabled.
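To get a rough feel for what an active trace callback costs, something like the following can be run on a host Python. This is not the `perf_bench` harness used for the numbers above, and it measures the cost of an installed callback on CPython rather than the build-time options discussed here. The `workload`, `noop_tracer`, `timed` and `compare` helpers are illustrative.

```python
import sys
import time


def workload(n):
    """A small interpreter-bound loop to exercise the bytecode VM."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def noop_tracer(frame, event, arg):
    # Does nothing, but returning itself keeps per-line tracing enabled.
    return noop_tracer


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def compare(n=20000):
    """Time the workload without and then with a trace callback installed."""
    plain_result, plain_s = timed(workload, n)
    sys.settrace(noop_tracer)
    try:
        traced_result, traced_s = timed(workload, n)
    finally:
        sys.settrace(None)  # never leave the tracer installed
    return plain_result, traced_result, plain_s, traced_s
```

Calling `compare()` returns both results and both timings; the ratio `traced_s / plain_s` gives a rough slowdown figure, which will vary between machines and runs.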
Trade-offs and Alternatives
This obviously duplicates the VM, but I tried to make it as simple as possible in the implementation so that it's easy to maintain. In the future this mechanism could potentially be extended to optimise for other things, although I can't think of any obvious candidates for that.
An alternative to having a dual-VM could be to try and optimise the internal VM macros `FRAME_UPDATE` and `TRACE_TICK`. Although that could save code size, no matter how well these macros are optimised it won't be able to reach the performance of a dual-VM architecture.

I'm not sure exactly how CPython implements efficient `sys.settrace` these days (they've changed a lot recently with their VM), but could look there for alternative ideas as well.