Skip to content

py: Add a new dual-VM option to improve performance of sys.settrace#23

Draft
andrewleech wants to merge 9 commits intoreview/py-settrace-dual-vmfrom
py-settrace-dual-vm
Draft

py: Add a new dual-VM option to improve performance of sys.settrace#23
andrewleech wants to merge 9 commits intoreview/py-settrace-dual-vmfrom
py-settrace-dual-vm

Conversation

@andrewleech
Copy link
Owner

Summary

MicroPython supports the standard sys.settrace() facility, to trace bytecode execution. That feature is enabled through MICROPY_PY_SYS_SETTRACE and is disabled by default because it really slows the VM down, and increases memory use in many places due to the overhead of tracking the state of functions.

This PR adds a new option MICROPY_PY_SYS_SETTRACE_DUAL_VM that, when enabled, duplicates the VM, the mp_execute_bytecode() function:

  1. One version is called mp_execute_bytecode_standard() and has settrace disabled.
  2. The other version is called mp_execute_bytecode_tracing() and has settrace enabled.

Then a wrapper function mp_execute_bytecode() calls into the appropriate VM depending on whether the user has enabled a settrace callback or not.

The result is:

  1. An increase of about 5k in firmware size (on ARM Cortex-M) when enabling the dual-VM (that's on top of having settrace already enabled).
  2. Almost no performance is lost if a settrace callback is not registered.
  3. Full support for settrace if needed.

This is useful for, eg, webassembly where PyScript wants to have sys.settrace enabled, and can afford to have a dual-VM in order not to lose performance with settrace enabled.

This PR also includes some minor fixes for the webassembly port to be able to run the performance benchmarks on that port.

Testing

Tested by running the standard test suite on the unix and webassembly ports, which also have dual-VM enabled in CI.

On the webassembly port, the perf_bench/misc_aes.py performance test runs 200 times faster (!) with dual-VM enabled, compared to just enabling settrace without the dual-VM. So it's a very big win there.

On PYBV10 the metrics are:

  • code size increases by 14.5k with MICROPY_PY_SYS_SETTRACE enabled
  • code size increases by an extra 5.3k with MICROPY_PY_SYS_SETTRACE_DUAL_VM enabled as well
  • performance is roughly halved when enabling settrace
  • performance is pretty much all regained when enabling dual-VM

The performance measurements on PYBV10 in detail

Enabling MICROPY_PY_SYS_SETTRACE:

diff of scores (higher is better)
N=100 M=100                perf-pyb-std -> perf-pyb-settrace         diff      diff% (error%)
bm_chaos.py                    356.93 ->     261.95 :     -94.98 = -26.610% (+/-0.01%)
bm_fannkuch.py                  72.90 ->      61.28 :     -11.62 = -15.940% (+/-0.00%)
bm_fft.py                     2331.03 ->    1681.29 :    -649.74 = -27.874% (+/-0.02%)
bm_hexiom.py                    45.75 ->      13.35 :     -32.40 = -70.820% (+/-0.00%)
bm_nqueens.py                 4111.33 ->    2695.70 :   -1415.63 = -34.432% (+/-0.00%)
bm_pidigits.py                 663.16 ->     606.83 :     -56.33 =  -8.494% (+/-0.02%)
bm_wordcount.py                 47.36 ->      39.05 :      -8.31 = -17.546% (+/-0.01%)
core_import_mpy_multi.py       640.86 ->     483.22 :    -157.64 = -24.598% (+/-0.01%)
core_import_mpy_single.py      105.97 ->      84.55 :     -21.42 = -20.213% (+/-0.01%)
core_locals.py                  40.28 ->      32.41 :      -7.87 = -19.538% (+/-0.00%)
core_qstr.py                   204.36 ->     192.74 :     -11.62 =  -5.686% (+/-0.00%)
core_str.py                     25.86 ->      22.55 :      -3.31 = -12.800% (+/-0.00%)
core_yield_from.py             355.61 ->       3.84 :    -351.77 = -98.920% (+/-0.00%)
misc_aes.py                    407.85 ->      32.12 :    -375.73 = -92.125% (+/-0.01%)
misc_mandel.py                3166.90 ->    2791.44 :    -375.46 = -11.856% (+/-0.01%)
misc_pystone.py               2378.74 ->     279.43 :   -2099.31 = -88.253% (+/-0.00%)
misc_raytrace.py               361.02 ->     256.57 :    -104.45 = -28.932% (+/-0.01%)
viper_call0.py                 499.49 ->     537.73 :     +38.24 =  +7.656% (+/-0.00%)
viper_call1a.py                522.73 ->     514.72 :      -8.01 =  -1.532% (+/-0.01%)
viper_call1b.py                379.77 ->     385.81 :      +6.04 =  +1.590% (+/-0.00%)
viper_call1c.py                384.11 ->     389.38 :      +5.27 =  +1.372% (+/-0.01%)
viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.00%)
viper_call2b.py                335.06 ->     341.81 :      +6.75 =  +2.015% (+/-0.01%)

Tests that are heavy on the VM (misc_aes.py and misc_pystone.py) run at about half the speed with settrace enabled.

Enabling both MICROPY_PY_SYS_SETTRACE and MICROPY_PY_SYS_SETTRACE_DUAL_VM (comparing to settrace completely disabled):

diff of scores (higher is better)
N=100 M=100                perf-pyb-std -> perf-pyb-settrace-dual-vm         diff      diff% (error%)
bm_chaos.py                    356.93 ->     353.71 :      -3.22 =  -0.902% (+/-0.01%)
bm_fannkuch.py                  72.90 ->      72.78 :      -0.12 =  -0.165% (+/-0.00%)
bm_fft.py                     2331.03 ->    2320.26 :     -10.77 =  -0.462% (+/-0.00%)
bm_float.py                   5713.01 ->    5607.72 :    -105.29 =  -1.843% (+/-0.02%)
bm_hexiom.py                    45.75 ->      45.06 :      -0.69 =  -1.508% (+/-0.00%)
bm_nqueens.py                 4111.33 ->    4035.90 :     -75.43 =  -1.835% (+/-0.00%)
bm_pidigits.py                 663.16 ->     639.93 :     -23.23 =  -3.503% (+/-0.02%)
bm_wordcount.py                 47.36 ->      46.71 :      -0.65 =  -1.372% (+/-0.00%)
core_import_mpy_multi.py       640.86 ->     608.33 :     -32.53 =  -5.076% (+/-0.01%)
core_import_mpy_single.py      105.97 ->     101.29 :      -4.68 =  -4.416% (+/-0.01%)
core_locals.py                  40.28 ->      39.92 :      -0.36 =  -0.894% (+/-0.00%)
core_qstr.py                   204.36 ->     202.55 :      -1.81 =  -0.886% (+/-0.03%)
core_str.py                     25.86 ->      25.71 :      -0.15 =  -0.580% (+/-0.00%)
core_yield_from.py             355.61 ->     328.12 :     -27.49 =  -7.730% (+/-0.01%)
misc_aes.py                    407.85 ->     398.19 :      -9.66 =  -2.369% (+/-0.01%)
misc_mandel.py                3166.90 ->    3148.39 :     -18.51 =  -0.584% (+/-0.01%)
misc_pystone.py               2378.74 ->    2336.25 :     -42.49 =  -1.786% (+/-0.00%)
misc_raytrace.py               361.02 ->     357.93 :      -3.09 =  -0.856% (+/-0.01%)
viper_call0.py                 499.49 ->     495.08 :      -4.41 =  -0.883% (+/-0.01%)
viper_call1a.py                522.73 ->     514.77 :      -7.96 =  -1.523% (+/-0.00%)
viper_call1b.py                379.77 ->     374.68 :      -5.09 =  -1.340% (+/-0.01%)
viper_call1c.py                384.11 ->     378.07 :      -6.04 =  -1.572% (+/-0.00%)
viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.01%)
viper_call2b.py                335.06 ->     332.42 :      -2.64 =  -0.788% (+/-0.00%)

Performance is almost all regained with dual-VM enabled.

Trade-offs and Alternatives

This obviously duplicates the VM, but I tried to make it as simple as possible in the implementation so that it's easy to maintain. In the future this mechanism could potentially be extended to optimise for other things, although I can't think of any obvious candidates for that.

An alternative to having a dual-VM could be to try and optimise the internal VM macros FRAME_UPDATE and TRACE_TICK. Although that could save code size, no matter how well these macros are optimised it won't be able to reach the performance of a dual-VM architecture.

I'm not sure exactly how CPython implements efficient sys.settrace these days (they've changed a lot recently with their VM), but could look there for alternative ideas as well.

dpgeorge and others added 9 commits December 15, 2025 12:35
Signed-off-by: Damien George <damien@micropython.org>
For testing.

Signed-off-by: Damien George <damien@micropython.org>
It can be `undefined` as well.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Enabling `MICROPY_PY_SYS_SETTRACE` increases the output wasm file by +41k.
Enabling `MICROPY_PY_SYS_SETTRACE_DUAL_VM` on top of that adds a further
+17k, but performance is regained.  In particular, `perf_bench/misc_aes.py`
runs 200 times faster with dual-VM enabled compared to having it disabled
with `sys.settrace` enabled.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
So that, after enabling a settrace callback, the outer-most function call
will call into the callback when dual-VM is enabled.

Signed-off-by: Damien George <damien@micropython.org>
The previous fix to the isTTY check (`!process.stdin.isTTY` instead of
`=== false`) correctly handles the `undefined` case when stdin is a pipe,
but unconditionally reading stdin overwrites any file contents already
loaded from argv.  Guard the stdin read with `&& repl` so it only fires
when no file arguments were passed on the command line.

Signed-off-by: Andrew Leech <andrew.leech@planet-innovation.com>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@andrewleech
Copy link
Owner Author

The webassembly CI was failing because commit f6f572b ("webassembly/api: Fix CLI.") changed the stdin check from process.stdin.isTTY === false to !process.stdin.isTTY. The original condition was actually dead code — isTTY is true on a TTY or undefined on a pipe, never literally false. The fix correctly handles the undefined case, but it means the stdin read now fires whenever the binary is invoked via subprocess (pipe), and the = assignment overwrites any file contents already loaded from argv.

When the test runner does subprocess.check_output(["node", "micropython.mjs", "script.py"]), stdin is a pipe with no data, so fs.readFileSync(0) returns "", clobbering the script. Platform detection in run-tests.py gets empty output and fails with ValueError: cannot detect platform.

Fixed by adding && repl to the condition. The repl flag is set false during the argv loop when a file argument is consumed, so stdin is only read as source code when no files were passed on the command line.

@andrewleech
Copy link
Owner Author

Current-frame tracing: CPython compatibility and enhancement options

With dual-VM, sys.settrace(callback) doesn't produce trace events for the currently executing function or anything already on the call stack — tracing only starts at the next function call boundary when the dispatcher selects the tracing VM. This matches CPython's behaviour exactly. CPython's docs state explicitly that settrace() doesn't activate tracing on the current frame; bdb.set_trace() works around this by walking the frame stack and setting frame.f_trace on every frame manually.

If we want to go beyond CPython's default in the future, there are a few options:

Option A — Auto-propagate frames on settrace: Walk the prev_state chain in mp_prof_settrace() and fire synthetic "call" events for all existing frames. Sets up valid frame objects for introspection but doesn't give line events for functions still in the standard VM. Small change in py/profile.c.

Option B — VM switch at branch points: Piggyback on the standard VM's pending_exception_check (runs at backward branches, conditionals, for-iter). When prof_trace_callback is non-NULL, save state and return a new MP_VM_RETURN_SWITCH_VM to the dispatcher, which re-enters the tracing VM. Gives full line-level tracing of existing frames from the next branch point. Zero per-opcode overhead. Would need care to suppress the spurious "call" event when re-entering a mid-execution function. Could also handle the reverse (tracing→standard on settrace(None)).

Option C — Expose frame.f_trace as settable: Let Python code do frame.f_trace = callback, matching CPython's escape hatch. Debuggers would walk the frame stack themselves, identical to bdb.set_trace().

Option D — No changes (current recommendation): Current behaviour is CPython-compatible. Revisit if a concrete debugger use case requires it.

These are documented in docs/settrace-dual-vm-spec.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants