Add flat/cumulative function summary to handle recursion-heavy profiles#430
Conversation
…profiles Signed-off-by: Nguyen Huy Hoang <181364121+huyhoang171106@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a new console rendering mode intended to make recursion-heavy profiles easier to interpret by providing an aggregated “top functions” table (self vs cumulative/total time) instead of the deep call-tree view.
Changes:
- Add a
show_topoption toConsoleRendererand enforce mutual exclusion withflat. - Route rendering through a new
render_top_summary()function whenshow_top=True. - Add time formatting helper
_format_top_time()for seconds vs percent-of-total output.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -41,6 +42,7 @@ def __init__( | |||
| :param unicode: Use unicode, like box-drawing characters in the output. | |||
| :param color: Enable color support, using ANSI color sequences. | |||
| :param flat: Display a flat profile instead of a call graph. | |||
| :param show_top: Display an aggregated top-functions table. | |||
| :param time: How to display the duration of each frame - ``'seconds'`` or ``'percent_of_total'`` | |||
| :param flat_time: Show ``'self'`` time or ``'total'`` time (including children) in flat profile. | |||
| :param short_mode: Display a short version of the output. | |||
| @@ -52,13 +54,17 @@ def __init__( | |||
| self.unicode = unicode | |||
| self.color = color | |||
| self.flat = flat | |||
| self.show_top = show_top | |||
| self.time = time | |||
| self.flat_time = flat_time | |||
| self.short_mode = short_mode | |||
|
|
|||
| if self.flat and self.timeline: | |||
| raise Renderer.MisconfigurationError("Cannot use timeline and flat options together.") | |||
|
|
|||
| if self.flat and self.show_top: | |||
| raise Renderer.MisconfigurationError("Cannot use flat and show_top options together.") | |||
|
|
|||
There was a problem hiding this comment.
A new show_top option was added to ConsoleRenderer, but it isn’t currently exposed through Profiler.print() / Profiler.output_text() (and the CLI option plumbing appears absent). This will also break test_profiler_convenience_methods_have_all_options_available (it asserts Profiler convenience methods mirror renderer signatures). Add show_top (with matching default/annotation) to the Profiler convenience methods and any option parsing that constructs a ConsoleRenderer.
| sorted_rows = sorted( | ||
| aggregated.items(), key=lambda item: (item[1]["total_time"], item[1]["self_time"]), reverse=True | ||
| ) | ||
|
|
||
| if not sorted_rows: | ||
| return f"{indent}No samples were recorded.\n" | ||
|
|
||
| header = f"{indent}{'Self':>10} {'Total':>10} Function" | ||
| lines = [header] | ||
|
|
||
| for (function, location), times in sorted_rows: | ||
| self_time = self._format_top_time(times["self_time"], frame.time, precision) | ||
| total_time = self._format_top_time(times["total_time"], frame.time, precision) | ||
| label = function if not location else f"{function} ({location})" | ||
| lines.append(f"{indent}{self_time:>10} {total_time:>10} {label}") | ||
|
|
There was a problem hiding this comment.
render_top_summary() prints every aggregated row unconditionally. For large profiles this can produce extremely large outputs, and it also ignores the existing show_all behavior used by render_frame_flat() (0.1% threshold). Consider adding a default filter/limit (e.g., drop entries below a % threshold or show top N) and let show_all=True disable that filter.
| if self.show_top: | ||
| result += self.render_top_summary(self.root_frame, precision=precision, indent=indent) | ||
| elif self.flat: | ||
| result += self.render_frame_flat(self.root_frame, precision=precision) | ||
| else: |
There was a problem hiding this comment.
The new show_top rendering path isn’t covered by tests (existing tests cover empty profiles, timeline, show_all, and flat modes). Add a renderer test that exercises ConsoleRenderer(show_top=True) and asserts the output format/ordering (at least that it doesn’t crash and includes the expected header).
| while frames: | ||
| current = frames.pop() | ||
| function = current.function or "<unknown>" | ||
| location = current.code_position_short or "" |
There was a problem hiding this comment.
Frame.code_position_short is a method, but here it’s being used as an attribute (current.code_position_short). This will make location a bound-method object (always truthy), producing unusable keys/labels (e.g., printing the method repr) and breaking aggregation. Call current.code_position_short() instead.
| location = current.code_position_short or "" | |
| location = current.code_position_short() or "" |
|
|
||
| aggregated[key]["self_time"] += current.total_self_time | ||
| aggregated[key]["total_time"] += current.time | ||
|
|
There was a problem hiding this comment.
total_time is being aggregated by summing current.time for every frame occurrence. For recursive call stacks, inclusive times overlap across nested invocations, so this will significantly over-count (and can exceed the root total), making the new “top functions” view misleading for the recursion-heavy use case this PR targets. Consider defining “total” in a way that avoids double-counting recursion (e.g., subtract time from recursive children with the same key, or compute cumulative time on a call graph with cycle handling, similar to pprof).
Summary
The issue is not a crash or incorrect sampling; it’s a usability gap in how results are presented for recursive call trees. Pyinstrument currently emphasizes hierarchical (tree) output, which makes recursive stacks visually deep and makes it hard to estimate total time spent in a function across many recursive invocations, or in callees beneath those recursive frames. The missing capability is an aggregated “top functions” view (flat/self time and cumulative/inclusive time by function identity), similar to pprof’s top table.
Files changed
pyinstrument/renderers/console.py(modified)Testing
Closes #293