Skip to content

UPSTREAM PR #2459: fix(tree): deduplicate entry-finding logic#27

Open
loci-dev wants to merge 2 commits intomainfrom
loci/pr-2459-unify-traversal
Open

UPSTREAM PR #2459: fix(tree): deduplicate entry-finding logic#27
loci-dev wants to merge 2 commits intomainfrom
loci/pr-2459-unify-traversal

Conversation

@loci-dev
Copy link
Copy Markdown

@loci-dev loci-dev commented Mar 7, 2026

Note

Source pull request: GitoxideLabs/gitoxide#2459

I was poking around in the tree iteration code and saw an opportunity for some deduplication.

I'm not sure if the end result is worth it: a lot of generics and fairly opaque logic (looking at the direct implementation of Tree::peel_to_entry is... Not very satisfying :P), but I figured I'd put this here in case it is something that can be merged.

I'm not sure if the pub fn lookup_entry adheres to the style guide, but I'm more than willing to find a better solution if that is all that is in the way of fixing this.

@loci-review
Copy link
Copy Markdown

loci-review bot commented Mar 7, 2026

Flame Graph: target.aarch64-unknown-linux-gnu.release.gix::ZN10gix_object4tree8ref_iter41$LT$impl$u20$gix_object..TreeRefIter$GT$12lookup_entry

Flame Graph: target.aarch64-unknown-linux-gnu.release.gix::ZN10gix_object4tree8ref_iter41$LT$impl$u20$gix_object..TreeRefIter$GT$12lookup_entry

The target version shows 4 instances of iter_next calls followed by nextextract_from_bytesmemchrbcmp sequences (~280 ns each), replacing the base version's simpler linear call chain. The doubling of response time is directly attributable to this iteration overhead from the refactoring.

Additional Findings

Synchronization Overhead: The diff function regression stems from new thread-safety mechanisms (lock operations, atomic operations, hash table synchronization) introduced in downstream functions—not from the tree refactoring itself. This suggests architectural changes in the object database or caching layers that warrant investigation.

Cross-Function Impact: The lookup_entry regression propagates through call chains affecting status, diff, and checkout operations. In large repositories with thousands of path resolutions, the 1,861 ns overhead per call accumulates to noticeable user-facing impact.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

The logic for finding entries is duplicated 3 times.

With an (unhealthy) smothering of generics, the common
functionality can be extracted into a single function
that can be reused in all places.
Instead of doing some messed up callback stuff, we can
also use std::ops::ControlFlow.

It is slightly more verbose, but definitely easier to
read.
@loci-review
Copy link
Copy Markdown

loci-review bot commented Mar 8, 2026

Overview

This analysis covers 29,429 functions (1,132 modified, 3,016 new, 3,032 removed) across two binaries following tree path-finding deduplication refactoring. Power consumption changes are negligible: target.aarch64-unknown-linux-gnu.release.gix (+0.059%), target.aarch64-unknown-linux-gnu.release.ein (-0.018%).

Net Impact: Minor (Positive). Core tree traversal functions improved 5-49%, with compiler optimizations delivering unexpected wins up to 97%. External dependency regressions (serde_json +994%, BTreeMap +334%) affect non-critical paths only.

Function Analysis

Performance-Critical Improvements:

  • peel_to_entry_by_path (gix): Response time -48.8% (2,086ns → 1,067ns). Replaced generic path iteration with specialized iter_next() helper, eliminating 605ns of overhead from removed functions.
  • entries (gitoxide_core): Response time -5.4% (248,923ns → 235,499ns). Compiler inlined refactored tree traversal, replacing callback-based approach with specialized inline implementations.

Acceptable Trade-off:

  • lookup_entry (gix-object): Response time +104.2% (1,756ns → 3,586ns). Extracted inline path-parsing into iter_next() helper, introducing function call overhead but eliminating ~30 lines of duplicated code. Could be mitigated with inline hints.

Compiler Optimization Wins:

  • fmt (Display, gix): Response time -97.7% (422ns → 10ns). Compiler eliminated expensive core::fmt::write call and simplified control flow from 8 to 3 blocks.
  • fmt (Debug, ein): Response time -73.0% (14,282ns → 3,851ns). Eliminated recursive Debug calls and complex padding operations.

External Dependency Regressions (Non-Critical):

  • serialize_field (serde_json): Response time +994% (12,896ns → 141,087ns). External serde_json library update changed serialization structure. Used only for CLI JSON output, not performance-critical paths.
  • BTreeMap::remove (stdlib): Response time +334% (603ns → 2,619ns). Rust stdlib regression outside Gitoxide's control. Used in delta decompression; impact depends on workload.

Other analyzed functions showed minor changes from compiler optimizations or external dependencies with negligible real-world impact.

Additional Findings

The refactoring successfully improved code quality (eliminated ~60 lines of duplication, cleaner control flow) while delivering net positive performance. The lookup_entry regression warrants monitoring but represents a deliberate maintainability trade-off. BTreeMap regression in delta decompression should be monitored on large pack operations.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 4 times, most recently from 167bdd1 to 3deba97 Compare March 15, 2026 07:48
@loci-dev loci-dev force-pushed the main branch 5 times, most recently from 0223bcb to bfaee00 Compare March 24, 2026 07:50
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from a9e7940 to d9d8ccd Compare March 28, 2026 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants