Skip to content

git-co: batch Levenshtein calculations into a single python3 call#12

Merged
mlutonsky merged 2 commits intomainfrom
copilot/batch-levenshtein-calculations
Mar 14, 2026
Merged

git-co: batch Levenshtein calculations into a single python3 call#12
mlutonsky merged 2 commits intomainfrom
copilot/batch-levenshtein-calculations

Conversation

Copy link
Copy Markdown

Copilot AI commented Mar 14, 2026

_find_branches_by_levenshtein() spawned one python3 process per branch — O(n) process forks for n branches.

Changes

  • Removed _levenshtein_distance() entirely
  • Rewrote _find_branches_by_levenshtein() to pipe all branch names to a single python3 -c '...' invocation that computes, sorts, and filters all distances at once
echo "$all_branches" | python3 -c '
...
branches = [line.strip() for line in sys.stdin if line.strip()]
results = sorted([(levenshtein(input_str, b), b) for b in branches], key=lambda x: x[0])
for dist, branch in results:
    if dist <= max_dist:
        print(branch)
' "$input" "$max_distance"

Note: The python3 - <<'PYEOF' pattern from the issue spec is broken — the heredoc wins the stdin race and the piped branch data is silently discarded. Using python3 -c '...' keeps the script inline and leaves stdin available for the pipe.

Original prompt

git-co: batch Levenshtein calculations into a single python3 call

File to modify: git-co

Problem

_find_branches_by_levenshtein() calls _levenshtein_distance() once per branch in a while loop. Each call spawns a new python3 process, which is expensive — on a repo with 50 branches this spawns 50 Python processes.

Fix

Rewrite _levenshtein_distance() and _find_branches_by_levenshtein() so that all branch distances are computed in a single python3 invocation.

Replace both functions with the following:

_find_branches_by_levenshtein() {
  local input="$1"
  local all_branches="$2"
  local max_distance="${3:-5}"

  # Compute all Levenshtein distances in a single python3 call
  echo "$all_branches" | python3 - "$input" "$max_distance" <<'PYEOF'
import sys

def levenshtein(s1, s2):
    s1, s2 = s1.lower(), s2.lower()
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    if len(s2) == 0:
        return len(s1)
    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            current_row.append(min(
                previous_row[j + 1] + 1,
                current_row[j] + 1,
                previous_row[j] + (c1 != c2)
            ))
        previous_row = current_row
    return previous_row[-1]

input_str = sys.argv[1]
max_dist = int(sys.argv[2])
branches = [line.strip() for line in sys.stdin if line.strip()]
results = sorted(
    [(levenshtein(input_str, b), b) for b in branches],
    key=lambda x: x[0]
)
for dist, branch in results:
    if dist <= max_dist:
        print(branch)
PYEOF
}

Also remove the now-unused _levenshtein_distance() function entirely.

Update the call site in _find_similar_branches() — the call _find_branches_by_levenshtein "$input" "$all_branches" "$max_distance" remains the same, no changes needed there.

No other changes.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: @copilot Accepted Confirmation: Are you sure?

User: @copilot Accepted Confirmation: Are you sure?

Assistant: The Copilot coding agent is now working on all the pull requests! Here's the full tracking list:

data:
    - title: 'Add MIT license'
      repository: mlutonsky/utils
      url: https://github.com/copilot/tasks/pull/PR_kwDORnE_6M7KoO34
      state: queued
    - title: 'Add ShellCheck CI workflow'
      repository: mlutonsky/utils
      url: https://github.com/copilot/tasks/pull/PR_kwDORnE_6M7KoP2B
      state: queued
    - title: 'idle-power-manager: add log rotation'
      repository: mlutonsky/utils
      url: https://github.com/copilot/tasks/pull/PR_kwDORnE_6M7KoQyr
      state: queued
    - title: 'Add install.sh to symlink all scripts to ~/.local/bin'
      repository: mlutonsky/utils
      url: https://github.com/copilot/tasks/pull/PR_kwDORnE_6M7KoUUn
      state: queued
    - title: 'git-branch-clean: safer branch deletion and fetch error handling'
      repository: mlutonsky/utils
      url: https://github.com/mlutonsky/utils/pulls
      state: queued
    - title: 'git-co: fix expensive per-branch python3 Levenshtein loop'
      repository: mlutonsky/utils
      url: https://github.com/mlutonsky/utils/pulls
      state: queued
    - title: 'git-co: remove unintentional git pull --all after git switch -c'
      repository: mlutonsky/utils
      url: https://github.com/mlutonsky/utils/pulls
      state: queued
    - title: 'git-co: validate branch selection input is numeric'
      repository: mlutonsky/utils
      url: https://github.com/mlutonsky/utils/pulls
      state: queued
    - title: 'idle-power-manager: fix fragile switch_profile success detection'
      repository: mlutonsky/utils
      url: https://github.com/mlutonsky/utils/pulls
      state: queued
    - title: 'idle-power-manager: add DBUS_SESSION_BUS_ADDRESS fallback for systemd'
      repository: mlutonsky/utils
      url: https://github.com/mlutonsky/utils/pulls
      state: queued

All 10 are now running. You can track progress at github.com/mlutonsky/utils/pulls — PRs will appear as the agent completes each one!

User: also re-create pull request for the "Fix git fetch --prune error handling in git-branch-clean (#10)"

Assistant: It seems the agent didn't return a task URL this time. Let me retry:

Assistant: I'm waiting on a tool call confirmation to proceed — please accept it and I'll get that PR queued up right away.

User: @copilot Accepted Confirmation: Are you sure?

Assistant: The agent is on it!

data:
    - title: 'git-branch-clean: abort on git fetch --prune failure'
      repository: mlutonsky/utils
      url: https://github.com/copilot/tasks/pull/PR_kwDORnE_6M7KoVDC
      state: queued

You can track it at github.com/mlutonsky/utils/pulls once the agent opens the PR.

User: @copilot Accepted Con...

This pull request was created from Copilot chat.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: mlutonsky <423848+mlutonsky@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor Levenshtein distance calculations for branches git-co: batch Levenshtein calculations into a single python3 call Mar 14, 2026
Copilot AI requested a review from mlutonsky March 14, 2026 18:43
@mlutonsky mlutonsky marked this pull request as ready for review March 14, 2026 18:46
Copilot AI review requested due to automatic review settings March 14, 2026 18:46
@mlutonsky mlutonsky merged commit 5905321 into main Mar 14, 2026
2 checks passed
@mlutonsky mlutonsky deleted the copilot/batch-levenshtein-calculations branch March 14, 2026 18:46
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes git-co’s fuzzy branch matching by batching Levenshtein distance computations into a single Python invocation instead of spawning one python3 process per branch.

Changes:

  • Removed the per-branch _levenshtein_distance() helper.
  • Rewrote _find_branches_by_levenshtein() to compute/sort distances in one python3 -c call fed by stdin.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +25 to 30
echo "$all_branches" | python3 -c '
import sys

def levenshtein(s1, s2):
s1, s2 = s1.lower(), s2.lower()
if len(s1) < len(s2):
Comment on lines +53 to +55
for dist, branch in results:
if dist <= max_dist:
print(branch)
for dist, branch in results:
if dist <= max_dist:
print(branch)
' "$input" "$max_distance"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants