
fix(deployment_manager): harden and optimize concurrent GitHub file uploads#68

Draft
Copilot wants to merge 2 commits into optimize-github-deployment-5034492779168511499 from copilot/sub-pr-60

Conversation


Copilot AI commented Mar 14, 2026

_upload_to_github had several correctness and robustness gaps introduced when migrating to httpx: silent auth failures, blocking file I/O on the event loop, swallowed CancelledError, no rate-limit handling, and unbounded coroutine creation for large repos.

Changes

  • Auth validation – Check user_response.status_code != 200 before parsing JSON; raises a descriptive exception instead of a KeyError on login
  • Non-blocking file reads – open().read() replaced with await asyncio.to_thread(file_path.read_bytes), keeping the event loop free during disk I/O
  • CancelledError propagation – Explicit except asyncio.CancelledError: raise before the broad except Exception, so task cancellation (e.g. server shutdown) is not silently swallowed
  • Rate-limit retry/backoff – On HTTP 403/429, retries up to 3 times with exponential backoff; honors Retry-After header with a safe fallback for non-integer values; no wasted sleep after the final attempt; exhausted retries are logged as errors (not silently None)
  • Bounded queue-based worker pool – Replaces Semaphore(10) + gather(all_coroutines) with a proper producer/consumer pattern:
    • asyncio.Queue(maxsize=20) prevents unbounded memory growth for large repos
    • Workers started before queue fill for immediate consumption
    • try/finally ensures queue.task_done() is always called
    • gather(..., return_exceptions=True) prevents one worker error from cancelling siblings
# Before: O(files) coroutine objects created up-front
results = await asyncio.gather(*(run_with_semaphore(task) for task in upload_tasks))

# After: O(workers) coroutines, bounded queue, retry-aware
queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_WORKERS * 2)
worker_tasks = [asyncio.create_task(worker()) for _ in range(MAX_WORKERS)]
# ... fill queue ...
results = await asyncio.gather(*worker_tasks, return_exceptions=True)
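For context, the bullet points above can be condensed into a minimal runnable sketch of the bounded worker-pool pattern. Names such as process, run_pool, and the MAX_WORKERS value here are illustrative stand-ins, not code taken from deployment_manager.py:

```python
import asyncio

MAX_WORKERS = 4

async def process(item: int) -> int:
    # Stand-in for the real per-file upload coroutine.
    await asyncio.sleep(0)
    return item * 2

async def run_pool(items: list[int]) -> list[int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_WORKERS * 2)
    results: list[int] = []

    async def worker() -> None:
        while True:
            item = await queue.get()
            try:
                if item is None:  # sentinel: no more work for this worker
                    return
                results.append(await process(item))
            finally:
                queue.task_done()

    # Start workers first so they drain the queue while we fill it.
    workers = [asyncio.create_task(worker()) for _ in range(MAX_WORKERS)]
    for item in items:
        await queue.put(item)  # blocks when the queue is full: bounded memory
    for _ in range(MAX_WORKERS):
        await queue.put(None)  # one sentinel per worker
    # return_exceptions=True keeps one failed worker from cancelling the rest.
    await asyncio.gather(*workers, return_exceptions=True)
    return results

print(sorted(asyncio.run(run_pool(list(range(10))))))
```

Only MAX_WORKERS coroutines exist at any time, and the bounded queue applies backpressure to the producer for large file sets.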



vercel bot commented Mar 14, 2026

The latest updates on your projects.

Project v0-uvai: Ready (updated Mar 14, 2026 5:07pm UTC)

@railway-app railway-app bot temporarily deployed to EventRelay / EventRelay-pr-68 March 14, 2026 16:56 Destroyed

railway-app bot commented Mar 14, 2026

🚅 Deployed to the EventRelay-pr-68 environment in EventRelay

Service status (updated Mar 14, 2026 at 5:10 pm UTC):
  • eslint-config – ✅ Success (Web)
  • EventRelay – ✅ Success (Web)
  • web – ✅ Success (Web)
  • supabase – ✅ Success
  • guides – ✅ Success (Web)

…_github

- Check user_response.status_code before parsing JSON to surface auth errors
- Use asyncio.to_thread for file reads to avoid blocking the event loop
- Re-raise asyncio.CancelledError to allow proper task cancellation
- Add retry/backoff (3 retries, exponential) for 403/429 rate limit responses
- Wrap Retry-After header parsing in try/except for non-integer values
- Replace semaphore+gather with bounded queue-based worker pool (maxsize=20)
- Start workers before filling queue for immediate consumption
- Add try/finally in worker to guarantee queue.task_done() is always called
- Use return_exceptions=True in gather to prevent one error cancelling all workers

Co-authored-by: groupthinking <154503486+groupthinking@users.noreply.github.com>

Copilot AI commented Mar 14, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/user
    • Triggering command: /usr/bin/python python -m pytest tests/testing/test_deployment_pipeline.py -v (http block)
    • Triggering command: /usr/bin/python python -m pytest tests/testing/test_deployment_pipeline.py::TestDeploymentPipelineIntegration::test_deployment_manager_orchestration -v (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title [WIP] Fix performance issue with GitHub deployment calls fix(deployment_manager): harden and optimize concurrent GitHub file uploads Mar 14, 2026
Copilot AI requested a review from groupthinking March 14, 2026 17:07
    logger.error(f"Exhausted retries uploading {relative_path} due to rate limiting")
    return None
except asyncio.CancelledError:
    # Propagate cancellation so calling code can handle shutdown correctly

Worker tasks can fail to call queue.task_done() when cancelled at a race condition boundary between queue.get() success and entering the try/finally block

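One defensive shape for the worker loop is sketched below, with a hypothetical handle() coroutine standing in for the real upload. In CPython, cancellation is only delivered at await points, so placing everything after a successful queue.get() inside try/finally keeps the queue's unfinished-task count accurate on success, error, and cancellation alike:

```python
import asyncio

async def handle(item: str) -> None:
    await asyncio.sleep(0)  # stand-in for the real per-item upload work

async def worker(queue: asyncio.Queue) -> None:
    while True:
        item = await queue.get()
        try:
            if item is None:  # sentinel: stop this worker
                return
            await handle(item)
        finally:
            queue.task_done()  # runs on success, error, and cancellation

async def demo() -> bool:
    queue: asyncio.Queue = asyncio.Queue()
    for item in ("a.txt", "b.txt", None):
        queue.put_nowait(item)
    await worker(queue)
    # join() returns only if task_done() was called once per put item.
    await asyncio.wait_for(queue.join(), timeout=1)
    return True

print(asyncio.run(demo()))
```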

    await queue.put(item)
# Sentinel values to signal each worker to stop
for _ in range(MAX_WORKERS):
    await queue.put(None)

Missing await queue.join() call in worker pool pattern causes improper synchronization between item processing and worker shutdown

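The ordering this comment suggests can be sketched as follows, with illustrative names: drain the queue with queue.join() first, then send the sentinels, so workers cannot exit while real items are still unprocessed:

```python
import asyncio

MAX_WORKERS = 2

async def worker(queue: asyncio.Queue, out: list) -> None:
    while True:
        item = await queue.get()
        try:
            if item is None:  # sentinel: stop this worker
                return
            out.append(item)
        finally:
            queue.task_done()

async def run_demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    out: list = []
    workers = [asyncio.create_task(worker(queue, out)) for _ in range(MAX_WORKERS)]
    for item in range(5):
        await queue.put(item)
    # Wait until every real item has been fully processed...
    await queue.join()
    # ...then signal shutdown; joining first prevents premature worker exit.
    for _ in range(MAX_WORKERS):
        await queue.put(None)
    await asyncio.gather(*workers)
    return sorted(out)

print(asyncio.run(run_demo()))
```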

response = await client.put(upload_url, headers=headers, json=file_data)
if response.status_code in [200, 201]:
    return str(relative_path)
elif response.status_code in [403, 429]:

Retry-After header with RFC 7231 datetime format is not parsed correctly, causing fallback to exponential backoff instead of respecting the server's specified retry time



Copilot AI left a comment


Pull request overview

This PR hardens and improves the robustness of GitHub repository uploads in DeploymentManager._upload_to_github, primarily addressing async correctness (cancellation + non-blocking I/O) and operational behavior (rate limiting + bounded concurrency).

Changes:

  • Add explicit auth status validation before parsing /user response JSON.
  • Replace blocking file reads with asyncio.to_thread(...), add rate-limit retry/backoff, and switch to a queue-based worker pool for bounded concurrency.
  • Regenerate/update *.egg-info packaging metadata (appears incidental to the deployment-manager change).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

Files reviewed:
  • src/youtube_extension/backend/deployment_manager.py – Improves GitHub upload resilience (auth check, non-blocking reads, retry/backoff, worker pool).
  • src/youtube_extension.egg-info/requires.txt – Updates generated dependency metadata (should not be committed).
  • src/youtube_extension.egg-info/SOURCES.txt – Updates generated source manifest (should not be committed).
  • src/youtube_extension.egg-info/PKG-INFO – Updates generated long description/metadata (should not be committed; also contains conflicting runtime claims).
Comments suppressed due to low confidence (1)

src/youtube_extension.egg-info/requires.txt:60

  • PR description focuses on hardening _upload_to_github, but this PR also includes regenerated *.egg-info metadata (dependencies + long description changes). Unless the intent is to change packaging metadata, these files should be removed from the PR to keep the change set scoped and reviewable.
google-genai>=1.0.0
google-generativeai>=0.3.0
opentelemetry-distro>=0.40b0
opentelemetry-exporter-otlp>=1.20.0
psutil>=5.9.0
ffmpeg-python>=0.2.0
qrcode[pil]>=7.0
opencv-python>=4.8.0

[deploy]
docker>=6.1.0
kubernetes>=28.1.0
helm>=3.12.0
flyctl>=0.1.0

[dev]
pytest>=7.4.0
pytest-asyncio>=0.21.0
pytest-cov>=4.1.0
pytest-xdist>=3.5.0
pytest-mock>=3.12.0
black>=23.0.0
isort>=5.12.0
flake8>=6.0.0
mypy>=1.7.0
pre-commit>=3.5.0
ruff>=0.1.0
jupyter>=1.0.0
ipykernel>=6.25.0
notebook>=7.0.0

[docs]
mkdocs>=1.5.0

Comment on lines +596 to +636
    # Retry with exponential backoff on 403/429 (rate limit / abuse detection).
    # max_retries=3 means 1 initial attempt + 3 retries = 4 total requests.
    max_retries = 3
    for attempt in range(max_retries + 1):
        response = await client.put(upload_url, headers=headers, json=file_data)
        if response.status_code in [200, 201]:
            return str(relative_path)
        elif response.status_code in [403, 429]:
            if attempt == max_retries:
                break  # exhausted all retries, no point sleeping
            try:
                retry_after = int(response.headers.get("Retry-After", 2 ** (attempt + 1)))
            except (ValueError, TypeError):
                retry_after = 2 ** (attempt + 1)
            logger.warning(
                f"Rate limited uploading {relative_path} "
                f"(attempt {attempt + 1}/{max_retries + 1}), retrying in {retry_after}s"
            )
            await asyncio.sleep(retry_after)
        else:
            logger.warning(
                f"Failed to upload {relative_path}: "
                f"{response.status_code} - {response.text}"
            )
            return None

    logger.error(f"Exhausted retries uploading {relative_path} due to rate limiting")
    return None
except asyncio.CancelledError:
    # Propagate cancellation so calling code can handle shutdown correctly
    raise
except Exception as e:
    logger.warning(f"Error uploading {file_path}: {e}")
    return None

# Collect tasks
for file_path in project_path_obj.rglob("*"):
    # Skip excluded directories and dotfiles
    if should_skip_path(file_path.relative_to(project_path_obj)):
        continue
    if file_path.is_file() and not file_path.name.startswith('.'):
        relative_path = file_path.relative_to(project_path_obj)
        upload_tasks.append(upload_file(client, file_path, relative_path))

# Run uploads concurrently with a semaphore to avoid overwhelming the GitHub API.
# GitHub's secondary rate limits for concurrent writes are not strictly documented,
# but 10-20 concurrent requests is a safe maximum.
semaphore = asyncio.Semaphore(10)

async def run_with_semaphore(coro):
    async with semaphore:
        return await coro

results = await asyncio.gather(*(run_with_semaphore(task) for task in upload_tasks))
uploaded_files = [res for res in results if res is not None]

# Bounded worker-pool: only MAX_WORKERS coroutines exist at a time, keeping
# memory usage O(workers) rather than O(files) for large repositories.
# maxsize=MAX_WORKERS*2 keeps the queue bounded so producers don't outpace consumers.
MAX_WORKERS = 10
queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_WORKERS * 2)
uploaded_files: list[str] = []
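As a side note on the explicit CancelledError re-raise in the hunk above: since Python 3.8, asyncio.CancelledError derives from BaseException, so a bare except Exception no longer swallows it, but the explicit re-raise still documents intent and guards broader or older-style handlers. A small self-contained demonstration (demo_cancel and cancellable are illustrative names):

```python
import asyncio

async def cancellable() -> str:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        raise  # propagate so the caller sees the task as cancelled
    except Exception:
        return "swallowed"
    return "done"

async def demo_cancel() -> bool:
    task = asyncio.create_task(cancellable())
    await asyncio.sleep(0)  # let the task reach its sleep
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True  # cancellation propagated correctly
    return False

print(asyncio.run(demo_cancel()))
```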
Comment on lines +573 to +580
# Collect file list up-front; coroutines are created lazily inside workers
files_to_upload = [
    (file_path, file_path.relative_to(project_path_obj))
    for file_path in project_path_obj.rglob("*")
    if not should_skip_path(file_path.relative_to(project_path_obj))
    and file_path.is_file()
    and not file_path.name.startswith('.')
]
Comment on lines 589 to 595
file_data = {
    "message": f"Add {relative_path}",
    "content": encoded_content
}

upload_url = f"https://api.github.com/repos/{username}/{repo_name}/contents/{relative_path}"
response = await client.put(upload_url, headers=headers, json=file_data)
