-
Notifications
You must be signed in to change notification settings - Fork 1
⚡ Performance improvement for GitHub deployment calls #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1538e17
72056f6
5a409f5
a5d2471
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -19,7 +19,7 @@ | |||||||||||||||||
| from pathlib import Path | ||||||||||||||||||
| from typing import Any, Optional | ||||||||||||||||||
|
|
||||||||||||||||||
| import requests | ||||||||||||||||||
| import httpx | ||||||||||||||||||
|
|
||||||||||||||||||
| from youtube_extension.backend.deploy import deploy_project as _adapter_deploy | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
@@ -493,43 +493,45 @@ async def _create_github_repository(self, repo_name: str, project_config: dict[s | |||||||||||||||||
| "Accept": "application/vnd.github.v3+json" | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| # Get user info | ||||||||||||||||||
| user_response = requests.get("https://api.github.com/user", headers=headers) | ||||||||||||||||||
| if user_response.status_code != 200: | ||||||||||||||||||
| raise Exception(f"Failed to get GitHub user info: {user_response.text}") | ||||||||||||||||||
|
|
||||||||||||||||||
| user_data = user_response.json() | ||||||||||||||||||
| username = user_data["login"] | ||||||||||||||||||
|
|
||||||||||||||||||
| # Create repository | ||||||||||||||||||
| repo_data = { | ||||||||||||||||||
| "name": repo_name, | ||||||||||||||||||
| "description": f"Generated by UVAI from YouTube tutorial - {project_config.get('title', 'Unknown')}", | ||||||||||||||||||
| "private": False, | ||||||||||||||||||
| "auto_init": True, | ||||||||||||||||||
| "has_issues": True, | ||||||||||||||||||
| "has_projects": True, | ||||||||||||||||||
| "has_wiki": False | ||||||||||||||||||
| } | ||||||||||||||||||
| # Use httpx.AsyncClient for non-blocking HTTP requests | ||||||||||||||||||
| async with httpx.AsyncClient() as client: | ||||||||||||||||||
| # Get user info | ||||||||||||||||||
| user_response = await client.get("https://api.github.com/user", headers=headers) | ||||||||||||||||||
| if user_response.status_code != 200: | ||||||||||||||||||
| raise Exception(f"Failed to get GitHub user info: {user_response.text}") | ||||||||||||||||||
|
Comment on lines
+499
to
+501
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of manually checking the status code and raising a generic
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| user_data = user_response.json() | ||||||||||||||||||
| username = user_data["login"] | ||||||||||||||||||
|
|
||||||||||||||||||
| # Create repository | ||||||||||||||||||
| repo_data = { | ||||||||||||||||||
| "name": repo_name, | ||||||||||||||||||
| "description": f"Generated by UVAI from YouTube tutorial - {project_config.get('title', 'Unknown')}", | ||||||||||||||||||
| "private": False, | ||||||||||||||||||
| "auto_init": True, | ||||||||||||||||||
| "has_issues": True, | ||||||||||||||||||
| "has_projects": True, | ||||||||||||||||||
| "has_wiki": False | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| response = requests.post( | ||||||||||||||||||
| "https://api.github.com/user/repos", | ||||||||||||||||||
| headers=headers, | ||||||||||||||||||
| json=repo_data | ||||||||||||||||||
| ) | ||||||||||||||||||
| response = await client.post( | ||||||||||||||||||
| "https://api.github.com/user/repos", | ||||||||||||||||||
| headers=headers, | ||||||||||||||||||
| json=repo_data | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| if response.status_code not in [201, 422]: # 422 if repo already exists | ||||||||||||||||||
| raise Exception(f"Failed to create GitHub repository: {response.text}") | ||||||||||||||||||
| if response.status_code not in [201, 422]: # 422 if repo already exists | ||||||||||||||||||
| raise Exception(f"Failed to create GitHub repository: {response.text}") | ||||||||||||||||||
|
|
||||||||||||||||||
| if response.status_code == 422: | ||||||||||||||||||
| # Repository already exists, get its info | ||||||||||||||||||
| repo_response = requests.get(f"https://api.github.com/repos/{username}/{repo_name}", headers=headers) | ||||||||||||||||||
| if repo_response.status_code == 200: | ||||||||||||||||||
| repo_info = repo_response.json() | ||||||||||||||||||
| if response.status_code == 422: | ||||||||||||||||||
| # Repository already exists, get its info | ||||||||||||||||||
| repo_response = await client.get(f"https://api.github.com/repos/{username}/{repo_name}", headers=headers) | ||||||||||||||||||
| if repo_response.status_code == 200: | ||||||||||||||||||
| repo_info = repo_response.json() | ||||||||||||||||||
| else: | ||||||||||||||||||
| raise Exception(f"Repository exists but can't access it: {repo_response.text}") | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| else: | ||||||||||||||||||
| raise Exception(f"Repository exists but can't access it: {repo_response.text}") | ||||||||||||||||||
| else: | ||||||||||||||||||
| repo_info = response.json() | ||||||||||||||||||
| repo_info = response.json() | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
|
|
||||||||||||||||||
| return { | ||||||||||||||||||
| "repo_name": repo_name, | ||||||||||||||||||
|
|
@@ -549,30 +551,27 @@ async def _upload_to_github(self, project_path: str, repo_name: str) -> dict[str | |||||||||||||||||
| "Accept": "application/vnd.github.v3+json" | ||||||||||||||||||
| } | ||||||||||||||||||
|
|
||||||||||||||||||
| # Get user info | ||||||||||||||||||
| user_response = requests.get("https://api.github.com/user", headers=headers) | ||||||||||||||||||
| user_data = user_response.json() | ||||||||||||||||||
| username = user_data["login"] | ||||||||||||||||||
| async with httpx.AsyncClient() as client: | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
| # Get user info | ||||||||||||||||||
| user_response = await client.get("https://api.github.com/user", headers=headers) | ||||||||||||||||||
|
||||||||||||||||||
| user_response = await client.get("https://api.github.com/user", headers=headers) | |
| user_response = await client.get("https://api.github.com/user", headers=headers) | |
| if user_response.status_code != 200: | |
| raise Exception( | |
| f"GitHub authentication failed: {user_response.status_code} - {user_response.text}" | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: The _upload_to_github function does not check the HTTP status code after a GitHub API call, which can cause a KeyError on failure instead of a clear error message.
Severity: MEDIUM
Suggested Fix
Before parsing the JSON response in _upload_to_github, add a check for the HTTP status code. If user_response.status_code is not 200, raise an exception with a descriptive error message, similar to the implementation in _create_github_repository. For example: if user_response.status_code != 200: raise Exception(f"Failed to get GitHub user info: {user_response.text}").
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/youtube_extension/backend/deployment_manager.py#L556-L558
Potential issue: The `_upload_to_github` function calls the GitHub API to get user info
but fails to check the HTTP status code of the response before parsing it. If the API
call fails (e.g., due to an invalid token, rate limiting, or server issues), the
response will be an error object without a `"login"` key. Attempting to access
`user_data["login"]` will then raise a `KeyError`. This is inconsistent with the
`_create_github_repository` function, which correctly handles this case. While the
exception is caught upstream, the resulting error message is a misleading `KeyError`
instead of a clear indication of the API failure.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repository's style guide requires strict type hinting for all functions. The inner helper function upload_file is missing type hints. Adding them will improve code clarity and maintainability.
| async def upload_file(client, file_path, relative_path): | |
| async def upload_file(client: httpx.AsyncClient, file_path: Path, relative_path: Path) -> Optional[str]: |
References
- The style guide requires all functions to have strict type hinting. (link)
Copilot
AI
Mar 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upload_file() performs synchronous disk I/O (open(...).read()) inside an async function. With many uploads this can still block the event loop and reduce the concurrency gains from switching to httpx. Consider moving the file read/encode to a thread (e.g., asyncio.to_thread) or using an async file reader so HTTP awaits aren’t delayed by filesystem reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The upload_file function reads the entire content of each file into memory using f.read(), which, especially with concurrent uploads, can lead to significant memory consumption and a Denial of Service (DoS) vulnerability if large files are processed. Additionally, this synchronous file operation blocks the asyncio event loop, undermining the performance gains from using an async HTTP client. Consider using aiofiles for non-blocking file I/O to address both the performance bottleneck and the potential memory exhaustion issue.
| content = f.read() | |
| import aiofiles | |
| async with aiofiles.open(file_path, 'rb') as f: | |
| content = await f.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relative_path of files is concatenated directly into the GitHub API URL without URL encoding. If a filename contains special characters such as ?, #, or spaces, they will not be properly handled by the API, potentially leading to URL manipulation. For example, a filename containing ?branch=main could inadvertently target a different branch than intended. Additionally, on Windows systems, the path separator \ will be used, which is not compatible with the GitHub API's expected / separator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot
AI
Mar 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upload_file() catches broad Exception, which will also catch asyncio.CancelledError and prevent task cancellation from propagating (e.g., during request shutdown). Re-raise CancelledError explicitly and only swallow/log non-cancellation exceptions.
| return None | |
| return None | |
| except asyncio.CancelledError: | |
| # Propagate cancellation so that calling code can handle shutdown correctly | |
| raise |
Copilot
AI
Mar 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With uploads now running concurrently, hitting GitHub secondary rate limits (403) or abuse detection is more likely. Currently failures are only logged and the deployment can still report success with missing files. Consider detecting 403/429 responses and applying retry/backoff (or failing the GitHub deployment) so callers don’t get a "success" repo that’s only partially uploaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _upload_to_github method is vulnerable to path traversal because it uses the project_path argument directly to read files from the filesystem. If project_path is influenced by user input (e.g., via the project_type field in the API), an attacker can use traversal sequences like ../../ to point it to sensitive system directories (e.g., /etc). The system will then recursively read and upload all accessible files from that directory to a GitHub repository, leading to critical data exfiltration.
Copilot
AI
Mar 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concurrent upload implementation builds upload_tasks for every file and then gates execution with a semaphore. For large repos this can still create thousands of coroutine objects at once and increase memory pressure. Consider a bounded producer/consumer (worker pool pulling from a queue) so you only keep ~N in-flight uploads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repository's style guide requires strict type hinting for all functions. The inner helper function run_with_semaphore is missing type hints. Adding them will improve code clarity and maintainability.
| async def run_with_semaphore(coro): | |
| async def run_with_semaphore(coro: asyncio.Coroutine) -> Any: |
References
- The style guide requires all functions to have strict type hinting. (link)
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
httpx.AsyncClient instances created without explicit timeout configuration