Perf/batch endpoints parallel analyzers #32
Merged
…miter

Both batch endpoints were missing the `http_request: Request` parameter required by slowapi to extract the client IP for rate limiting, causing a startup crash on deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 9e3ccf2 to bc9b362
Pull request overview
This PR improves Chrome extension popup and scan performance by (1) running analyzers concurrently during a scan and (2) adding new batch API endpoints to fetch results/status for many extension IDs in one request.
Changes:
- Run the six core analyzers in parallel via a thread pool in `ExtensionAnalyzer.analyze()`.
- Add `/api/scan/batch-results` and `/api/scan/batch-status` endpoints plus supporting request models.
- Add an internal `_lookup_scan_result()` helper and opportunistically persist file-loaded results into the DB.
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/extension_shield/core/extension_analyzer.py` | Switches analyzer execution from sequential to parallel via `ThreadPoolExecutor`, adding timeout/error handling. |
| `src/extension_shield/api/shared.py` | Adds Pydantic request models for new batch endpoints. |
| `src/extension_shield/api/main.py` | Adds batch endpoints and shared lookup helper; persists upgraded file results to DB. |
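The parallelized `ExtensionAnalyzer.analyze()` itself is not excerpted in this review. As a rough sketch of the pattern described above (the `run_analyzers_parallel` function and analyzer names are illustrative, not the actual implementation), the six analyzers could be fanned out like this:

```python
from concurrent.futures import ThreadPoolExecutor

def run_analyzers_parallel(analyzers, timeout=30.0):
    """Run named analyzer callables concurrently.

    Failures and timeouts degrade to a per-analyzer error entry
    instead of aborting the whole scan.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=6) as pool:
        futures = {name: pool.submit(fn) for name, fn in analyzers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # includes concurrent.futures.TimeoutError
                results[name] = {"error": f"{type(exc).__name__}: {exc}"}
    return results
```

Collecting via `future.result(timeout=...)` per future (rather than a single `as_completed` deadline) keeps one slow analyzer from discarding the others' results.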
Comments suppressed due to low confidence (5)
src/extension_shield/api/main.py:2704
`db.save_scan_result(payload)` exceptions are swallowed silently here. Even if this is intentionally best-effort, suppressing all exceptions makes DB misconfigurations and schema errors hard to detect in production. Consider logging at least a debug/warning message with the exception so persistence failures are observable.
```python
# Persist upgraded payload to database so subsequent requests skip file I/O
try:
    db.save_scan_result(payload)
except Exception:
    pass
```
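A minimal variant of what the reviewer is asking for might look like the following (the `persist_best_effort` wrapper and logger name are illustrative, not from the PR; only `db.save_scan_result` comes from the excerpt):

```python
import logging

logger = logging.getLogger("extension_shield.api")

def persist_best_effort(db, payload):
    # Persistence stays best-effort: failures must not break the request,
    # but exc_info=True attaches the traceback so they remain observable.
    try:
        db.save_scan_result(payload)
    except Exception:
        logger.warning("Failed to persist scan result to DB", exc_info=True)
```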
src/extension_shield/api/main.py:2774
`_lookup_scan_result` returns DB/file payloads without enforcing the private-report access controls present in `get_scan_results` (checks for `visibility == "private"` and `user_id`). This allows the new batch endpoints to leak private scan results if a private `extension_id`/upload ID is known. Consider passing the requester's `user_id` into `_lookup_scan_result` and applying the same authorization rules (returning 404/not_found when unauthorized).
```python
def _lookup_scan_result(
    extension_id: str,
    *,
    lightweight: bool = False,
) -> Optional[Dict[str, Any]]:
    """
    Internal helper: look up a scan result via memory → DB → file.

    When *lightweight* is True the expensive ``upgrade_legacy_payload`` step
    is skipped — the caller only needs the score / risk-level for badge
    rendering (used by the batch endpoint to stay fast).

    Returns the payload dict or ``None`` if not found anywhere.
    """
    # 1. Memory cache (instant)
    payload = scan_results.get(extension_id)
    if payload is not None:
        if lightweight:
            ensure_name_in_payload(payload)
            payload["risk_and_signals"] = _extract_risk_and_signals(payload)
            return payload
        payload = upgrade_legacy_payload(payload, extension_id)
        payload = ensure_consumer_insights(payload)
        ensure_description_in_meta(payload)
        ensure_name_in_payload(payload)
        payload["risk_and_signals"] = _extract_risk_and_signals(payload)
        scan_results[extension_id] = payload
        return payload

    # 2. Database
    try:
        db_row = db.get_scan_result(extension_id)
    except Exception:
        db_row = None
    if db_row:
        resolved_id = db_row.get("extension_id") or extension_id
        formatted = _hydrate_db_scan_result(db_row, extension_id)
        if lightweight:
            ensure_name_in_payload(formatted)
            formatted["risk_and_signals"] = _extract_risk_and_signals(formatted)
            scan_results[resolved_id] = formatted  # warm the memory cache
            return formatted
        payload = upgrade_legacy_payload(formatted, resolved_id)
```
src/extension_shield/api/main.py:2787
`extension_id` is used directly to construct a filename (`RESULTS_DIR / f"{extension_id}_results.json"`) without validating that it's a real Chrome extension ID. With untrusted input this can enable path traversal (e.g., `..` segments) and reading arbitrary JSON files on disk. Please validate IDs with `_is_extension_id` (or otherwise strictly constrain allowed characters) before any filesystem access, and reject/mark invalid IDs as not_found.
```python
# 3. File fallback
result_file = RESULTS_DIR / f"{extension_id}_results.json"
if result_file.exists():
    try:
        with open(result_file, "r", encoding="utf-8") as f:
            payload = json.load(f)
```
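Chrome extension IDs are exactly 32 characters drawn from the restricted alphabet `a`–`p`, so a strict format check rules out traversal sequences like `../` before any path is built. A sketch of such a validator (the project's actual `_is_extension_id` may differ):

```python
import re

# Chrome extension IDs: 32 chars from a-p (a hash re-encoded into that
# alphabet). Anything else is rejected before filesystem access.
_EXTENSION_ID_RE = re.compile(r"[a-p]{32}")

def is_valid_extension_id(extension_id: str) -> bool:
    return _EXTENSION_ID_RE.fullmatch(extension_id) is not None
```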
src/extension_shield/api/main.py:2837
`batch_scan_results` accepts arbitrary `extension_ids` but doesn't validate each ID (format/charset). Since it calls `_lookup_scan_result`, which does filesystem access, invalid IDs should be rejected or treated as not_found to avoid security issues and wasted work. Consider validating every element with `_is_extension_id` before lookup (and returning a clear per-id status for invalid entries).
```python
if len(req.extension_ids) > 50:
    raise HTTPException(
        status_code=400,
        detail="Maximum 50 extension IDs per batch request",
    )

# Deduplicate while preserving order
seen: set[str] = set()
unique_ids: list[str] = []
for eid in req.extension_ids:
    if eid not in seen:
        seen.add(eid)
        unique_ids.append(eid)

results: Dict[str, Any] = {}
for ext_id in unique_ids:
    payload = _lookup_scan_result(ext_id, lightweight=True)
    if payload is None:
```
src/extension_shield/api/main.py:2873
- New batch endpoints are introduced, but there are existing API tests in `tests/api/test_scan_results_endpoint.py` for scan-result retrieval behavior. Adding tests for `/api/scan/batch-results` and `/api/scan/batch-status` (including max-50 enforcement, invalid-id handling, and private/unauthorized result behavior) would prevent regressions and ensure the intended performance path stays safe.
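Pulling the per-id handling the reviewer describes into one testable place might look like this sketch (the `batch_lookup` name and the `invalid_id` status value are illustrative, not from the PR; `dict.fromkeys` is a compact order-preserving dedupe):

```python
from typing import Any, Callable, Dict, List, Optional

def batch_lookup(
    ids: List[str],
    lookup: Callable[[str], Optional[Dict[str, Any]]],
    is_valid: Callable[[str], bool],
) -> Dict[str, Any]:
    # Enforce the batch cap first, then give every unique ID an explicit
    # per-id status: invalid, not_found, or its payload.
    if len(ids) > 50:
        raise ValueError("Maximum 50 extension IDs per batch request")
    results: Dict[str, Any] = {}
    for eid in dict.fromkeys(ids):  # dedupe while preserving order
        if not is_valid(eid):
            results[eid] = {"_st": "invalid_id"}
            continue
        payload = lookup(eid)
        results[eid] = payload if payload is not None else {"_st": "not_found"}
    return results
```

Keeping the logic pure (injected `lookup`/`is_valid` callables) makes the max-50 and invalid-id cases easy to cover in unit tests without spinning up the FastAPI app.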
```python
@app.post("/api/scan/batch-results")
@_rate_limit("30/minute")
async def batch_scan_results(req: BatchResultsRequest, request: Request):
    """
    Batch lookup of scan results for multiple extensions.

    Used by the Chrome extension popup to fetch all installed extension
    results in a single HTTP call instead of N individual GET requests.
    Returns lightweight payloads (skips expensive legacy upgrades) with
    just enough data for badge rendering.

    Returns:
        Dict mapping extension_id → payload (or {"_st": "not_found"}).
    """
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    # Deduplicate while preserving order
    seen: set[str] = set()
    unique_ids: list[str] = []
    for eid in req.extension_ids:
        if eid not in seen:
            seen.add(eid)
            unique_ids.append(eid)

    results: Dict[str, Any] = {}
    for ext_id in unique_ids:
        payload = _lookup_scan_result(ext_id, lightweight=True)
        if payload is None:
            results[ext_id] = {"_st": "not_found"}
        else:
            if payload.get("error"):
                payload["error"] = _sanitize_error_for_client(payload["error"])
            results[ext_id] = payload
    return results
```
```python
@app.post("/api/scan/batch-status")
@_rate_limit("60/minute")
async def batch_scan_status(req: BatchStatusRequest, request: Request):
    """
    Batch lookup of scan statuses for multiple extensions.

    Used by the Chrome extension popup to poll all in-progress scans
    in a single HTTP call instead of N individual GET requests.

    Returns:
        Dict with "statuses" mapping extension_id → {scanned, status}.
    """
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )
    statuses: Dict[str, Dict[str, Any]] = {}
    for ext_id in req.extension_ids:
        status = scan_status.get(ext_id)
        statuses[ext_id] = {
            "scanned": status == "completed",
            "status": status or "unknown",
        }
    return {"statuses": statuses}
```
Made-with: Cursor