
Perf/batch endpoints parallel analyzers #32

Merged

Stanzin7 merged 3 commits into master from perf/batch-endpoints-parallel-analyzers on Mar 18, 2026

Conversation

@Stanzin7 (Owner)

No description provided.

Copilot AI review requested due to automatic review settings March 18, 2026 01:54
Stanfordy and others added 2 commits on March 17, 2026 at 20:57
…miter

Both batch endpoints were missing the `http_request: Request` parameter
required by slowapi to extract the client IP for rate limiting, causing
a startup crash on deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Stanzin7 Stanzin7 force-pushed the perf/batch-endpoints-parallel-analyzers branch from 9e3ccf2 to bc9b362 Compare March 18, 2026 01:57

Copilot AI left a comment


Pull request overview

This PR improves Chrome extension popup and scan performance by (1) running analyzers concurrently during a scan and (2) adding new batch API endpoints to fetch results/status for many extension IDs in one request.

Changes:

  • Run the six core analyzers in parallel via a thread pool in ExtensionAnalyzer.analyze().
  • Add /api/scan/batch-results and /api/scan/batch-status endpoints plus supporting request models.
  • Add an internal _lookup_scan_result() helper and opportunistically persist file-loaded results into the DB.
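The parallel-analyzer change can be illustrated with a small standalone sketch. Note this is a hypothetical reconstruction, not the actual `ExtensionAnalyzer.analyze()` code from the diff: the function name `run_analyzers_parallel`, the stub analyzers, and the `ANALYZER_TIMEOUT` value are all illustrative. The key property (visible in the reviewed file's description) is that analyzers run concurrently via a thread pool, and a failure or timeout in one analyzer does not sink the whole scan:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

ANALYZER_TIMEOUT = 30  # seconds per analyzer; illustrative value only

def run_analyzers_parallel(analyzers, ext_path):
    """Run independent analyzers concurrently.

    Each analyzer's failure is captured as a per-analyzer error entry
    instead of aborting the whole scan.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
        futures = {pool.submit(fn, ext_path): name for name, fn in analyzers.items()}
        for future in as_completed(futures, timeout=ANALYZER_TIMEOUT * len(analyzers)):
            name = futures[future]
            try:
                results[name] = future.result(timeout=ANALYZER_TIMEOUT)
            except Exception as exc:
                # Timeout or analyzer crash: record it, keep the other results.
                results[name] = {"error": str(exc)}
    return results

# Stub analyzers standing in for the six core analyzers
analyzers = {
    "permissions": lambda p: {"score": 10},
    "network": lambda p: {"score": 5},
    "broken": lambda p: 1 / 0,  # simulates an analyzer that raises
}
```

With this shape, a single misbehaving analyzer shows up as an `error` entry in the results dict while the rest complete normally.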

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

  • src/extension_shield/core/extension_analyzer.py: switches analyzer execution from sequential to parallel via ThreadPoolExecutor, adding timeout/error handling.
  • src/extension_shield/api/shared.py: adds Pydantic request models for the new batch endpoints.
  • src/extension_shield/api/main.py: adds batch endpoints and a shared lookup helper; persists upgraded file results to the DB.
Comments suppressed due to low confidence (5)

src/extension_shield/api/main.py:2704

  • db.save_scan_result(payload) exceptions are swallowed silently here. Even if this is intentionally best-effort, suppressing all exceptions makes DB misconfigurations and schema errors hard to detect in production. Consider logging at least a debug/warning message with the exception so persistence failures are observable.
            # Persist upgraded payload to database so subsequent requests skip file I/O
            try:
                db.save_scan_result(payload)
            except Exception:
                pass
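The fix the reviewer suggests is mechanical: log the swallowed exception. A minimal sketch, with `save_best_effort` and `broken_save` as hypothetical stand-ins for the real `db.save_scan_result` call site:

```python
import logging

logger = logging.getLogger("extension_shield.persist")

def save_best_effort(save_fn, payload, key):
    """Best-effort DB persistence: tolerate failure, but make it observable."""
    try:
        save_fn(payload)
        return True
    except Exception as exc:
        # A warning (rather than a bare `pass`) surfaces DB misconfigurations
        # and schema errors in production logs without failing the request.
        logger.warning("best-effort persist of %s failed: %s", key, exc)
        return False

def broken_save(_payload):
    raise RuntimeError("db schema mismatch")
```

The caller's behavior is unchanged on failure; the only difference is that the failure now leaves a trace.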

src/extension_shield/api/main.py:2774

  • _lookup_scan_result returns DB/file payloads without enforcing the private-report access controls present in get_scan_results (checks for visibility == "private" and user_id). This allows the new batch endpoints to leak private scan results if a private extension_id/upload ID is known. Consider passing the requester user_id into _lookup_scan_result and applying the same authorization rules (returning 404/not_found when unauthorized).
def _lookup_scan_result(
    extension_id: str,
    *,
    lightweight: bool = False,
) -> Optional[Dict[str, Any]]:
    """
    Internal helper: look up a scan result via memory → DB → file.

    When *lightweight* is True the expensive ``upgrade_legacy_payload`` step
    is skipped — the caller only needs the score / risk-level for badge
    rendering (used by the batch endpoint to stay fast).

    Returns the payload dict or ``None`` if not found anywhere.
    """
    # 1. Memory cache (instant)
    payload = scan_results.get(extension_id)
    if payload is not None:
        if lightweight:
            ensure_name_in_payload(payload)
            payload["risk_and_signals"] = _extract_risk_and_signals(payload)
            return payload
        payload = upgrade_legacy_payload(payload, extension_id)
        payload = ensure_consumer_insights(payload)
        ensure_description_in_meta(payload)
        ensure_name_in_payload(payload)
        payload["risk_and_signals"] = _extract_risk_and_signals(payload)
        scan_results[extension_id] = payload
        return payload

    # 2. Database
    try:
        db_row = db.get_scan_result(extension_id)
    except Exception:
        db_row = None

    if db_row:
        resolved_id = db_row.get("extension_id") or extension_id
        formatted = _hydrate_db_scan_result(db_row, extension_id)
        if lightweight:
            ensure_name_in_payload(formatted)
            formatted["risk_and_signals"] = _extract_risk_and_signals(formatted)
            scan_results[resolved_id] = formatted  # warm the memory cache
            return formatted
        payload = upgrade_legacy_payload(formatted, resolved_id)

src/extension_shield/api/main.py:2787

  • extension_id is used directly to construct a filename (RESULTS_DIR / f"{extension_id}_results.json") without validating that it’s a real Chrome extension ID. With untrusted input this can enable path traversal (e.g., .. segments) and reading arbitrary JSON files on disk. Please validate IDs with _is_extension_id (or otherwise strictly constrain allowed characters) before any filesystem access, and reject/mark invalid IDs as not_found.
    # 3. File fallback
    result_file = RESULTS_DIR / f"{extension_id}_results.json"
    if result_file.exists():
        try:
            with open(result_file, "r", encoding="utf-8") as f:
                payload = json.load(f)

src/extension_shield/api/main.py:2837

  • batch_scan_results accepts arbitrary extension_ids but doesn’t validate each ID (format/charset). Since it calls _lookup_scan_result which does filesystem access, invalid IDs should be rejected or treated as not_found to avoid security issues and wasted work. Consider validating every element with _is_extension_id before lookup (and returning a clear per-id status for invalid entries).
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    # Deduplicate while preserving order
    seen: set[str] = set()
    unique_ids: list[str] = []
    for eid in req.extension_ids:
        if eid not in seen:
            seen.add(eid)
            unique_ids.append(eid)

    results: Dict[str, Any] = {}
    for ext_id in unique_ids:
        payload = _lookup_scan_result(ext_id, lightweight=True)
        if payload is None:

src/extension_shield/api/main.py:2873

  • New batch endpoints are introduced, but there are existing API tests in tests/api/test_scan_results_endpoint.py for scan-result retrieval behavior. Adding tests for /api/scan/batch-results and /api/scan/batch-status (including max-50 enforcement, invalid-id handling, and private/unauthorized result behavior) would prevent regressions and ensure the intended performance path stays safe.
@app.post("/api/scan/batch-results")
@_rate_limit("30/minute")
async def batch_scan_results(req: BatchResultsRequest, request: Request):
    """
    Batch lookup of scan results for multiple extensions.

    Used by the Chrome extension popup to fetch all installed extension
    results in a single HTTP call instead of N individual GET requests.
    Returns lightweight payloads (skips expensive legacy upgrades) with
    just enough data for badge rendering.

    Returns:
        Dict mapping extension_id → payload (or {"_st": "not_found"}).
    """
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    # Deduplicate while preserving order
    seen: set[str] = set()
    unique_ids: list[str] = []
    for eid in req.extension_ids:
        if eid not in seen:
            seen.add(eid)
            unique_ids.append(eid)

    results: Dict[str, Any] = {}
    for ext_id in unique_ids:
        payload = _lookup_scan_result(ext_id, lightweight=True)
        if payload is None:
            results[ext_id] = {"_st": "not_found"}
        else:
            if payload.get("error"):
                payload["error"] = _sanitize_error_for_client(payload["error"])
            results[ext_id] = payload

    return results


@app.post("/api/scan/batch-status")
@_rate_limit("60/minute")
async def batch_scan_status(req: BatchStatusRequest, request: Request):
    """
    Batch lookup of scan statuses for multiple extensions.

    Used by the Chrome extension popup to poll all in-progress scans
    in a single HTTP call instead of N individual GET requests.

    Returns:
        Dict with "statuses" mapping extension_id → {scanned, status}.
    """
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    statuses: Dict[str, Dict[str, Any]] = {}
    for ext_id in req.extension_ids:
        status = scan_status.get(ext_id)
        statuses[ext_id] = {
            "scanned": status == "completed",
            "status": status or "unknown",
        }

    return {"statuses": statuses}


@Stanzin7 Stanzin7 merged commit e68fdf6 into master Mar 18, 2026
5 checks passed


3 participants