
Perf/batch endpoints parallel analyzers #32

Merged

Stanzin7 merged 3 commits into master from perf/batch-endpoints-parallel-analyzers on Mar 18, 2026

Conversation

@Stanzin7 (Owner)

No description provided.

Copilot AI review requested due to automatic review settings March 18, 2026 01:54
Stanfordy and others added 2 commits on March 17, 2026 at 20:57
…miter

Both batch endpoints were missing the `http_request: Request` parameter
required by slowapi to extract the client IP for rate limiting, causing
a startup crash on deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Stanzin7 Stanzin7 force-pushed the perf/batch-endpoints-parallel-analyzers branch from 9e3ccf2 to bc9b362 Compare March 18, 2026 01:57

Copilot AI left a comment


Pull request overview

This PR improves Chrome extension popup and scan performance by (1) running analyzers concurrently during a scan and (2) adding new batch API endpoints to fetch results/status for many extension IDs in one request.

Changes:

  • Run the six core analyzers in parallel via a thread pool in ExtensionAnalyzer.analyze().
  • Add /api/scan/batch-results and /api/scan/batch-status endpoints plus supporting request models.
  • Add an internal _lookup_scan_result() helper and opportunistically persist file-loaded results into the DB.
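The parallel-analyzer change can be illustrated with a small standalone sketch. Note this is a hypothetical reconstruction, not the actual `ExtensionAnalyzer.analyze()` code from the diff: the function name `run_analyzers_parallel`, the stub analyzers, and the `ANALYZER_TIMEOUT` value are all illustrative. The key property (visible in the reviewed file's description) is that analyzers run concurrently via a thread pool, and a failure or timeout in one analyzer does not sink the whole scan:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

ANALYZER_TIMEOUT = 30  # seconds per analyzer; illustrative value only

def run_analyzers_parallel(analyzers, ext_path):
    """Run independent analyzers concurrently.

    Each analyzer's failure is captured as a per-analyzer error entry
    instead of aborting the whole scan.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(analyzers)) as pool:
        futures = {pool.submit(fn, ext_path): name for name, fn in analyzers.items()}
        for future in as_completed(futures, timeout=ANALYZER_TIMEOUT * len(analyzers)):
            name = futures[future]
            try:
                results[name] = future.result(timeout=ANALYZER_TIMEOUT)
            except Exception as exc:
                # Timeout or analyzer crash: record it, keep the other results.
                results[name] = {"error": str(exc)}
    return results

# Stub analyzers standing in for the six core analyzers
analyzers = {
    "permissions": lambda p: {"score": 10},
    "network": lambda p: {"score": 5},
    "broken": lambda p: 1 / 0,  # simulates an analyzer that raises
}
```

With this shape, a single misbehaving analyzer shows up as an `error` entry in the results dict while the rest complete normally.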

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

  • src/extension_shield/core/extension_analyzer.py: switches analyzer execution from sequential to parallel via ThreadPoolExecutor, adding timeout/error handling.
  • src/extension_shield/api/shared.py: adds Pydantic request models for the new batch endpoints.
  • src/extension_shield/api/main.py: adds batch endpoints and a shared lookup helper; persists upgraded file results to the DB.
Comments suppressed due to low confidence (5)

src/extension_shield/api/main.py:2704

  • db.save_scan_result(payload) exceptions are swallowed silently here. Even if this is intentionally best-effort, suppressing all exceptions makes DB misconfigurations and schema errors hard to detect in production. Consider logging at least a debug/warning message with the exception so persistence failures are observable.
            # Persist upgraded payload to database so subsequent requests skip file I/O
            try:
                db.save_scan_result(payload)
            except Exception:
                pass
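The fix the reviewer suggests is mechanical: log the swallowed exception. A minimal sketch, with `save_best_effort` and `broken_save` as hypothetical stand-ins for the real `db.save_scan_result` call site:

```python
import logging

logger = logging.getLogger("extension_shield.persist")

def save_best_effort(save_fn, payload, key):
    """Best-effort DB persistence: tolerate failure, but make it observable."""
    try:
        save_fn(payload)
        return True
    except Exception as exc:
        # A warning (rather than a bare `pass`) surfaces DB misconfigurations
        # and schema errors in production logs without failing the request.
        logger.warning("best-effort persist of %s failed: %s", key, exc)
        return False

def broken_save(_payload):
    raise RuntimeError("db schema mismatch")
```

The caller's behavior is unchanged on failure; the only difference is that the failure now leaves a trace.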

src/extension_shield/api/main.py:2774

  • _lookup_scan_result returns DB/file payloads without enforcing the private-report access controls present in get_scan_results (checks for visibility == "private" and user_id). This allows the new batch endpoints to leak private scan results if a private extension_id/upload ID is known. Consider passing the requester user_id into _lookup_scan_result and applying the same authorization rules (returning 404/not_found when unauthorized).
def _lookup_scan_result(
    extension_id: str,
    *,
    lightweight: bool = False,
) -> Optional[Dict[str, Any]]:
    """
    Internal helper: look up a scan result via memory → DB → file.

    When *lightweight* is True the expensive ``upgrade_legacy_payload`` step
    is skipped — the caller only needs the score / risk-level for badge
    rendering (used by the batch endpoint to stay fast).

    Returns the payload dict or ``None`` if not found anywhere.
    """
    # 1. Memory cache (instant)
    payload = scan_results.get(extension_id)
    if payload is not None:
        if lightweight:
            ensure_name_in_payload(payload)
            payload["risk_and_signals"] = _extract_risk_and_signals(payload)
            return payload
        payload = upgrade_legacy_payload(payload, extension_id)
        payload = ensure_consumer_insights(payload)
        ensure_description_in_meta(payload)
        ensure_name_in_payload(payload)
        payload["risk_and_signals"] = _extract_risk_and_signals(payload)
        scan_results[extension_id] = payload
        return payload

    # 2. Database
    try:
        db_row = db.get_scan_result(extension_id)
    except Exception:
        db_row = None

    if db_row:
        resolved_id = db_row.get("extension_id") or extension_id
        formatted = _hydrate_db_scan_result(db_row, extension_id)
        if lightweight:
            ensure_name_in_payload(formatted)
            formatted["risk_and_signals"] = _extract_risk_and_signals(formatted)
            scan_results[resolved_id] = formatted  # warm the memory cache
            return formatted
        payload = upgrade_legacy_payload(formatted, resolved_id)

src/extension_shield/api/main.py:2787

  • extension_id is used directly to construct a filename (RESULTS_DIR / f"{extension_id}_results.json") without validating that it’s a real Chrome extension ID. With untrusted input this can enable path traversal (e.g., .. segments) and reading arbitrary JSON files on disk. Please validate IDs with _is_extension_id (or otherwise strictly constrain allowed characters) before any filesystem access, and reject/mark invalid IDs as not_found.
    # 3. File fallback
    result_file = RESULTS_DIR / f"{extension_id}_results.json"
    if result_file.exists():
        try:
            with open(result_file, "r", encoding="utf-8") as f:
                payload = json.load(f)

src/extension_shield/api/main.py:2837

  • batch_scan_results accepts arbitrary extension_ids but doesn’t validate each ID (format/charset). Since it calls _lookup_scan_result which does filesystem access, invalid IDs should be rejected or treated as not_found to avoid security issues and wasted work. Consider validating every element with _is_extension_id before lookup (and returning a clear per-id status for invalid entries).
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    # Deduplicate while preserving order
    seen: set[str] = set()
    unique_ids: list[str] = []
    for eid in req.extension_ids:
        if eid not in seen:
            seen.add(eid)
            unique_ids.append(eid)

    results: Dict[str, Any] = {}
    for ext_id in unique_ids:
        payload = _lookup_scan_result(ext_id, lightweight=True)
        if payload is None:

src/extension_shield/api/main.py:2873

  • New batch endpoints are introduced, but there are existing API tests in tests/api/test_scan_results_endpoint.py for scan-result retrieval behavior. Adding tests for /api/scan/batch-results and /api/scan/batch-status (including max-50 enforcement, invalid-id handling, and private/unauthorized result behavior) would prevent regressions and ensure the intended performance path stays safe.
@app.post("/api/scan/batch-results")
@_rate_limit("30/minute")
async def batch_scan_results(req: BatchResultsRequest, request: Request):
    """
    Batch lookup of scan results for multiple extensions.

    Used by the Chrome extension popup to fetch all installed extension
    results in a single HTTP call instead of N individual GET requests.
    Returns lightweight payloads (skips expensive legacy upgrades) with
    just enough data for badge rendering.

    Returns:
        Dict mapping extension_id → payload (or {"_st": "not_found"}).
    """
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    # Deduplicate while preserving order
    seen: set[str] = set()
    unique_ids: list[str] = []
    for eid in req.extension_ids:
        if eid not in seen:
            seen.add(eid)
            unique_ids.append(eid)

    results: Dict[str, Any] = {}
    for ext_id in unique_ids:
        payload = _lookup_scan_result(ext_id, lightweight=True)
        if payload is None:
            results[ext_id] = {"_st": "not_found"}
        else:
            if payload.get("error"):
                payload["error"] = _sanitize_error_for_client(payload["error"])
            results[ext_id] = payload

    return results


@app.post("/api/scan/batch-status")
@_rate_limit("60/minute")
async def batch_scan_status(req: BatchStatusRequest, request: Request):
    """
    Batch lookup of scan statuses for multiple extensions.

    Used by the Chrome extension popup to poll all in-progress scans
    in a single HTTP call instead of N individual GET requests.

    Returns:
        Dict with "statuses" mapping extension_id → {scanned, status}.
    """
    if len(req.extension_ids) > 50:
        raise HTTPException(
            status_code=400,
            detail="Maximum 50 extension IDs per batch request",
        )

    statuses: Dict[str, Dict[str, Any]] = {}
    for ext_id in req.extension_ids:
        status = scan_status.get(ext_id)
        statuses[ext_id] = {
            "scanned": status == "completed",
            "status": status or "unknown",
        }

    return {"statuses": statuses}


@Stanzin7 Stanzin7 merged commit e68fdf6 into master Mar 18, 2026
5 checks passed


3 participants