Skip to content

feat: vulnerability scanning within git integration (IN-956)#3892

Open
epipav wants to merge 14 commits intomainfrom
feat/git-osv-vulnerabilities
Open

feat: vulnerability scanning within git integration (IN-956)#3892
epipav wants to merge 14 commits intomainfrom
feat/git-osv-vulnerabilities

Conversation

@epipav
Copy link
Collaborator

@epipav epipav commented Mar 3, 2026

Adds automated vulnerability scanning for all git repositories using the Google OSV Scanner SDK. Runs on the first clone batch per repo and persists results directly to the insights database.

Architecture
Go binary wrapped in Python — OSV Scanner is a Go library with no Python bindings. We embed it as an SDK dependency and call it programmatically, following the same subprocess + JSON stdout pattern as the software-value service.

The binary exits with code 0 and communicates errors through the JSON payload, so the Python subprocess machinery never misinterprets a non-zero exit as a crash.

Design decisions
Vulnerability identity: (repo_url, vulnerability_id, package_name, source_path) — same CVE can appear in multiple packages and lockfiles

ID classification: primary ID + aliases sorted into cve_ids, ghsa_ids, other_ids arrays by prefix

Severity: derived from CVSS numeric score using standard thresholds (CRITICAL/HIGH/MEDIUM/LOW)

Status tracking: OPEN (no fix known), FIX_AVAILABLE (patch exists), RESOLVED (no longer detected)

Database strategy: upsert + mark-resolved (not delete + insert) — preserves full history of when vulnerabilities were first detected, last seen, and resolved

Transitive scanning: resolves full dependency graph by default; falls back to direct-only on timeout (3min) for first scans; subsequent scans reuse the previous mode

OOM handling: on any scanner crash, marks stale running scan records as failure; on OOM specifically (SIGKILL), retries with --no-transitive to skip the most memory-intensive part

Scan tracking: every invocation creates a vulnerability_scans row (running → success/failure/no_packages_found) with duration, counts, and errors


Note

Medium Risk
Introduces a new scanning subprocess and database write path (insights vulnerability_scans/vulnerabilities) in the main repo-processing pipeline; failures/timeouts could impact processing latency and DB load if not tuned.

Overview
Adds automated vulnerability scanning to the git integration worker by introducing a new VulnerabilityScannerService that runs on the first clone batch and records a ServiceExecution with a new OperationType.VULNERABILITY_SCAN.

Implements the scanner as a new Go binary (services/.../vulnerability_scanner) built into the Docker image; it uses the Google OSV Scanner SDK, writes scan and vulnerability upserts directly to the insights database (including resolved/new counts), and supports timeout/OOM handling via exit codes and a --no-transitive retry path.

Updates infra/config to support the insights DB connection (INSIGHTS_DB_* env vars) and enhances run_shell_command / CommandExecutionError to propagate process return codes and optionally stream stderr for better scanner logs.

Written by Cursor Bugbot for commit b4e9df5. This will update automatically on new commits. Configure here.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@epipav epipav marked this pull request as draft March 3, 2026 16:44
@epipav epipav marked this pull request as ready for review March 5, 2026 14:48
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@epipav epipav requested a review from mbani01 March 5, 2026 14:52
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@epipav
Copy link
Collaborator Author

epipav commented Mar 5, 2026

related: linuxfoundation/insights#1725

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@epipav epipav changed the title feat: vulnerability scanning within git integration feat: vulnerability scanning within git integration (IN-956) Mar 5, 2026
errorCode := getErrorCodeFromConfigError(err)
errorMessage := err.Error()
return StandardResponse{Status: StatusFailure, ErrorCode: &errorCode, ErrorMessage: &errorMessage}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config error checked after config is already used

Medium Severity

getConfig returns (config, err), but the code iterates over os.Args and sets config.NoTransitive on lines 28-31 before checking err on line 33. If getConfig fails (e.g., target path doesn't exist), the partially-initialized config is modified before the error is handled. The error check needs to happen immediately after the getConfig call, before the --no-transitive flag parsing loop.

Fix in Cursor Fix in Web

coro = _run_with_stderr_logging()
stdout = await (asyncio.wait_for(coro, timeout=timeout) if timeout else coro)
stdout_text = _safe_decode(stdout).strip() if stdout else ""
stderr_text = "\n".join(stderr_lines)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdin_input silently dropped when stderr_logger is provided

Medium Severity

When stderr_logger is provided, the _run_with_stderr_logging path reads stdout and streams stderr but never writes stdin_input to the process. The process.communicate(input=stdin_input) call only exists in the else branch. Any caller combining stderr_logger with input_text will silently lose the stdin data. While the vulnerability scanner doesn't currently use input_text, this is a general-purpose utility function shared across services.

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

execution_status = ExecutionStatus.FAILURE
error_code = ErrorCode.UNKNOWN.value
error_message = repr(e)
output = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unhandled exception in stale scan cleanup bypasses retry

Medium Severity

_mark_stale_scans_failed is called inside the except block without its own error handling. If it throws (e.g., DB connection failure or missing env var via os.environ[]), the exception propagates out of the except block entirely, bypassing the OOM/timeout retry logic and preventing the service execution record from being saved at the end of run().

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant