Merged
49 changes: 45 additions & 4 deletions features/feature-spec-discovery.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
-# Feature Spec: Discovery language registry and file indexing
+# Feature Spec: Discovery pipeline foundations

## Purpose
-Add shared discovery infrastructure for Issue #125: centralized parser abstraction and one-pass filesystem indexing.
+Add shared discovery infrastructure for Issues #125 and #126: centralized parser abstraction, one-pass filesystem indexing, framework/config detection, and pipeline orchestration.

## User Stories

@@ -33,6 +33,35 @@ Add shared discovery infrastructure for Issue #125: centralized parser abstraction
**When** calling `detect_project_languages(index)`
**Then** it returns detected languages above the ratio threshold, computed against total indexed files.

### Story 4: Discovery configuration and framework detection
**Scenario:** As a pipeline maintainer, I need project-local discovery settings.
**Given** a repository root with `[tool.specleft.discovery]` in `pyproject.toml`
**When** I load `DiscoveryConfig.from_pyproject(root)`
**Then** it should return configured values with safe defaults for missing/invalid fields.

**Scenario:** As a discovery pipeline, I need framework signals shared across miners.
**Given** a repository with `pytest` configuration and matching test files
**When** I call `FrameworkDetector().detect(root, file_index)`
**Then** it should return `{SupportedLanguage.PYTHON: ["pytest"]}`.

### Story 5: Orchestrated miner execution
**Scenario:** As a discovery pipeline, I need deterministic and resilient miner execution.
**Given** a set of registered miners
**When** one miner raises an exception
**Then** the pipeline records the error in that miner result and continues running the remaining miners.

**Scenario:** As a pipeline consumer, I need correct filtering semantics.
**Given** detected project languages and miner language scopes
**When** a miner has no overlap with detected languages
**Then** it is skipped silently.
**And** language-agnostic miners (`languages = frozenset()`) always run.
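
The overlap rule in this story can be sketched as a small predicate. This is a standalone sketch — `Miner` and `Lang` here are hypothetical stand-ins for the real miner protocol and `SupportedLanguage`:

```python
from dataclasses import dataclass
from enum import Enum


class Lang(Enum):
    PYTHON = "python"
    TYPESCRIPT = "typescript"


@dataclass(frozen=True)
class Miner:
    name: str
    languages: frozenset[Lang]  # empty frozenset => language-agnostic


def should_run(miner: Miner, detected: set[Lang]) -> bool:
    # Language-agnostic miners (empty scope) always run; scoped miners
    # run only when their scope overlaps the detected languages.
    return not miner.languages or bool(miner.languages & detected)
```

Note that the skip is silent by design: a miner with no overlap produces no result entry at all, rather than an empty or errored one.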

### Story 6: Default pipeline wiring
**Scenario:** As a command entrypoint (`specleft discover` / `specleft start`), I need one constructor that wires everything.
**Given** a project root
**When** I call `build_default_pipeline(root).run()`
**Then** a `DiscoveryReport` is returned with run duration, detected languages, miner results, and total item counts.

## Acceptance Criteria
- Language abstraction returns `SupportedLanguage` members for `.py`, `.ts`, `.tsx`, `.js`, `.jsx`, `.mjs` and `None` otherwise.
- `LanguageRegistry().parse(path_to_py_file)` returns `(node, SupportedLanguage.PYTHON)` for valid Python input.
@@ -41,5 +70,17 @@ Add shared discovery infrastructure for Issue #125: centralized parser abstraction
- Grammar/parser handling is cached and does not recreate parser objects per call.
- `FileIndex` builds once per root and exposes query helpers used by miners.
- `detect_project_languages()` thresholds are applied against total indexed files, not only supported-language files.
-- Tests cover registry parsing, caching behavior, index filtering, and language detection thresholding.
-- Feature spec is updated to document the new discovery layer behavior for issue #125.
+- `DiscoveryConfig.from_pyproject(root)` loads custom settings from `[tool.specleft.discovery]`.
+- `DiscoveryConfig.from_pyproject(root)` returns defaults when the section is missing.
+- `FrameworkDetector.detect()` returns `{PYTHON: ["pytest"]}` on the SpecLeft repo.
+- `FrameworkDetector` is called once per pipeline run and the result is shared through one `MinerContext`.
+- `MinerContext` is constructed once and reused for all miner calls in that run.
+- Per-miner exceptions are captured into `MinerResult.error`/`error_kind` without stopping the run.
+- `DiscoveryReport.total_items` excludes items from miners that errored.
+- Miners with no language overlap are skipped; language-agnostic miners always run.
+- `register()` raises `ValueError` for duplicate `miner_id` UUIDs.
+- `MinerResult.miner_id` and `miner_name` in output are populated from the miner instance.
+- `build_default_pipeline(root).run()` returns a valid `DiscoveryReport` even when all registered miners fail.
+- Integration on the SpecLeft repository produces `report.total_items > 0`.
+- Tests cover config parsing, framework detection, pipeline registration/filtering/error isolation, and default pipeline integration.
+- Feature spec is updated to document the discovery layer behavior introduced in issues #125 and #126.
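
The duplicate-registration criterion can be illustrated standalone. `RegistrySketch` is hypothetical and only mirrors the stated `register()` contract:

```python
import uuid


class RegistrySketch:
    """Sketch of the duplicate-id rule: register() rejects a reused miner id."""

    def __init__(self) -> None:
        self._miners: dict[uuid.UUID, object] = {}

    def register(self, miner_id: uuid.UUID, miner: object) -> None:
        if miner_id in self._miners:
            raise ValueError(f"duplicate miner_id: {miner_id}")
        self._miners[miner_id] = miner
```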
6 changes: 3 additions & 3 deletions src/specleft/commands/features.py
@@ -815,7 +815,7 @@ def features_add(
_ensure_interactive(interactive)

if interactive:
-title_input = click.prompt("Feature title", type=str).strip()
+title_input = click.prompt("Feature title").strip()
default_feature_id = generate_feature_id(title_input)
feature_id_input = click.prompt(
"Feature ID",
@@ -1036,8 +1036,8 @@ def features_add_scenario(
_ensure_interactive(interactive)

if interactive:
-feature_input = click.prompt("Feature ID", type=str).strip()
-title_input = click.prompt("Scenario title", type=str).strip()
+feature_input = click.prompt("Feature ID").strip()
+title_input = click.prompt("Scenario title").strip()
default_scenario_id = generate_scenario_id(title_input)
scenario_id_input = click.prompt(
"Scenario ID",
8 changes: 8 additions & 0 deletions src/specleft/discovery/__init__.py
@@ -2,6 +2,14 @@

from specleft.discovery.models import * # noqa: F401,F403

from specleft.discovery.config import DiscoveryConfig
from specleft.discovery.context import MinerContext
from specleft.discovery.file_index import DEFAULT_EXCLUDE_DIRS, FileIndex
from specleft.discovery.framework_detector import FrameworkDetector
from specleft.discovery.language_detect import detect_project_languages
from specleft.discovery.language_registry import SUPPORTED_EXTENSIONS, LanguageRegistry
from specleft.discovery.pipeline import (
BaseMiner,
DiscoveryPipeline,
build_default_pipeline,
)
121 changes: 121 additions & 0 deletions src/specleft/discovery/config.py
@@ -0,0 +1,121 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""User-facing configuration for discovery orchestration."""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any

from specleft.discovery.file_index import DEFAULT_EXCLUDE_DIRS


@dataclass(frozen=True)
class DiscoveryConfig:
"""Configuration for discovery pipeline and miners."""

exclude_dirs: frozenset[str] = DEFAULT_EXCLUDE_DIRS
source_dirs: tuple[str, ...] = ("src", "lib", "app", "core")
max_git_commits: int = 200

@classmethod
def from_pyproject(cls, root: Path) -> DiscoveryConfig:
"""Load discovery config from ``[tool.specleft.discovery]`` if present."""
data = _load_pyproject(root)
section = _extract_discovery_section(data)
if not section:
return cls.default()

default = cls.default()

raw_exclude_dirs = section.get("exclude_dirs")
exclude_dirs = (
frozenset(value for value in raw_exclude_dirs if isinstance(value, str))
if isinstance(raw_exclude_dirs, list)
else default.exclude_dirs
)
if not exclude_dirs:
exclude_dirs = default.exclude_dirs

raw_source_dirs = section.get("source_dirs")
source_dirs = (
tuple(value for value in raw_source_dirs if isinstance(value, str))
if isinstance(raw_source_dirs, list)
else default.source_dirs
)
if not source_dirs:
source_dirs = default.source_dirs

raw_max_git_commits = section.get("max_git_commits")
if isinstance(raw_max_git_commits, int) and raw_max_git_commits > 0:
max_git_commits = raw_max_git_commits
else:
max_git_commits = default.max_git_commits

return cls(
exclude_dirs=exclude_dirs,
source_dirs=source_dirs,
max_git_commits=max_git_commits,
)

@classmethod
def default(cls) -> DiscoveryConfig:
"""Return config with all defaults."""
return cls()


def _extract_discovery_section(data: dict[str, Any]) -> dict[str, Any]:
tool = data.get("tool")
if not isinstance(tool, dict):
return {}

specleft = tool.get("specleft")
if not isinstance(specleft, dict):
return {}

discovery = specleft.get("discovery")
if not isinstance(discovery, dict):
return {}

return discovery


def _load_pyproject(root: Path) -> dict[str, Any]:
pyproject_path = root / "pyproject.toml"
if not pyproject_path.is_file():
return {}

try:
raw = pyproject_path.read_bytes()
except OSError:
return {}

toml_module = _resolve_toml_loader()
if toml_module is None:
return {}

try:
parsed = toml_module.loads(raw.decode("utf-8"))
except Exception:
return {}

if not isinstance(parsed, dict):
return {}

return parsed


def _resolve_toml_loader() -> Any | None:
try:
import tomllib

return tomllib
except ModuleNotFoundError:
try:
import tomli # type: ignore[import-not-found]

return tomli
except ModuleNotFoundError:
return None
25 changes: 25 additions & 0 deletions src/specleft/discovery/context.py
@@ -0,0 +1,25 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""Shared miner context built once per pipeline run."""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

from specleft.discovery.config import DiscoveryConfig
from specleft.discovery.file_index import FileIndex
from specleft.discovery.language_registry import LanguageRegistry
from specleft.discovery.models import SupportedLanguage


@dataclass(frozen=True)
class MinerContext:
"""Immutable context passed to every miner."""

root: Path
registry: LanguageRegistry
file_index: FileIndex
frameworks: dict[SupportedLanguage, list[str]]
config: DiscoveryConfig
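
Because the context is a frozen dataclass, it cannot be mutated after construction, which is what makes build-once, share-with-every-miner reuse safe. A minimal standalone illustration (not the real `MinerContext`):

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextSketch:
    root: str
    frameworks: tuple[str, ...]


def try_mutate(ctx: ContextSketch) -> bool:
    """Return True if the frozen dataclass blocked the mutation attempt."""
    try:
        ctx.root = "/elsewhere"  # type: ignore[misc]
    except dataclasses.FrozenInstanceError:
        return True
    return False
```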
136 changes: 136 additions & 0 deletions src/specleft/discovery/framework_detector.py
@@ -0,0 +1,136 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""Framework detection orchestrator and shared detection context."""

from __future__ import annotations

from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import cast

from specleft.discovery.file_index import FileIndex
from specleft.discovery.frameworks import io
from specleft.discovery.frameworks.python.policies import PythonFrameworkPolicy
from specleft.discovery.frameworks.types import LanguagePolicy
from specleft.discovery.frameworks.typescript.policies import TypeScriptFrameworkPolicy
from specleft.discovery.models import SupportedLanguage


class FrameworkDetector:
"""Detect test frameworks by combining manifest and file-pattern signals."""

def __init__(self, policies: tuple[LanguagePolicy, ...] | None = None) -> None:
self._policies = policies if policies is not None else _default_policies()

def detect(
self,
root: Path,
file_index: FileIndex,
) -> dict[SupportedLanguage, list[str]]:
"""Detect framework names by language."""
ctx = DetectionContext(root=root, file_index=file_index)
detected: dict[SupportedLanguage, list[str]] = {}

for policy in self._policies:
frameworks = policy.detect(ctx)
if frameworks:
detected[policy.language] = frameworks

return detected


@dataclass(frozen=True)
class DetectionContext:
"""Cached evidence shared by all framework policies and rules."""

root: Path
file_index: FileIndex

@cached_property
def pyproject(self) -> dict[str, object]:
return io.load_pyproject(self.root)

@cached_property
def package_json(self) -> dict[str, object]:
return io.load_package_json(self.root)

@cached_property
def requirements_lines(self) -> tuple[str, ...]:
lines: list[str] = []
for requirements_file in sorted(self.root.glob("requirements*.txt")):
raw = io.read_text(requirements_file)
if raw is None:
continue
lines.extend(line.strip().lower() for line in raw.splitlines())
return tuple(lines)

@cached_property
def python_test_files(self) -> list[Path]:
return [
path
for path in self.file_index.files_matching("test_*.py")
if io.is_project_file(path)
]

@cached_property
def conftest_files(self) -> list[Path]:
return [
path
for path in self.file_index.files_matching("conftest.py")
if io.is_project_file(path)
]

@cached_property
def has_unittest_testcases(self) -> bool:
for python_file in self.python_test_files:
source = io.read_text(self.root / python_file)
if source is None:
continue
if io.contains_unittest_testcase(source):
return True

return False

@cached_property
def typescript_manifest_frameworks(self) -> set[str]:
return io.manifest_typescript_frameworks(self.package_json)

@cached_property
def jest_configs(self) -> list[Path]:
return self.file_index.files_matching(
"jest.config.js",
"jest.config.ts",
"jest.config.mjs",
"jest.config.cjs",
"jest.config.json",
)

@cached_property
def vite_configs(self) -> list[Path]:
return self.file_index.files_matching(
"vite.config.js",
"vite.config.ts",
"vite.config.mjs",
"vite.config.cjs",
)

@cached_property
def vitest_tests(self) -> list[Path]:
return self.file_index.files_matching(
"*.test.ts",
"*.test.tsx",
"*.test.js",
"*.test.jsx",
)


def _default_policies() -> tuple[LanguagePolicy, ...]:
return cast(
tuple[LanguagePolicy, ...],
(PythonFrameworkPolicy(), TypeScriptFrameworkPolicy()),
)


__all__ = ["DetectionContext", "FrameworkDetector"]
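
`DetectionContext` leans on `functools.cached_property` so that expensive evidence (manifest reads, file scans) is gathered at most once per run, no matter how many policies and rules consult it. A standalone sketch of that behavior:

```python
from functools import cached_property


class EvidenceSketch:
    def __init__(self) -> None:
        self.loads = 0

    @cached_property
    def manifest(self) -> dict:
        # Simulated expensive read; the body runs only on first access,
        # after which the result is stored on the instance.
        self.loads += 1
        return {"dev-dependencies": ["pytest"]}
```

The trade-off versus a plain `@property` is that the cached value never refreshes, which is exactly right here: the context lives for a single pipeline run.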
12 changes: 12 additions & 0 deletions src/specleft/discovery/frameworks/__init__.py
@@ -0,0 +1,12 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""Framework detection building blocks."""

from specleft.discovery.frameworks.types import (
FrameworkRule,
FrameworkSignals,
LanguagePolicy,
)

__all__ = ["FrameworkSignals", "FrameworkRule", "LanguagePolicy"]