Merged
49 changes: 45 additions & 4 deletions features/feature-spec-discovery.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
-# Feature Spec: Discovery language registry and file indexing
+# Feature Spec: Discovery pipeline foundations

## Purpose
-Add shared discovery infrastructure for Issue #125: centralized parser abstraction and one-pass filesystem indexing.
+Add shared discovery infrastructure for Issues #125 and #126: centralized parser abstraction, one-pass filesystem indexing, framework/config detection, and pipeline orchestration.

## User Stories

@@ -33,6 +33,35 @@ Add shared discovery infrastructure for Issue #125: centralized parser abstraction
**When** calling `detect_project_languages(index)`
**Then** it returns detected languages above the ratio threshold, computed against total indexed files.

### Story 4: Discovery configuration and framework detection
**Scenario:** As a pipeline maintainer, I need project-local discovery settings.
**Given** a repository root with `[tool.specleft.discovery]` in `pyproject.toml`
**When** I load `DiscoveryConfig.from_pyproject(root)`
**Then** it should return configured values with safe defaults for missing/invalid fields.

**Scenario:** As a discovery pipeline, I need framework signals shared across miners.
**Given** a repository with `pytest` configuration and matching test files
**When** I call `FrameworkDetector().detect(root, file_index)`
**Then** it should return `{SupportedLanguage.PYTHON: ["pytest"]}`.

### Story 5: Orchestrated miner execution
**Scenario:** As a discovery pipeline, I need deterministic and resilient miner execution.
**Given** a set of registered miners
**When** one miner raises an exception
**Then** the pipeline records the error in that miner result and continues running the remaining miners.

**Scenario:** As a pipeline consumer, I need correct filtering semantics.
**Given** detected project languages and miner language scopes
**When** a miner has no overlap with detected languages
**Then** it is skipped silently.
**And** language-agnostic miners (`languages = frozenset()`) always run.
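
The overlap rule in this story can be sketched as a small predicate. This is a standalone sketch — `Miner` and `Lang` here are hypothetical stand-ins for the real miner protocol and `SupportedLanguage`:

```python
from dataclasses import dataclass
from enum import Enum


class Lang(Enum):
    PYTHON = "python"
    TYPESCRIPT = "typescript"


@dataclass(frozen=True)
class Miner:
    name: str
    languages: frozenset[Lang]  # empty frozenset => language-agnostic


def should_run(miner: Miner, detected: set[Lang]) -> bool:
    # Language-agnostic miners (empty scope) always run; scoped miners
    # run only when their scope overlaps the detected languages.
    return not miner.languages or bool(miner.languages & detected)
```

Note that the skip is silent by design: a miner with no overlap produces no result entry at all, rather than an empty or errored one.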

### Story 6: Default pipeline wiring
**Scenario:** As a command entrypoint (`specleft discover` / `specleft start`), I need one constructor that wires everything.
**Given** a project root
**When** I call `build_default_pipeline(root).run()`
**Then** a `DiscoveryReport` is returned with run duration, detected languages, miner results, and total item counts.

## Acceptance Criteria
- Language abstraction returns `SupportedLanguage` members for `.py`, `.ts`, `.tsx`, `.js`, `.jsx`, `.mjs` and `None` otherwise.
- `LanguageRegistry().parse(path_to_py_file)` returns `(node, SupportedLanguage.PYTHON)` for valid Python input.
@@ -41,5 +70,17 @@ Add shared discovery infrastructure for Issue #125: centralized parser abstraction
- Grammar/parser handling is cached and does not recreate parser objects per call.
- `FileIndex` builds once per root and exposes query helpers used by miners.
- `detect_project_languages()` thresholds are applied against total indexed files, not only supported-language files.
-- Tests cover registry parsing, caching behavior, index filtering, and language detection thresholding.
-- Feature spec is updated to document the new discovery layer behavior for issue #125.
+- `DiscoveryConfig.from_pyproject(root)` loads custom settings from `[tool.specleft.discovery]`.
+- `DiscoveryConfig.from_pyproject(root)` returns defaults when the section is missing.
+- `FrameworkDetector.detect()` returns `{PYTHON: ["pytest"]}` on the SpecLeft repo.
+- `FrameworkDetector` is called once per pipeline run and the result is shared through one `MinerContext`.
+- `MinerContext` is constructed once and reused for all miner calls in that run.
+- Per-miner exceptions are captured into `MinerResult.error`/`error_kind` without stopping the run.
+- `DiscoveryReport.total_items` excludes items from miners that errored.
+- Miners with no language overlap are skipped; language-agnostic miners always run.
+- `register()` raises `ValueError` for duplicate `miner_id` UUIDs.
+- `MinerResult.miner_id` and `miner_name` in output are populated from the miner instance.
+- `build_default_pipeline(root).run()` returns a valid `DiscoveryReport` even when all registered miners fail.
+- Integration on the SpecLeft repository produces `report.total_items > 0`.
+- Tests cover config parsing, framework detection, pipeline registration/filtering/error isolation, and default pipeline integration.
+- Feature spec is updated to document the discovery layer behavior introduced in issues #125 and #126.
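
The duplicate-registration criterion can be illustrated standalone. `RegistrySketch` is hypothetical and only mirrors the stated `register()` contract:

```python
import uuid


class RegistrySketch:
    """Sketch of the duplicate-id rule: register() rejects a reused miner id."""

    def __init__(self) -> None:
        self._miners: dict[uuid.UUID, object] = {}

    def register(self, miner_id: uuid.UUID, miner: object) -> None:
        if miner_id in self._miners:
            raise ValueError(f"duplicate miner_id: {miner_id}")
        self._miners[miner_id] = miner
```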
6 changes: 3 additions & 3 deletions src/specleft/commands/features.py
@@ -815,7 +815,7 @@ def features_add(
_ensure_interactive(interactive)

if interactive:
-title_input = click.prompt("Feature title", type=str).strip()
+title_input = click.prompt("Feature title").strip()
default_feature_id = generate_feature_id(title_input)
feature_id_input = click.prompt(
"Feature ID",
@@ -1036,8 +1036,8 @@ def features_add_scenario(
_ensure_interactive(interactive)

if interactive:
-feature_input = click.prompt("Feature ID", type=str).strip()
-title_input = click.prompt("Scenario title", type=str).strip()
+feature_input = click.prompt("Feature ID").strip()
+title_input = click.prompt("Scenario title").strip()
default_scenario_id = generate_scenario_id(title_input)
scenario_id_input = click.prompt(
"Scenario ID",
8 changes: 8 additions & 0 deletions src/specleft/discovery/__init__.py
@@ -2,6 +2,14 @@

from specleft.discovery.models import * # noqa: F401,F403

from specleft.discovery.config import DiscoveryConfig
from specleft.discovery.context import MinerContext
from specleft.discovery.file_index import DEFAULT_EXCLUDE_DIRS, FileIndex
from specleft.discovery.framework_detector import FrameworkDetector
from specleft.discovery.language_detect import detect_project_languages
from specleft.discovery.language_registry import SUPPORTED_EXTENSIONS, LanguageRegistry
from specleft.discovery.pipeline import (
BaseMiner,
DiscoveryPipeline,
build_default_pipeline,
)
121 changes: 121 additions & 0 deletions src/specleft/discovery/config.py
@@ -0,0 +1,121 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""User-facing configuration for discovery orchestration."""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any

from specleft.discovery.file_index import DEFAULT_EXCLUDE_DIRS


@dataclass(frozen=True)
class DiscoveryConfig:
"""Configuration for discovery pipeline and miners."""

exclude_dirs: frozenset[str] = DEFAULT_EXCLUDE_DIRS
source_dirs: tuple[str, ...] = ("src", "lib", "app", "core")
max_git_commits: int = 200

@classmethod
def from_pyproject(cls, root: Path) -> DiscoveryConfig:
"""Load discovery config from ``[tool.specleft.discovery]`` if present."""
data = _load_pyproject(root)
section = _extract_discovery_section(data)
if not section:
return cls.default()

default = cls.default()

raw_exclude_dirs = section.get("exclude_dirs")
exclude_dirs = (
frozenset(value for value in raw_exclude_dirs if isinstance(value, str))
if isinstance(raw_exclude_dirs, list)
else default.exclude_dirs
)
if not exclude_dirs:
exclude_dirs = default.exclude_dirs

raw_source_dirs = section.get("source_dirs")
source_dirs = (
tuple(value for value in raw_source_dirs if isinstance(value, str))
if isinstance(raw_source_dirs, list)
else default.source_dirs
)
if not source_dirs:
source_dirs = default.source_dirs

raw_max_git_commits = section.get("max_git_commits")
if isinstance(raw_max_git_commits, int) and raw_max_git_commits > 0:
max_git_commits = raw_max_git_commits
else:
max_git_commits = default.max_git_commits

return cls(
exclude_dirs=exclude_dirs,
source_dirs=source_dirs,
max_git_commits=max_git_commits,
)

@classmethod
def default(cls) -> DiscoveryConfig:
"""Return config with all defaults."""
return cls()


def _extract_discovery_section(data: dict[str, Any]) -> dict[str, Any]:
tool = data.get("tool")
if not isinstance(tool, dict):
return {}

specleft = tool.get("specleft")
if not isinstance(specleft, dict):
return {}

discovery = specleft.get("discovery")
if not isinstance(discovery, dict):
return {}

return discovery


def _load_pyproject(root: Path) -> dict[str, Any]:
pyproject_path = root / "pyproject.toml"
if not pyproject_path.is_file():
return {}

try:
raw = pyproject_path.read_bytes()
except OSError:
return {}

toml_module = _resolve_toml_loader()
if toml_module is None:
return {}

try:
parsed = toml_module.loads(raw.decode("utf-8"))
except Exception:
return {}

if not isinstance(parsed, dict):
return {}

return parsed


def _resolve_toml_loader() -> Any | None:
try:
import tomllib

return tomllib
except ModuleNotFoundError:
try:
import tomli # type: ignore[import-not-found]

return tomli
except ModuleNotFoundError:
return None
25 changes: 25 additions & 0 deletions src/specleft/discovery/context.py
@@ -0,0 +1,25 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""Shared miner context built once per pipeline run."""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

from specleft.discovery.config import DiscoveryConfig
from specleft.discovery.file_index import FileIndex
from specleft.discovery.language_registry import LanguageRegistry
from specleft.discovery.models import SupportedLanguage


@dataclass(frozen=True)
class MinerContext:
"""Immutable context passed to every miner."""

root: Path
registry: LanguageRegistry
file_index: FileIndex
frameworks: dict[SupportedLanguage, list[str]]
config: DiscoveryConfig
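
Because the context is a frozen dataclass, it cannot be mutated after construction, which is what makes build-once, share-with-every-miner reuse safe. A minimal standalone illustration (not the real `MinerContext`):

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextSketch:
    root: str
    frameworks: tuple[str, ...]


def try_mutate(ctx: ContextSketch) -> bool:
    """Return True if the frozen dataclass blocked the mutation attempt."""
    try:
        ctx.root = "/elsewhere"  # type: ignore[misc]
    except dataclasses.FrozenInstanceError:
        return True
    return False
```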
136 changes: 136 additions & 0 deletions src/specleft/discovery/framework_detector.py
@@ -0,0 +1,136 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""Framework detection orchestrator and shared detection context."""

from __future__ import annotations

from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import cast

from specleft.discovery.file_index import FileIndex
from specleft.discovery.frameworks import io
from specleft.discovery.frameworks.python.policies import PythonFrameworkPolicy
from specleft.discovery.frameworks.types import LanguagePolicy
from specleft.discovery.frameworks.typescript.policies import TypeScriptFrameworkPolicy
from specleft.discovery.models import SupportedLanguage


class FrameworkDetector:
"""Detect test frameworks by combining manifest and file-pattern signals."""

def __init__(self, policies: tuple[LanguagePolicy, ...] | None = None) -> None:
self._policies = policies if policies is not None else _default_policies()

def detect(
self,
root: Path,
file_index: FileIndex,
) -> dict[SupportedLanguage, list[str]]:
"""Detect framework names by language."""
ctx = DetectionContext(root=root, file_index=file_index)
detected: dict[SupportedLanguage, list[str]] = {}

for policy in self._policies:
frameworks = policy.detect(ctx)
if frameworks:
detected[policy.language] = frameworks

return detected


@dataclass(frozen=True)
class DetectionContext:
"""Cached evidence shared by all framework policies and rules."""

root: Path
file_index: FileIndex

@cached_property
def pyproject(self) -> dict[str, object]:
return io.load_pyproject(self.root)

@cached_property
def package_json(self) -> dict[str, object]:
return io.load_package_json(self.root)

@cached_property
def requirements_lines(self) -> tuple[str, ...]:
lines: list[str] = []
for requirements_file in sorted(self.root.glob("requirements*.txt")):
raw = io.read_text(requirements_file)
if raw is None:
continue
lines.extend(line.strip().lower() for line in raw.splitlines())
return tuple(lines)

@cached_property
def python_test_files(self) -> list[Path]:
return [
path
for path in self.file_index.files_matching("test_*.py")
if io.is_project_file(path)
]

@cached_property
def conftest_files(self) -> list[Path]:
return [
path
for path in self.file_index.files_matching("conftest.py")
if io.is_project_file(path)
]

@cached_property
def has_unittest_testcases(self) -> bool:
for python_file in self.python_test_files:
source = io.read_text(self.root / python_file)
if source is None:
continue
if io.contains_unittest_testcase(source):
return True

return False

@cached_property
def typescript_manifest_frameworks(self) -> set[str]:
return io.manifest_typescript_frameworks(self.package_json)

@cached_property
def jest_configs(self) -> list[Path]:
return self.file_index.files_matching(
"jest.config.js",
"jest.config.ts",
"jest.config.mjs",
"jest.config.cjs",
"jest.config.json",
)

@cached_property
def vite_configs(self) -> list[Path]:
return self.file_index.files_matching(
"vite.config.js",
"vite.config.ts",
"vite.config.mjs",
"vite.config.cjs",
)

@cached_property
def vitest_tests(self) -> list[Path]:
return self.file_index.files_matching(
"*.test.ts",
"*.test.tsx",
"*.test.js",
"*.test.jsx",
)


def _default_policies() -> tuple[LanguagePolicy, ...]:
return cast(
tuple[LanguagePolicy, ...],
(PythonFrameworkPolicy(), TypeScriptFrameworkPolicy()),
)


__all__ = ["DetectionContext", "FrameworkDetector"]
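
`DetectionContext` leans on `functools.cached_property` so that expensive evidence (manifest reads, file scans) is gathered at most once per run, no matter how many policies and rules consult it. A standalone sketch of that behavior:

```python
from functools import cached_property


class EvidenceSketch:
    def __init__(self) -> None:
        self.loads = 0

    @cached_property
    def manifest(self) -> dict:
        # Simulated expensive read; the body runs only on first access,
        # after which the result is stored on the instance.
        self.loads += 1
        return {"dev-dependencies": ["pytest"]}
```

The trade-off versus a plain `@property` is that the cached value never refreshes, which is exactly right here: the context lives for a single pipeline run.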
12 changes: 12 additions & 0 deletions src/specleft/discovery/frameworks/__init__.py
@@ -0,0 +1,12 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2026 SpecLeft Contributors

"""Framework detection building blocks."""

from specleft.discovery.frameworks.types import (
FrameworkRule,
FrameworkSignals,
LanguagePolicy,
)

__all__ = ["FrameworkSignals", "FrameworkRule", "LanguagePolicy"]