Discovery miner: git history (commit grouping) #132

@Dimwiddle

Summary

Parse recent git commit history to infer feature groupings from commit messages and changed file paths. Language-agnostic — uses subprocess + git log, not tree-sitter.

Depends on: #124, #126

New file

src/specleft/discovery/miners/shared/git_history.py

import uuid
from specleft.discovery.models import SupportedLanguage, MinerResult, DiscoveredItem, ItemKind, GitCommitMeta, MinerErrorKind
from specleft.discovery.context import MinerContext

class GitHistoryMiner:
    miner_id = uuid.UUID("f1c93075-4e3c-44b8-bef6-9c0bc25b6c42")
    name = "git_history"
    languages = frozenset()    # language-agnostic; always runs

    def mine(self, ctx: MinerContext) -> MinerResult: ...

Git log command

git -C {ctx.root} log --no-merges \
    --format="%H%n%s%n%b%n---END---" \
    --name-only -n {ctx.config.max_git_commits}

Note: MAX_COMMITS is now read from ctx.config.max_git_commits (default: 200, configurable via [tool.specleft.discovery].max_git_commits in pyproject.toml).
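A minimal sketch of how the command above might be assembled for subprocess. The helper name build_git_log_command is hypothetical and not part of the spec. Passing the arguments as a list avoids shell quoting issues around the format string.

```python
from pathlib import Path


def build_git_log_command(root: Path, max_commits: int = 200) -> list[str]:
    """Return the git log invocation as an argument list (hypothetical helper).

    The default of 200 mirrors the documented default of
    ctx.config.max_git_commits; real code would pass the config value in.
    """
    return [
        "git", "-C", str(root), "log", "--no-merges",
        # No shell is involved, so the format string needs no quoting.
        "--format=%H%n%s%n%b%n---END---",
        "--name-only", "-n", str(max_commits),
    ]
```

The resulting list can be handed to subprocess.run with capture_output=True and text=True.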

Parsing

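  • Split output on the ---END--- separator. With --name-only, git prints each commit's file list after its formatted text, so the file names that follow a separator belong to the commit before it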
  • Per commit: extract short hash (7 chars), subject, body, list of changed files
  • Skip commits whose subject matches conventional commit noise prefixes: chore:, ci:, build:, docs:, style:, test:
  • Produce one DiscoveredItem per remaining commit that has >=1 changed source file
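The parsing steps above can be sketched as follows. This is an illustration under stated assumptions, not the required implementation: ParsedCommit, parse_git_log and keep_commit are hypothetical names, a 40-character hex line is assumed to mark the start of a commit header, and every changed file is treated as a "source file" because the issue does not define that term. Because git emits the --name-only file list after the formatted text, the sketch attaches the lines that precede each header to the previous commit.

```python
import re
from dataclasses import dataclass, field

SEPARATOR = "---END---"
FULL_HASH = re.compile(r"^[0-9a-f]{40}$")  # assumption: SHA-1 repository
NOISE_PREFIXES = ("chore:", "ci:", "build:", "docs:", "style:", "test:")


@dataclass
class ParsedCommit:  # hypothetical intermediate type, not part of the spec
    short_hash: str
    subject: str
    body: str
    changed_files: list[str] = field(default_factory=list)


def parse_git_log(raw: str) -> list[ParsedCommit]:
    commits: list[ParsedCommit] = []
    for chunk in raw.split(SEPARATOR):
        lines = chunk.splitlines()
        # Index of the hash line that opens the next commit header, if any.
        header_at = next(
            (i for i, line in enumerate(lines) if FULL_HASH.match(line)),
            len(lines),
        )
        # Non-blank lines before the header are files of the previous commit.
        files = [line for line in lines[:header_at] if line.strip()]
        if commits:
            commits[-1].changed_files.extend(files)
        if header_at < len(lines):
            has_subject = header_at + 1 < len(lines)
            subject = lines[header_at + 1] if has_subject else ""
            body = "\n".join(lines[header_at + 2:]).strip()
            commits.append(ParsedCommit(lines[header_at][:7], subject, body))
    return commits


def keep_commit(commit: ParsedCommit) -> bool:
    """Drop noise-prefixed subjects and commits without changed files."""
    is_noise = commit.subject.startswith(NOISE_PREFIXES)
    return bool(commit.changed_files) and not is_noise
```

Each commit for which keep_commit returns True would then map to one DiscoveredItem.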

Typed metadata

Each item's metadata dict must conform to GitCommitMeta:

GitCommitMeta(
    commit_hash       = "a7b21db",
    subject           = "feat: add login endpoint",
    body              = "Implements JWT-based authentication...",
    changed_files     = ["src/auth/login.py", "tests/test_login.py"],
    conventional_type = "feat",
    file_prefixes     = ["src/auth", "tests"],
)

name: commit subject line
file_path: None (git items span multiple files)
language: None (language-agnostic)
confidence: 0.5 (git history is a weak intent signal)

Note: languages = frozenset() means this miner always runs regardless of detected languages. The pipeline treats empty frozenset as "language-agnostic".
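One possible way to derive the conventional_type and file_prefixes values shown in the GitCommitMeta example. Both helpers are hypothetical. The issue does not define how prefixes are computed, so this sketch assumes a prefix is the parent directory of each changed file, which reproduces the example values. Accepting an optional scope such as feat(auth): and a breaking-change marker (!) is likewise an assumption.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumption: lowercase type, optional "(scope)", optional "!", then ": ".
CONVENTIONAL = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s")


def conventional_type(subject: str) -> Optional[str]:
    """Return the conventional commit type, or None if the subject has none."""
    match = CONVENTIONAL.match(subject)
    return match.group("type") if match else None


def file_prefixes(changed_files: list[str]) -> list[str]:
    """Return unique parent directories in first-seen order.

    git reports paths with forward slashes, hence PurePosixPath.
    Files at the repository root yield ".".
    """
    prefixes: list[str] = []
    for path in changed_files:
        parent = PurePosixPath(path).parent.as_posix()
        if parent not in prefixes:
            prefixes.append(parent)
    return prefixes
```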

Error handling

If git is not on PATH or ctx.root is not a git repository:

MinerResult(
    miner_id=self.miner_id,
    miner_name=self.name,
    items=[],
    error="not a git repository",
    error_kind=MinerErrorKind.NOT_INSTALLED,
    duration_ms=0,
)

Do not raise.
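A sketch of a non-raising git invocation, assuming a hypothetical helper run_git that returns either the captured output or an error message. The miner would translate a non-None error into the MinerResult shown above. The git parameter exists only so the missing-executable path can be exercised in tests.

```python
import subprocess
from typing import Optional, Sequence, Tuple


def run_git(
    args: Sequence[str], git: str = "git"
) -> Tuple[Optional[str], Optional[str]]:
    """Run git and return (stdout, error); exactly one of the two is None."""
    try:
        proc = subprocess.run(
            [git, *args], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # Raised when the executable is not on PATH.
        return None, "git executable not found"
    if proc.returncode != 0:
        # Covers "not a git repository" and any other git failure.
        message = proc.stderr.strip()
        return None, message or f"git exited with status {proc.returncode}"
    return proc.stdout, None
```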

Acceptance criteria

  • Running on the specleft repo returns total_items > 0
  • Merge commits excluded (--no-merges)
  • Uses ctx.config.max_git_commits — not a hardcoded constant
  • Each item's metadata validates against GitCommitMeta
  • conventional_type="feat" parsed from "feat: add login endpoint"
  • Commits with subject "chore: update lockfile" are skipped
  • All items have language=None
  • Non-git directory returns MinerResult with error + error_kind=NOT_INSTALLED, no exception
  • Tests in tests/discovery/miners/test_git_history.py using a tmp_path git repo fixture
  • Update scenarios and tests in features/feature-spec-discovery.md to cover the functionality introduced by this issue
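For the tmp_path git repo fixture mentioned above, a helper along these lines might be enough. The names git and make_repo are hypothetical, and the fixed identity and disabled signing are assumptions that let commits succeed on a machine with no git configuration. A pytest fixture could call make_repo(tmp_path / "repo", ...) and return the path.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run git inside repo with a fixed identity and no commit signing."""
    cmd = [
        "git", "-C", str(repo),
        "-c", "user.name=Test",
        "-c", "user.email=test@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def make_repo(root: Path, commits: list[tuple[str, dict[str, str]]]) -> Path:
    """Create a repository with one commit per (subject, files) entry.

    files maps relative paths to file contents; each entry must change
    at least one file, otherwise git commit fails.
    """
    root.mkdir(parents=True, exist_ok=True)
    git(root, "init")
    for subject, files in commits:
        for rel_path, content in files.items():
            target = root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        git(root, "add", "--all")
        git(root, "commit", "-m", subject)
    return root
```

A test could build a repo with a feat: commit and a chore: commit, run the miner against it, and check that only the former produces an item.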

Labels

git (Git history related), miner (Discovery miner implementation), new feature (Issues or PRs for a new feature that doesn't currently exist)
