Discovery Phase 2: feature grouping algorithm #133

@Dimwiddle

Summary

Group a flat list of DiscoveredItems into DraftFeature clusters. This is the core intelligence of Phase 2. Uses file path proximity, naming conventions, and API path prefixes — no ML or LLM required.

Depends on: #124, and at least one miner (#127–#132)

New file

src/specleft/discovery/grouping.py

from specleft.discovery.models import (
    DiscoveredItem, DraftFeature, DraftScenario, ItemKind,
    TestFunctionMeta, ApiRouteMeta, GitCommitMeta,
)

def group_items(items: list[DiscoveredItem]) -> list[DraftFeature]: ...

Grouping strategy (applied in priority order)

1. File-path grouping (primary)
Items whose file_path shares a common directory segment form an initial group. Group key = the most specific shared directory name.

tests/auth/test_login.py
tests/auth/test_logout.py   →  group key: "auth"
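A minimal sketch of the file-path step, assuming "most specific shared directory name" can be approximated by the name of each file's innermost parent directory. The helpers directory_key and group_by_directory are hypothetical names and operate on plain path strings rather than DiscoveredItem objects.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_key(file_path: str) -> str:
    """Return the name of the innermost directory that contains file_path.

    Assumption: paths use forward slashes. Files at the repository root
    yield an empty key.
    """
    return PurePosixPath(file_path).parent.name


def group_by_directory(paths: list[str]) -> dict[str, list[str]]:
    """Bucket paths by their directory key, preserving input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[directory_key(path)].append(path)
    return dict(groups)
```

Items with an empty key (files at the repository root) would presumably fall through to the later strategies rather than form a group of their own.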

2. API path prefix grouping
Route items (kind=API_ROUTE) — use item.typed_meta() to get ApiRouteMeta and extract path. Group by first path segment.

GET /users/{id}
POST /users        →  group key: "users"
DELETE /users/{id}
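The route step can be sketched as follows, assuming the path string has already been read from ApiRouteMeta.path. The function route_group_key is a hypothetical helper, not part of the existing models.

```python
def route_group_key(path: str) -> str:
    """Return the first non-empty segment of a route path.

    Leading, trailing, and doubled slashes are ignored. A bare "/" has
    no segments and yields an empty key.
    """
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else ""
```

This sketch treats a leading path parameter (for example a route that starts with a placeholder segment) like any other segment; whether such routes should be keyed differently is left open by the issue.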

3. Name-prefix grouping (fallback)
Items whose names share a common prefix after stripping test-style prefixes (test_, "test ", "it ", etc.).

test_payment_success
test_payment_declined   →  group key: "payment"
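One way to approximate the name-prefix fallback is to strip the test-style prefix and take the first remaining token as the key. This is a simplification: the issue asks for a shared prefix, which could span more than one token. The helper name_key and the exact prefix list are assumptions.

```python
import re

# Assumed set of test-style prefixes; the issue lists these and says "etc."
_TEST_PREFIX = re.compile(r"^(?:test_|test |it )", re.IGNORECASE)


def name_key(name: str) -> str:
    """Strip one leading test-style prefix and return the first token, lowercased."""
    stripped = _TEST_PREFIX.sub("", name.strip())
    tokens = re.split(r"[_\s]+", stripped)
    return tokens[0].lower() if tokens[0] else ""
```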

4. Git history cross-reference
Git items (kind=GIT_COMMIT) — use item.typed_meta() to get GitCommitMeta and extract file_prefixes. Merge into the existing group whose file paths overlap most. Unmatched git items form their own group only if they have >=3 commits pointing to the same prefix.
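The merge rule above can be sketched under two assumptions: file_prefixes is a list of path-prefix strings, and "overlap" is measured as the number of a group's file paths that start with any of those prefixes. Both helper names are hypothetical.

```python
from collections import Counter
from typing import Optional


def best_matching_group(
    file_prefixes: list[str], groups: dict[str, list[str]]
) -> Optional[str]:
    """Return the key of the group with the most overlapping file paths.

    groups maps a group key to that group's file paths. Returns None
    when no group overlaps at all. Ties keep the first group seen.
    """
    best_key, best_score = None, 0
    for key, paths in groups.items():
        score = sum(
            1 for path in paths
            if any(path.startswith(prefix) for prefix in file_prefixes)
        )
        if score > best_score:
            best_key, best_score = key, score
    return best_key


def standalone_prefixes(
    unmatched_commits: list[list[str]], min_commits: int = 3
) -> list[str]:
    """Return prefixes referenced by at least min_commits unmatched commits.

    Each inner list holds the file prefixes of one commit; a prefix is
    counted once per commit.
    """
    counts = Counter(
        prefix for prefixes in unmatched_commits for prefix in set(prefixes)
    )
    return sorted(prefix for prefix, n in counts.items() if n >= min_commits)
```

The issue's acceptance criteria also require that no item is dropped, so unmatched git items below the threshold still need a destination; this sketch does not decide where they go.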

Typed metadata access

The grouping algorithm should use item.typed_meta() for type-safe access to metadata fields. This avoids dict["key"] lookups and lets a static type checker verify attribute access after each isinstance check:

# Instead of:
path = item.metadata["path"]           # KeyError risk
prefixes = item.metadata["file_prefixes"]  # KeyError risk

# Use:
meta = item.typed_meta()
if isinstance(meta, ApiRouteMeta):
    path = meta.path                   # type-safe
elif isinstance(meta, GitCommitMeta):
    prefixes = meta.file_prefixes      # type-safe

Group naming

Slugify the group key. Expand common abbreviations before slugifying:

  • auth → authentication
  • mgmt → management
  • cfg / config → configuration
  • notif → notifications
  • msg → messaging

DraftFeature.name = title-cased expanded label (e.g. "User Authentication").
DraftFeature.feature_id = slugified (e.g. "user-authentication").
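A minimal sketch of the naming rules, assuming the group key is split on any non-alphanumeric character and each word is expanded independently. The helpers expand_key, feature_name, and feature_id are hypothetical names; the abbreviation table is copied from the list above.

```python
import re

ABBREVIATIONS = {
    "auth": "authentication",
    "mgmt": "management",
    "cfg": "configuration",
    "config": "configuration",
    "notif": "notifications",
    "msg": "messaging",
}


def expand_key(group_key: str) -> list[str]:
    """Lowercase the key, split it into words, and expand known abbreviations."""
    words = [w for w in re.split(r"[^A-Za-z0-9]+", group_key.lower()) if w]
    return [ABBREVIATIONS.get(word, word) for word in words]


def feature_name(group_key: str) -> str:
    """Title-cased label, e.g. for DraftFeature.name."""
    return " ".join(word.capitalize() for word in expand_key(group_key))


def feature_id(group_key: str) -> str:
    """Hyphen-separated slug, e.g. for DraftFeature.feature_id."""
    return "-".join(expand_key(group_key))
```

With a key such as "user_auth", this yields the name "User Authentication" and the id "user-authentication", matching the examples above.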

Confidence scoring

Signal                                    Bonus
Items from >=2 different miners           +0.2
Group has >=1 docstring item              +0.1
Group has >=1 git item corroborating      +0.1
Base score                                0.5
Maximum                                   1.0
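The scoring table translates into a small function. This sketch assumes the three signals have already been computed per group; the function name and parameter names are hypothetical.

```python
def confidence(miner_count: int, has_docstring_item: bool, has_git_item: bool) -> float:
    """Apply the base score and bonuses from the table, capped at 1.0."""
    score = 0.5                      # base score
    if miner_count >= 2:
        score += 0.2                 # items from >=2 different miners
    if has_docstring_item:
        score += 0.1                 # >=1 docstring item
    if has_git_item:
        score += 0.1                 # >=1 corroborating git item
    # Round to avoid floating-point noise, then enforce the maximum.
    return min(round(score, 2), 1.0)
```

With only the signals listed, the bonuses sum to 0.4, so the highest reachable score is 0.9; the 1.0 cap only matters if further signals are added later.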

Acceptance criteria

  • 10 items from tests/auth/ all land in a single group
  • API routes GET /payments/* and POST /payments land in one group, separate from the users group
  • No items dropped — every DiscoveredItem appears in exactly one DraftFeature
  • Git items distributed to nearest matching group via GitCommitMeta.file_prefixes, not siloed
  • Grouping uses item.typed_meta() for type-safe metadata access (no raw metadata["key"])
  • Minimum group size: 1 item (single-item groups are valid)
  • Tests in tests/discovery/test_grouping.py using synthetic DiscoveredItem fixtures
  • Update scenarios and tests in features/feature-spec-discovery.md to cover the functionality introduced by this issue
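The "no items dropped" criterion is a partition invariant and can be checked independently of how grouping is implemented. The sketch below works on plain item identifiers rather than real DiscoveredItem and DraftFeature objects; partition_violations is a hypothetical helper that a test in tests/discovery/test_grouping.py might call.

```python
from collections import Counter


def partition_violations(
    item_ids: list[str], groups: dict[str, list[str]]
) -> tuple[set[str], set[str], set[str]]:
    """Return (missing, duplicated, unknown) identifiers.

    missing:    input items that appear in no group
    duplicated: items that appear in more than one place
    unknown:    grouped identifiers that were never in the input
    All three sets are empty when the groups form an exact partition.
    """
    seen = Counter(item for members in groups.values() for item in members)
    missing = set(item_ids) - set(seen)
    duplicated = {item for item, count in seen.items() if count > 1}
    unknown = set(seen) - set(item_ids)
    return missing, duplicated, unknown
```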

Metadata

    Labels

    algorithm: Algorithmic and grouping logic
    new feature: Issues or PRs for a new feature that doesn't currently exist
    phase-2: Phase 2 spec generation
