Add `graph suggest-collection` command to optimize collection organization

# Add `graph suggest-collection` command to optimize collection organization

## Summary

Add a new subcommand `fromager graph suggest-collection` that analyzes dependency overlap between onboarding packages and existing collections to recommend optimal collection assignments. This helps maintainers decide which collection each package should join based on minimizing the number of new packages that would need to be added.

## Motivation

When onboarding new packages to fromager builds, maintainers must decide which existing collection each package should join. This decision should be based on dependency overlap - packages that share many dependencies with an existing collection are good candidates for that collection.

Currently, this analysis is done manually, which is:
- **Time-consuming**: With 100+ onboarding packages and their transitive dependencies, manual analysis is slow
- **Error-prone**: It's easy to miss shared dependencies or assign packages suboptimally
- **Opaque**: No visibility into how well different collections would fit

The result is suboptimal collection organization with duplicated packages across collections, leading to inefficient builds.

## Proposed Solution

Create a new command that automates dependency analysis:

```bash
fromager graph suggest-collection ONBOARDING-GRAPH COLLECTION-GRAPHS...
```

For each top-level package in the onboarding graph:
1. Calculate its full dependency closure (all transitive dependencies)
2. Compare this closure against each existing collection
3. Rank collections by "fit" - how many new packages would need to be added
4. Display recommendations with statistics

## Command Interface

```bash
fromager graph suggest-collection [OPTIONS] ONBOARDING-GRAPH COLLECTION-GRAPHS...
```

**Arguments:**
- `ONBOARDING-GRAPH`: Graph file (graph.json) containing packages to assign
- `COLLECTION-GRAPHS`: One or more graph files for existing collections

**Options:**
- `--format [table|json]`: Output format (default: table)
  - `table`: Human-readable rich table showing best-fit collection for each package
  - `json`: Machine-readable JSON with all collections ranked for each package
- `--help`: Show help message

**Example:**
```bash
fromager graph suggest-collection \
  onboarding-graph.json \
  notebook-graph.json \
  rhai-innovation-graph.json
```

## Functional Requirements

### Core Functionality

1. **Load graph files**: Parse multiple graph.json files using `DependencyGraph.from_file()`

2. **Identify top-level packages**: Extract packages marked with `req_type: 'toplevel'` from the ROOT node in the onboarding graph

3. **Calculate dependency closures**: For each top-level package, traverse the full dependency tree to get all transitive dependencies (both install and build dependencies)

4. **Compare by package name only**: When comparing dependencies, match packages by canonical name ignoring version numbers (e.g., `numpy==1.26.0` and `numpy==1.25.0` are considered the same package)

5. **Rank collections by fit**: For each top-level package and collection pair:
   - Calculate set difference: packages in dependency closure NOT in collection
   - Calculate set intersection: packages in dependency closure AND in collection
   - Primary sort: fewest new packages (ascending)
   - Secondary sort: highest coverage percentage (descending)

6. **Display results**: Show each package with its best-fit collection and statistics

### Output Requirements

**Table format** (default `--format table`):
- One row per onboarding package
- Columns: Package name, Version, Total dependencies, Best fit collection, New packages, Existing packages, Coverage %
- Sorted alphabetically by package name
- Clear, scannable output for human review

**JSON format** (`--format json`):
- Array of objects, one per onboarding package
- Each object includes all collections ranked by fit
- Valid, parseable JSON for automation/scripting
- Structure enables downstream processing

## Algorithm

For each top-level package in the onboarding graph:

1. **Extract dependency closure**:
   ```python
   closure = set()  # Set of canonical package names
   visited = set()  # Set of node keys to prevent cycles
   stack = [top_level_node]

   while stack:
       node = stack.pop()
       if node.key in visited:
           continue
       visited.add(node.key)
       closure.add(node.canonicalized_name)

       # Add all children (both install and build deps)
       for edge in node.children:
           stack.append(edge.destination_node)
   ```

2. **Calculate fit with each collection**:
   ```python
   for collection in collections:
       collection_packages = {node.canonicalized_name for node in collection.get_all_nodes()}
       new_packages = closure - collection_packages
       existing_packages = closure & collection_packages

       fit_score = {
           "new_packages": len(new_packages),
           "existing_packages": len(existing_packages),
           "coverage_pct": len(existing_packages) / len(closure) * 100
       }
   ```

3. **Rank collections**:
   - Sort by `new_packages` (ascending) - fewer is better
   - Then by `coverage_pct` (descending) - higher is better

## Example Output

### Table Format (Default)

```
Collection Suggestions for Onboarding Packages

┃ Package             ┃ Version ┃ Total Deps ┃ Best Fit        ┃ New Pkgs ┃ Existing ┃ Coverage ┃
┃━━━━━━━━━━━━━━━━━━━━━┃━━━━━━━━━┃━━━━━━━━━━━━┃━━━━━━━━━━━━━━━━━┃━━━━━━━━━━┃━━━━━━━━━━┃━━━━━━━━━━┃
┃ fastapi             ┃ 0.135.1 ┃ 45         ┃ notebook        ┃ 3        ┃ 42       ┃ 93.3%    ┃
┃ instructorembedding ┃ 1.0.1   ┃ 123        ┃ notebook        ┃ 8        ┃ 115      ┃ 93.5%    ┃
┃ torch               ┃ 2.9.1   ┃ 89         ┃ rhai-innovation ┃ 12       ┃ 77       ┃ 86.5%    ┃
```

**Interpretation**:
- `fastapi` should join the `notebook` collection - only 3 new packages needed (93.3% overlap)
- `torch` fits better in `rhai-innovation` despite needing 12 new packages (still 86.5% overlap)

### JSON Format (`--format json`)

```json
[
  {
    "package": "fastapi",
    "version": "0.135.1",
    "total_dependencies": 45,
    "collections": [
      {
        "collection": "notebook",
        "new_packages": 3,
        "existing_packages": 42,
        "total_dependencies": 45,
        "coverage_pct": 93.3
      },
      {
        "collection": "rhai-innovation",
        "new_packages": 15,
        "existing_packages": 30,
        "total_dependencies": 45,
        "coverage_pct": 66.7
      }
    ]
  },
  {
    "package": "torch",
    "version": "2.9.1",
    "total_dependencies": 89,
    "collections": [
      {
        "collection": "rhai-innovation",
        "new_packages": 12,
        "existing_packages": 77,
        "total_dependencies": 89,
        "coverage_pct": 86.5
      },
      {
        "collection": "notebook",
        "new_packages": 34,
        "existing_packages": 55,
        "total_dependencies": 89,
        "coverage_pct": 61.8
      }
    ]
  }
]
```

## Implementation Guidance

### File Location
Add to `/src/fromager/commands/graph.py` as a new subcommand:

```python
@graph.command()
@click.option("--format", "output_format", type=click.Choice(["table", "json"]), default="table")
@click.argument("onboarding-graph", type=str)
@click.argument("collection-graphs", type=str, nargs=-1, required=True)
@click.pass_obj
def suggest_collection(
    wkctx: context.WorkContext,
    onboarding_graph: str,
    collection_graphs: tuple[str, ...],
    output_format: str,
) -> None:
    """Suggest which collection each onboarding package should join."""
```

### Utilities to Reuse

From existing codebase:
- `DependencyGraph.from_file(path)` - Load graph files
- `graph.get_root_node()` - Get ROOT node to find top-level packages
- `graph.get_all_nodes()` - Get all nodes for building package sets
- `node.canonicalized_name` - Get normalized package name (ignore version)
- `node.children` - Traverse dependency edges
- `canonicalize_name()` from `packaging.utils` - Normalize names
- `RequirementType.TOP_LEVEL` - Filter for top-level packages

### Reference Implementations

For patterns and style:
- **Table output**: See `src/fromager/commands/stats.py` for rich table construction
- **JSON output**: See `src/fromager/commands/list_overrides.py` for `--format` option pattern
- **Graph traversal**: See `src/fromager/commands/graph.py` for existing graph walking patterns

### Helper Functions Needed

```python
def get_dependency_closure(graph: DependencyGraph, node: DependencyNode) -> set[NormalizedName]:
    """Get all transitive dependencies of a node by canonicalized name."""

def get_package_names(graph: DependencyGraph) -> set[NormalizedName]:
    """Extract all unique package names from a graph."""

def extract_collection_name(graph_path: str) -> str:
    """Parse collection name from graph file path."""

def output_table(results: list[dict]) -> None:
    """Display results as a rich table."""

def output_json(results: list[dict]) -> None:
    """Display results as JSON."""
```

## Testing Requirements

### Unit Tests

Add to `/tests/test_graph.py`:

```python
def test_get_dependency_closure() -> None:
    """Test extracting full dependency closure by name."""

def test_get_package_names() -> None:
    """Test extracting unique package names from graph."""

def test_extract_collection_name() -> None:
    """Test parsing collection names from file paths."""

def test_suggest_collection_basic(tmp_path: pathlib.Path) -> None:
    """Test basic collection suggestion logic."""
```

### Integration Tests

```python
def test_suggest_collection_table_output(tmp_path: pathlib.Path) -> None:
    """Test command with table output format."""

def test_suggest_collection_json_output(tmp_path: pathlib.Path) -> None:
    """Test command with JSON output format."""

def test_suggest_collection_multiple_collections(tmp_path: pathlib.Path) -> None:
    """Test with multiple collection graphs."""
```

### Test Data

Use real graph files for integration testing or create minimal test graphs with known dependency structures.

## Edge Cases

The implementation should handle:

1. **Empty onboarding graph** (no top-level packages):
   - Output message: "No top-level packages found in onboarding graph"
   - Exit gracefully

2. **Package with no dependencies**:
   - Dependency closure contains only the package itself
   - All collections have same fit (0 new packages)

3. **Multiple versions of same package**:
   - Already handled by using `canonicalized_name` instead of versioned `key`
   - Version differences should not affect fit calculation

4. **Circular dependencies**:
   - Use `visited` set in traversal to prevent infinite loops
   - Each node visited only once

5. **Tied fit scores**:
   - Multiple collections with same `new_packages` count
   - Use `coverage_pct` as tiebreaker (higher is better)

6. **Collection name extraction**:
   - Handle various file naming patterns: `3.4.2293+notebook-cuda12.9-ubi9-aarch64-graph.json`
   - Fallback to filename without extensions if pattern doesn't match

7. **Large graphs**:
   - Algorithm is O(n*m) where n=top-level packages, m=collections
   - Should be acceptable for typical sizes (100s of packages, handful of collections)

## Success Criteria

- [ ] Command loads multiple graph files correctly
- [ ] Identifies all top-level packages from onboarding graph
- [ ] Calculates complete dependency closures
- [ ] Compares packages by canonical name (ignoring versions)
- [ ] Ranks collections correctly (fewest new packages first)
- [ ] Table output is clear and easy to read
- [ ] JSON output is valid and machine-parseable
- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] Edge cases handled gracefully
- [ ] Code follows fromager standards (type hints, docstrings, formatting)
- [ ] Documentation updated (if applicable)

## Related Work

Existing graph commands that may provide useful patterns:
- `graph to-constraints` - Loads graphs, processes nodes
- `graph explain-duplicates` - Analyzes multiple package versions
- `graph why` - Traverses dependency chains
- `stats` - Displays statistics with rich tables


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `graph suggest-collection` command to optimize collection organization #971

Add `graph suggest-collection` command to optimize collection organization

Summary

Motivation

Proposed Solution

Command Interface

Functional Requirements

Core Functionality

Output Requirements

Algorithm

Example Output

Table Format (Default)

JSON Format (`--format json`)

Implementation Guidance

File Location

Utilities to Reuse

Reference Implementations

Helper Functions Needed

Testing Requirements

Unit Tests

Integration Tests

Test Data

Edge Cases

Success Criteria

Related Work

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add graph suggest-collection command to optimize collection organization #971

Description

Add graph suggest-collection command to optimize collection organization

Summary

Motivation

Proposed Solution

Command Interface

Functional Requirements

Core Functionality

Output Requirements

Algorithm

Example Output

Table Format (Default)

JSON Format (--format json)

Implementation Guidance

File Location

Utilities to Reuse

Reference Implementations

Helper Functions Needed

Testing Requirements

Unit Tests

Integration Tests

Test Data

Edge Cases

Success Criteria

Related Work

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Add `graph suggest-collection` command to optimize collection organization #971

Add `graph suggest-collection` command to optimize collection organization

JSON Format (`--format json`)