RAKE/YAKE keyword extraction for concept pipeline #372

## Summary

Replace the basic word-frequency concept extraction with RAKE (Rapid Automatic Keyword Extraction) or YAKE (Yet Another Keyword Extractor) implemented in pure Go. This improves concept quality for encoding, pattern detection, and retrieval without requiring an LLM.

## Motivation

Current concept extraction (`embedding.ExtractTopConcepts`) uses a fixed vocabulary with word counting:
- Only recognizes ~130 predefined terms
- Unknown words are ignored (or hashed for embeddings)
- No phrase detection ("spread activation" extracted as two separate words)
- No statistical weighting (term frequency, position, co-occurrence)

RAKE/YAKE would provide:
- **Multi-word phrases**: "spread activation", "bag of words", "air gapped"
- **Statistical ranking**: Terms weighted by frequency, position, and co-occurrence
- **Domain-adaptive**: No fixed vocabulary needed — learns from content
- **Pure Go**: No external dependencies, expected sub-millisecond latency

## Implementation Plan

### Option A: RAKE (simpler)
1. Split text on stop words and punctuation → candidate phrases
2. Score each word as degree / frequency (degree = co-occurrences within candidate phrases), then score each phrase as the sum of its word scores
3. Return top-N phrases ranked by score
4. ~150 lines of Go + stop word list
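
The RAKE steps above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the stop word list is deliberately tiny, `candidatePhrases` is a hypothetical helper, and treating hyphens as word separators is an assumption. Only the `ExtractKeywords(text string, n int) []string` signature comes from this issue.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// stopWords is a tiny illustrative list; a real list has several hundred entries.
var stopWords = map[string]bool{
	"a": true, "an": true, "and": true, "are": true, "as": true, "by": true,
	"for": true, "from": true, "in": true, "is": true, "of": true, "on": true,
	"or": true, "the": true, "to": true, "with": true,
}

// candidatePhrases (hypothetical helper) splits text into runs of non-stop
// words. Stop words and punctuation both end the current phrase.
func candidatePhrases(text string) [][]string {
	var phrases [][]string
	var cur []string
	var word strings.Builder
	flush := func() {
		if len(cur) > 0 {
			phrases = append(phrases, cur)
			cur = nil
		}
	}
	endWord := func() {
		if word.Len() == 0 {
			return
		}
		w := word.String()
		word.Reset()
		if stopWords[w] {
			flush()
		} else {
			cur = append(cur, w)
		}
	}
	for _, r := range strings.ToLower(text) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			word.WriteRune(r)
		case unicode.IsSpace(r) || r == '-': // assumption: hyphens separate words
			endWord()
		default: // any other punctuation ends the phrase
			endWord()
			flush()
		}
	}
	endWord()
	flush()
	return phrases
}

// ExtractKeywords returns up to n candidate phrases ranked by RAKE score.
func ExtractKeywords(text string, n int) []string {
	phrases := candidatePhrases(text)
	freq := map[string]float64{}
	degree := map[string]float64{}
	for _, p := range phrases {
		for _, w := range p {
			freq[w]++
			degree[w] += float64(len(p)) // co-occurrences, counting the word itself
		}
	}
	scores := map[string]float64{}
	for _, p := range phrases {
		var s float64
		for _, w := range p {
			s += degree[w] / freq[w] // word score = degree / frequency
		}
		scores[strings.Join(p, " ")] = s
	}
	keys := make([]string, 0, len(scores))
	for k := range scores {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if scores[keys[i]] != scores[keys[j]] {
			return scores[keys[i]] > scores[keys[j]]
		}
		return keys[i] < keys[j] // deterministic tie-break
	})
	if n < 0 {
		n = 0
	}
	if n < len(keys) {
		keys = keys[:n]
	}
	return keys
}

func main() {
	text := "Spread activation is a bag of words model for retrieval."
	fmt.Println(ExtractKeywords(text, 2))
}
```

Note that in this basic form "bag of words" is split at the stop word "of", so only "bag" and "words model" survive as candidates. Recovering phrases that contain interior stop words would need an extra step (the RAKE paper describes joining adjoining keywords), which this sketch omits.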

### Option B: YAKE (better quality)
1. Statistical features: casing, term position, term frequency, relatedness to context (co-occurrence), and spread across sentences
2. No training needed — unsupervised
3. Expected to handle technical text better than RAKE
4. ~300 lines of Go

### Integration
- New file: `internal/embedding/keywords.go`
- `ExtractKeywords(text string, n int) []string` — returns ranked keyword phrases
- Update `GenerateEncodingResponse()` to use RAKE/YAKE instead of vocabulary counting
- Keep `ExtractTopConcepts()` as a fallback for backward compatibility
- Concept vocabulary still used for synonym grouping, not as the extraction source

### Config
```yaml
encoding:
  keyword_extractor: "rake"  # "vocabulary" (current), "rake", "yake"
```
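
One way the `keyword_extractor` value could be mapped to an extractor is sketched below. Everything here except the three mode names and the `(text string, n int) []string` signature is hypothetical: `Extractor`, `SelectExtractor`, and the three placeholder functions are illustrative names, and defaulting an empty value to `vocabulary` is an assumption made to keep existing configs working.

```go
package main

import (
	"fmt"
	"strings"
)

// Extractor matches the proposed ExtractKeywords(text, n) signature.
type Extractor func(text string, n int) []string

// Placeholder extractors so the sketch compiles; each only tags its output.
func vocabularyExtract(text string, n int) []string { return []string{"vocabulary"} }
func rakeExtract(text string, n int) []string       { return []string{"rake"} }
func yakeExtract(text string, n int) []string       { return []string{"yake"} }

// SelectExtractor (hypothetical) resolves encoding.keyword_extractor.
// Assumption: a missing or empty value falls back to "vocabulary".
func SelectExtractor(mode string) (Extractor, error) {
	switch strings.ToLower(strings.TrimSpace(mode)) {
	case "", "vocabulary":
		return vocabularyExtract, nil
	case "rake":
		return rakeExtract, nil
	case "yake":
		return yakeExtract, nil
	default:
		return nil, fmt.Errorf("unknown keyword_extractor %q (want vocabulary, rake, or yake)", mode)
	}
}

func main() {
	for _, mode := range []string{"rake", "", "textrank"} {
		extract, err := SelectExtractor(mode)
		if err != nil {
			fmt.Println("config error:", err)
			continue
		}
		fmt.Printf("%q -> %v\n", mode, extract("some memory content", 5))
	}
}
```

Rejecting unknown values at config load time, rather than silently falling back, would surface typos such as `"rkae"` early.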

## Acceptance Criteria

- [ ] Extracts multi-word phrases (not just single tokens)
- [ ] Pure Go, no external dependencies
- [ ] <1ms per extraction on typical memory content
- [ ] Improves pattern detection quality (more specific concepts = fewer false positives)
- [ ] `go test ./internal/embedding/...` passes
- [ ] Backward compatible — `vocabulary` mode still works

## References

- RAKE paper: Rose et al., "Automatic Keyword Extraction from Individual Documents" (2010)
- YAKE paper: Campos et al., "YAKE! Keyword Extraction from Single Documents Using Multiple Local Features" (2020)
- Parent: #369
