Context
Running the full scenario harness and analyzing the results surfaced the test gaps below. Closing them would be a high-value way to improve classifier quality.
Proposed additional test scenarios
1. Dedup sensitivity — near-duplicate variants
The current dedup test (`edge-duplicate-injection`) injects identical text. We need tests for:
- Rephrased duplicates: "Never commit secrets" vs "Do not commit secrets to the repository". The Jaccard similarity of such pairs may fall below the 0.8 threshold, so the duplicate goes undetected (a false negative)
- Partial duplicates: a new session adds 3 rules, 2 of which already exist in ADF — only 1 should migrate
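To make the rephrasing problem concrete, here is a minimal sketch of token-level Jaccard similarity applied to the pair above. The tokenizer (lowercase, split on non-alphanumeric characters) and the function names are assumptions for illustration; the real dedup code may tokenize differently and score the pair differently.

```typescript
// Hypothetical tokenizer: lowercase, split on runs of non-alphanumeric characters.
function tokenize(text: string): Set<string> {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let intersection = 0;
  setA.forEach((token) => {
    if (setB.has(token)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  // Two empty inputs are treated as identical.
  return union === 0 ? 1 : intersection / union;
}

const score = jaccard(
  "Never commit secrets",
  "Do not commit secrets to the repository",
);
console.log(score); // 0.25 under this tokenizer (2 shared tokens out of 8)
```

Under these assumptions the pair scores far below 0.8, so a rephrased-duplicate scenario would be expected to fail until dedup handles paraphrases.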
2. Multi-module section splitting — heading dominates all items
Current classifier: once a heading routes to a module, ALL items in that section go to the same module. Keywords inside an item that point to a different module are ignored.
Example failure:
```
## Database
- D1 bound as `DB` in wrangler.toml → backend.adf (heading wins)
- Run migrations with `wrangler d1 migrate` → backend.adf (should be infra.adf!)
```
A test that verifies cross-keyword items within a section would expose this and track when it's fixed.
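The following sketch contrasts the heading-wins behaviour described above with one possible item-level override. The keyword tables and function names are hypothetical; the classifier's real trigger lists are not shown in this issue.

```typescript
type TargetModule = "backend.adf" | "infra.adf";

// Hypothetical routing tables, for illustration only.
const headingRoutes: Record<string, TargetModule> = { database: "backend.adf" };
const itemKeywords: Array<[RegExp, TargetModule]> = [[/\bmigrat/i, "infra.adf"]];

// Behaviour as described: the heading alone decides where every item goes.
function routeByHeading(heading: string): TargetModule | undefined {
  return headingRoutes[heading.toLowerCase()];
}

// One possible fix: an item-level keyword takes precedence over the heading.
function routeWithItemOverride(heading: string, item: string): TargetModule | undefined {
  for (const [pattern, target] of itemKeywords) {
    if (pattern.test(item)) return target;
  }
  return routeByHeading(heading);
}

const migrationItem = "Run migrations with `wrangler d1 migrate`";
console.log(routeByHeading("Database"));                       // backend.adf
console.log(routeWithItemOverride("Database", migrationItem)); // infra.adf
```

A scenario asserting the second result would fail today and start passing once per-item routing lands, which is the tracking behaviour the test is meant to provide.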
3. Security boundary routing — auth in backend vs security modules
Auth-related rules appear in two contexts:
- Implementation rules (how to write auth code): belong in `backend.adf`
- Security policy rules (what must be enforced): belong in `security.adf`
Currently, the `## Auth` heading maps everything to `security.adf`. A test with mixed implementation and policy rules under one heading would expose the lack of sub-heading routing.
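A mixed-rule fixture could look roughly like the sketch below. The `ExpectedRoute` shape, the two sample rules, and the stub router are all hypothetical; the harness's real `Scenario` type is not shown in this issue, so treat this as an outline of the assertion and not of the actual API.

```typescript
type AuthTarget = "backend.adf" | "security.adf";

// Hypothetical fixture shape: one rule plus the module it should land in.
interface ExpectedRoute {
  item: string;
  expected: AuthTarget;
}

// Illustrative rules, both placed under a single `## Auth` heading.
const authFixture: ExpectedRoute[] = [
  { item: "Use the shared session middleware in every route handler", expected: "backend.adf" },
  { item: "All endpoints must require authentication", expected: "security.adf" },
];

// Stub for the behaviour described above: everything under `## Auth` goes to security.adf.
function routeUnderAuthHeading(_item: string): AuthTarget {
  return "security.adf";
}

const misrouted = authFixture.filter(
  (rule) => routeUnderAuthHeading(rule.item) !== rule.expected,
);
console.log(misrouted.map((rule) => rule.item));
```

With the stub above, only the implementation rule is reported as misrouted, which is the failure the scenario should surface.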
4. Trigger prefix collision — short triggers matching unrelated content
The prefix-match fix (removing the trailing `\b`) introduces a risk of over-matching. Example:
- Trigger `auth` now matches "authority", "author", "authentic"
- Trigger `api` matches "apiary", "apiVersion"
A test with content containing "the author of this library" or "apiary endpoint" should verify these don't accidentally route to security/backend modules.
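The over-matching can be reproduced with plain regular expressions. In this sketch, `triggerPattern` is a hypothetical helper that builds a trigger regex with or without the trailing `\b`; how the classifier actually constructs its patterns is an assumption.

```typescript
// Hypothetical helper: leading \b always, trailing \b only when wholeWord is true.
function triggerPattern(trigger: string, wholeWord: boolean): RegExp {
  // Escape regex metacharacters so the trigger is matched literally.
  const escaped = trigger.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}${wholeWord ? "\\b" : ""}`, "i");
}

function matchesAny(text: string, triggers: string[], wholeWord: boolean): boolean {
  return triggers.some((trigger) => triggerPattern(trigger, wholeWord).test(text));
}

const samples = [
  "the author of this library",
  "apiary endpoint",
  "auth middleware",
  "authentication flow",
];

for (const text of samples) {
  const prefix = matchesAny(text, ["auth", "api"], false);
  const whole = matchesAny(text, ["auth", "api"], true);
  console.log(`${text}: prefix=${prefix} wholeWord=${whole}`);
}
```

Prefix matching hits all four samples, while whole-word matching hits only "auth middleware". That shows both sides of the trade-off: the prefix form picks up "authentication" as intended, but also "author" and "apiary".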
5. Large injection — 20+ items in one session
Current tests max at ~13 items per session. A stress test with 25+ items would:
- Test dedup performance (O(n²) Jaccard comparisons)
- Verify routing accuracy doesn't degrade at scale
- Surface any ADF write failures for large patch sets
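For a sense of scale, the number of pairwise comparisons among n items is n(n-1)/2. The sketch below only counts comparisons within one session; if new items are also compared against rules already in ADF, the real count will be higher. The function name is illustrative.

```typescript
// Number of unordered pairs among n items: n * (n - 1) / 2.
function pairwiseComparisons(n: number): number {
  return (n * (n - 1)) / 2;
}

console.log(pairwiseComparisons(13)); // 78
console.log(pairwiseComparisons(25)); // 300
```

Going from 13 to 25 items roughly quadruples the comparison count, which is still small in absolute terms but enough to reveal accidental quadratic blowups in tokenization or I/O.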
6. Empty/minimal injection — just a heading, no items
Edge case: the AI adds `## Auth\n\n` (a heading with no content). This should produce 0 extractions and raise no errors.
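The expected behaviour can be sketched with a toy extractor. `extractItems` and the `Extraction` shape are hypothetical stand-ins that assume `- ` bullets under `## ` headings; the real extraction logic is not shown in this issue.

```typescript
interface Extraction {
  heading: string;
  item: string;
}

// Hypothetical extractor: collects "- " bullets under the most recent "## " heading.
function extractItems(markdown: string): Extraction[] {
  const results: Extraction[] = [];
  let heading = "";
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith("## ")) {
      heading = trimmed.slice(3).trim();
    } else if (trimmed.startsWith("- ") && heading !== "") {
      results.push({ heading, item: trimmed.slice(2).trim() });
    }
  }
  return results;
}

console.log(extractItems("## Auth\n\n").length); // 0
```

The scenario should assert both that the result is empty and that no exception is thrown along the way.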
Implementation
Add these to `harness/corpus/edge-cases.ts` as additional `Scenario` objects. The trigger prefix collision test (#4) is particularly important to add before the prefix-match change ships in a release.