When a user passes --local, they expect that no data leaves their machine. Previously the embedding provider was configured independently via config.yaml, so a user could have mode: cloud set while running --local, silently sending profile text to a cloud API. Now --local sets the EMBEDDING_MODE=local env var before any embedding manager is created, forcing local-only embeddings. A startup preflight check fails fast if sentence-transformers isn't installed. A cache reset helper ensures no stale managers are reused. Adds 6 new tests covering cache reset, the preflight check, and the env-var-overrides-config integration path.
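A minimal sketch of the ordering this relies on, assuming the embedding manager consults the `EMBEDDING_MODE` environment variable before falling back to the config.yaml value. The helper names (`apply_local_flag`, `preflight_local_embeddings`, `resolve_embedding_mode`) are hypothetical and are not taken from the codebase.

```python
import importlib.util
import os


def apply_local_flag(local: bool) -> None:
    """Force local embeddings; call this before any embedding manager exists."""
    if local:
        os.environ["EMBEDDING_MODE"] = "local"


def preflight_local_embeddings(module_name: str = "sentence_transformers") -> None:
    """Fail fast at startup if the local embedding backend cannot be imported."""
    if importlib.util.find_spec(module_name) is None:
        raise RuntimeError(
            f"--local requires the '{module_name}' package, which is not installed"
        )


def resolve_embedding_mode(config_mode: str) -> str:
    """Assumed override rule: the env var, when set, wins over config.yaml."""
    return os.environ.get("EMBEDDING_MODE", config_mode)
```

With this ordering, a config that says `mode: cloud` can no longer route text to a cloud API once `--local` has been applied, because the override is in place before the mode is ever resolved.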
When deterministic QC flags zero_entities, high_drop_rate, or many_duplicates, the extraction pipeline now retries once with a repair hint appended to the system prompt. The better result (by output count, then fewer severe flags) is kept. Retry metadata (attempted, trigger flags, output count, whether v2 was used) is recorded in PhaseOutcome.meta for observability.

Changes:
- article_processor.py: retry decision + loop + _run_extraction helper
- extractors.py: accept optional repair_hint parameter
- 14 new tests covering retry triggers, hint passing, and selection
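The retry-and-select flow could look roughly like the sketch below. `ExtractionResult`, `pick_better`, and `extract_with_retry` are illustrative names, and treating the three trigger flags as the "severe" flags is an assumption, not something the commit states.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Flags that trigger a retry; also assumed here to be the "severe" flags.
RETRY_FLAGS = {"zero_entities", "high_drop_rate", "many_duplicates"}


@dataclass
class ExtractionResult:
    entities: list
    flags: set = field(default_factory=set)


def pick_better(v1: ExtractionResult, v2: ExtractionResult) -> ExtractionResult:
    """Prefer more outputs, then fewer severe flags; a full tie keeps v1."""
    key1 = (len(v1.entities), -len(v1.flags & RETRY_FLAGS))
    key2 = (len(v2.entities), -len(v2.flags & RETRY_FLAGS))
    return v2 if key2 > key1 else v1


def extract_with_retry(
    run_extraction: Callable[[Optional[str]], ExtractionResult],
    repair_hint: str,
) -> tuple:
    """Run once; if QC raised a trigger flag, retry once with the repair hint."""
    v1 = run_extraction(None)
    triggers = sorted(v1.flags & RETRY_FLAGS)
    meta = {
        "attempted": False,
        "trigger_flags": triggers,
        "output_count": len(v1.entities),
        "used_v2": False,
    }
    if not triggers:
        return v1, meta
    v2 = run_extraction(repair_hint)
    best = pick_better(v1, v2)
    meta.update(attempted=True, output_count=len(best.entities), used_v2=best is v2)
    return best, meta
```

The returned `meta` dict mirrors the fields described as being recorded in PhaseOutcome.meta; the exact key names are guesses.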
MatchCheckResult now includes confidence: float (0-1, default 0.5) so the merge pipeline can detect when the match checker is uncertain. This is the prerequisite for gray-band dispute routing (Item 14).

Changes:
- match_checker.py: add confidence field with backward-compatible default, update both cloud/local prompts to request confidence score
- constants.py: add MERGE_GRAY_BAND_DELTA and MERGE_UNCERTAIN_CONFIDENCE_CUTOFF
- Smoke tests: explicit confidence in stubs, schema validation tests
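To illustrate why the default keeps older callers working, here is a stand-in for the schema change. It uses a stdlib dataclass rather than whatever model type the project actually uses for `MatchCheckResult`, and the `is_match` and `reason` fields are hypothetical; only `confidence` (0-1, default 0.5) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class MatchCheckResult:
    is_match: bool            # hypothetical field
    reason: str = ""          # hypothetical field
    confidence: float = 0.5   # default lets older stubs omit the field

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Older call sites that never pass confidence still construct a valid result.
legacy = MatchCheckResult(is_match=True)
explicit = MatchCheckResult(is_match=False, reason="different people", confidence=0.9)
```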
When embedding similarity falls in the "gray band" (within ±0.05 of the threshold) AND the match checker's confidence is below 0.7, a second-stage dispute agent now makes a more deliberate merge/skip/defer decision.

New module: src/engine/merge_dispute_agent.py
- MergeDisputeAction enum (merge/skip/defer)
- MergeDisputeDecision Pydantic model with confidence validation
- run_merge_dispute_agent() with cloud/local LLM parity
- Optional JSONL review queue for deferred cases

Integration in mergers.py:
- Gray-band routing after match check, before merge/skip decision
- Dispute agent can override match checker in either direction
- Defer action treated as skip (conservative safety default)

15 new tests covering schema, routing activation/non-activation, override behavior in both directions, and review queue persistence.
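The routing condition and the defer-as-skip rule can be sketched as follows. The constant names and the 0.05 / 0.7 values come from the commit messages above; `needs_dispute` and `should_merge` are hypothetical helpers, and treating the band edges as inclusive is an assumption.

```python
from enum import Enum
from typing import Optional

MERGE_GRAY_BAND_DELTA = 0.05
MERGE_UNCERTAIN_CONFIDENCE_CUTOFF = 0.7


class MergeDisputeAction(str, Enum):
    MERGE = "merge"
    SKIP = "skip"
    DEFER = "defer"


def needs_dispute(similarity: float, threshold: float, confidence: float) -> bool:
    """Route to the dispute agent only when both conditions hold."""
    in_gray_band = abs(similarity - threshold) <= MERGE_GRAY_BAND_DELTA
    return in_gray_band and confidence < MERGE_UNCERTAIN_CONFIDENCE_CUTOFF


def should_merge(
    checker_says_match: bool, action: Optional[MergeDisputeAction] = None
) -> bool:
    """Without a dispute decision, follow the match checker; otherwise the
    dispute agent overrides it, and DEFER is treated like SKIP."""
    if action is None:
        return checker_says_match
    return action is MergeDisputeAction.MERGE
```

Because only `MERGE` returns true, a deferred case never merges, which matches the conservative default described above.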
New grounding system verifies that profile claims are supported by their cited source articles. Uses existing CITATION_RE regex for deterministic claim extraction, then LLM calls to verify each claim against the source text.

Models added to quality_controls.py:
- SupportLevel enum (supported/partial/not_supported/unclear/missing_source)
- ClaimVerification for individual claim results
- GroundingReport with scoring, flags, and per-claim details

Key features:
- Claims grouped by article_id for efficient LLM batching
- SHA-256 hash-based skip for unchanged profiles across runs
- Batch post-processing integration in process_and_extract.py
- Never raises: errors become flags in the report
- Cloud/local model parity matching existing LLM patterns

20 new tests covering claim extraction, report schema, missing source handling, model routing, scoring thresholds, error fallbacks, and batch post-processing (entity mutation, hash skip, no-citation skip).
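A rough sketch of the deterministic half of this pipeline (claim extraction, grouping by article, and the hash-based skip), leaving out the LLM verification step. The real CITATION_RE pattern is not shown in this PR, so `EXAMPLE_CITATION_RE` below is a stand-in that assumes citations look like `[article_id]`; the function names are hypothetical as well.

```python
import hashlib
import re
from collections import defaultdict
from typing import Optional

# Stand-in pattern, NOT the project's CITATION_RE: assumes "[article_id]" markers.
EXAMPLE_CITATION_RE = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def extract_claims(profile_text: str) -> list:
    """Return (sentence, article_id) pairs, one per citation in each sentence."""
    claims = []
    for sentence in re.split(r"(?<=[.!?])\s+", profile_text.strip()):
        for article_id in EXAMPLE_CITATION_RE.findall(sentence):
            claims.append((sentence, article_id))
    return claims


def group_by_article(claims: list) -> dict:
    """Group claims so each source article can be checked in one batch."""
    grouped = defaultdict(list)
    for sentence, article_id in claims:
        grouped[article_id].append(sentence)
    return dict(grouped)


def profile_hash(profile_text: str) -> str:
    return hashlib.sha256(profile_text.encode("utf-8")).hexdigest()


def needs_grounding(profile_text: str, previous_hash: Optional[str]) -> bool:
    """Skip profiles with no citations or with text unchanged since the last run."""
    if not EXAMPLE_CITATION_RE.search(profile_text):
        return False
    return profile_hash(profile_text) != previous_hash
```

Grouping by article means the source text for each article only needs to be sent once per batch, which is the efficiency the feature list refers to.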
The local format.sh was running an older ruff (0.6.9) while CI uses the lock file version (0.9.6). Reformatted to match CI expectations.
Document that `uv run ruff` must be used before pushing to match CI's lock file ruff version, avoiding format mismatches between local and CI.
Summary
Implements the 4 remaining architecture review items from `design/remaining-architecture-phases.md` (phases 3 and 5), adding verification and safety layers across the entity processing pipeline.

- When the `--local` CLI flag is active, force `EMBEDDING_MODE=local` so no data leaves the machine, regardless of config.yaml settings. Fail loudly if `sentence-transformers` is missing.
- Retry extraction once when QC flags `zero_entities`, `high_drop_rate`, or `many_duplicates`. Keeps the better result (v1 vs v2) based on output count and severity.
- Add a `confidence` field to `MatchCheckResult`, then route ambiguous cases (gray-band similarity + low confidence) to a second-stage LLM dispute agent that can override the match checker's decision. Optional JSONL review queue for deferred cases.

Changes
Commits: e6d500e (`--local` flag), b0ea82d, 7555358 (`MatchCheckResult`), ea4dd86, b24e083, ffb7c8a, bb2adf9.

16 files changed, 2082+ insertions. 57 new tests across 4 new test files (153 total, all passing).
Key architectural decisions
- Set `EMBEDDING_MODE=local` early in the CLI entry point, leveraging the existing override path in `EmbeddingManager`

Test plan
- Tests (`pytest tests/ -v`)
- Lint and format (`uv run ruff check .` + `uv run ruff format --check .`)