
Add dissonance tracking builtin skill #62

Open
strix-tkellogg wants to merge 1 commit into main from feature/dissonance-tracking

Conversation

@strix-tkellogg
Collaborator

Summary

  • New builtin skill that cross-references journal entries (user_wanted/agent_did) against events.jsonl to detect intent-vs-outcome gaps
  • Analysis script (dissonance_review.py) with CLI: --last N or --hours N to select review window
  • Five dissonance types: action_mismatch (high), invisible_failure (high), phantom_work (medium), understated_action (low), scope_drift
  • 22 tests covering all detectors, edge cases, and JSONL parsing
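
The `--last N` / `--hours N` window selection could be parsed roughly as follows. This is a minimal sketch, not the code in `dissonance_review.py`: the function name `parse_window`, the assumption that the two flags are mutually exclusive and that one is required, and the reading of `--last` as "last N journal entries" are all illustrative guesses.

```python
import argparse


def parse_window(argv):
    """Parse a review window from CLI arguments (hypothetical helper).

    Assumption: exactly one of --last or --hours must be supplied.
    The PR text only says the CLI accepts "--last N or --hours N".
    """
    parser = argparse.ArgumentParser(prog="dissonance_review.py")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--last", type=int, metavar="N",
                       help="review the last N journal entries (assumed meaning)")
    group.add_argument("--hours", type=int, metavar="N",
                       help="review entries from the last N hours")
    return parser.parse_args(argv)


if __name__ == "__main__":
    import sys
    args = parse_window(sys.argv[1:])
    print(args)
```

Using a mutually exclusive group means argparse itself rejects ambiguous invocations such as `--last 5 --hours 2`, so the script needs no precedence rule of its own.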

Design

The journal already captures intent vs outcome in every entry. This skill adds the verification layer — comparing those claims against what actually happened in the event log. The sharpest signal: journal says "Silence" but send_message appears in the session events.
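
The "Silence" versus `send_message` check described above might look something like the sketch below. It is an illustration under stated assumptions, not the detector shipped in this PR: the `session_id` and `tool` field names, the substring match on "silence", and the helper names `load_jsonl` and `find_action_mismatches` are all hypothetical. Only `agent_did`, `send_message`, and the `action_mismatch` (high) label come from the description.

```python
import json


def load_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank or malformed lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a corrupt line rather than abort the review
        if isinstance(record, dict):
            records.append(record)
    return records


def find_action_mismatches(journal, events):
    """Flag journal entries that claim silence while a message was sent.

    Assumed schema (hypothetical): journal entries and events both carry
    "session_id"; events name the invoked tool under "tool".
    """
    findings = []
    for entry in journal:
        claim = (entry.get("agent_did") or "").lower()
        if "silence" not in claim:
            continue
        sent = [
            e for e in events
            if e.get("session_id") == entry.get("session_id")
            and e.get("tool") == "send_message"
        ]
        if sent:
            findings.append({
                "type": "action_mismatch",
                "severity": "high",
                "session_id": entry.get("session_id"),
                "evidence_count": len(sent),
            })
    return findings
```

For example, a journal entry `{"session_id": "s1", "agent_did": "Silence"}` paired with an event `{"session_id": "s1", "tool": "send_message"}` would produce one high-severity finding, while the same entry with no matching event would produce none.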

Complements prediction-review (which tests world model) by testing self-model accuracy.

Test plan

  • 22 unit tests passing (action mismatch, invisible failure, scope drift, integration, JSONL loading)
  • Full test suite: 203 passed, 1 skipped
  • Manual validation against real agent logs

🤖 Generated with Claude Code

Cross-references journal entries (user_wanted/agent_did) against event
logs to detect intent-vs-outcome gaps: action mismatches (claimed silence
but sent message), invisible failures (claimed success with errors),
scope drift (work volume vs description), and phantom work.

Includes analysis script (dissonance_review.py), skill docs, and 22 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tkellogg
Owner

@strix-tkellogg this is just more analytics with no action. We need a strong S5 to guide an S3 routine embedded in the skill text.

We already have something like this in the introspection skill. Maybe we should extend that instead? Both Motley and Verge give me reports on things of this nature every day. We shouldn't duplicate.

Also, when does this run? Should we use pollers to force a schedule? Or just recommend that it gets scheduled? If so, that also needs doing.

strix-tkellogg pushed a commit that referenced this pull request Mar 26, 2026
Addresses PR #62 feedback: dissonance detection was analytics-only with no
action guidance, and duplicated introspection's existing scope.

Changes:
- Add dissonance-review.md companion guide to introspection skill
  (follows existing pattern: debugging-jobs, debugging-communication, debugging-drift)
- Key addition: "What To Do With Findings" section with S5 guidance —
  immediate behavioral fixes, persistent memory updates, structural escalation
- Includes scheduling recommendation (daily cron example)
- Script (dissonance_review.py) and tests (22 cases) from PR #62,
  placed in shared scripts/ directory
- Remove standalone dissonance-tracking/ skill directory (superseded)

Closes #62

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

3 participants