T-017: SmartCRDT multi-agent collaboration layer#20

Open
SuperInstance wants to merge 1 commit into main from superz/T-017

@SuperInstance (Owner) commented Apr 13, 2026

Summary

This PR adds a CRDT-based multi-agent collaboration layer for fleet coordination.

What was built

CRDT Primitives (pure TypeScript, no native dependencies)

  • GCounter — Grow-only counter for monotonic metrics (tasks completed, messages sent)
  • PNCounter — Positive-negative counter for adjustable metrics (queue depth, error counts)
  • LWWRegister — Last-writer-wins register with timestamp + node-ID tiebreaking
  • ORSet — Observed-remove set with concurrent add/remove semantics (add wins)
  • LWWMap — Map of string keys to LWW-Registers for structured config
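
The LWWRegister's timestamp + node-ID tiebreaking can be sketched as follows. This is a minimal illustration, not the PR's code: the class name `LWWRegisterSketch`, the `{ value, timestamp, node }` state shape, and the rule that a timestamp tie goes to the lexicographically larger node ID are all assumptions.

```typescript
// Hypothetical sketch of a state-based last-writer-wins register.
// Assumption: a timestamp tie goes to the lexicographically larger node ID.
interface LWWState<T> {
  value: T;
  timestamp: number; // wall-clock milliseconds at write time
  node: string; // ID of the replica that performed the write
}

class LWWRegisterSketch<T> {
  constructor(public state: LWWState<T>) {}

  // Local write: stamp with the current time, but never move backwards.
  set(value: T, node: string, now: number = Date.now()): void {
    const timestamp = Math.max(now, this.state.timestamp + 1);
    this.state = { value, timestamp, node };
  }

  // True if write `a` beats write `b`: later timestamp first, then node ID.
  static wins<U>(a: LWWState<U>, b: LWWState<U>): boolean {
    if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
    return a.node > b.node;
  }

  // Merge keeps whichever write wins (a max under a total order),
  // so it is commutative, associative, and idempotent.
  merge(other: LWWState<T>): void {
    if (LWWRegisterSketch.wins(other, this.state)) this.state = { ...other };
  }

  get value(): T {
    return this.state.value;
  }
}

// Two replicas write in the same millisecond; the tiebreak decides.
const regA = new LWWRegisterSketch({ value: "v1", timestamp: 1000, node: "agent-a" });
const regB = new LWWRegisterSketch({ value: "v2", timestamp: 1000, node: "agent-b" });
regA.merge(regB.state);
regB.merge(regA.state);
// regA.value and regB.value are both "v2"
```

Because merge only keeps the winning write, applying it in any order, or more than once, leaves every replica holding the same value.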

Collaboration Components

  • TaskBoard — Agents create, claim, start, complete, fail, and cancel tasks without conflicts. Supports dependency ordering, priority filtering, and available-task queries.
  • KnowledgeBase — Agents contribute findings (title, content, category, tags, confidence). Entries auto-merge via LWW per entry. Supports search, category/tag filtering, and confidence thresholds.
  • FleetState — Tracks each agent's status, capabilities, current task, and last-seen time. Supports stale-agent eviction and searching for available agents by capability.
  • MetricsAggregator — Uses GCounter for monotonic metrics and PNCounter for adjustable metrics, with per-agent breakdowns.
  • ConfigRegistry — Fleet-wide configuration via LWW-Map. Typed getters for string, number, and boolean values, with defaults.
  • MembershipRegistry — OR-Set based fleet membership. Supports roles, capabilities, concurrent join/leave.
  • FleetCollabStore — Unified store combining all components with full export/import/merge.
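
The add-wins behavior of ORSet, which MembershipRegistry relies on for concurrent join/leave, can be illustrated with a small sketch. It assumes each add is tagged with a unique `node:counter` string and that a remove tombstones only the tags the removing replica has already seen; `ORSetSketch` and its methods are hypothetical names, not the PR's API.

```typescript
// Hypothetical observed-remove set sketch. Each add gets a unique tag;
// remove() only tombstones tags this replica has already observed, so a
// concurrent add (whose tag the remover never saw) survives the merge.
class ORSetSketch {
  private adds = new Map<string, Set<string>>(); // element -> add-tags
  private removed = new Set<string>(); // tombstoned tags
  private counter = 0;

  constructor(private readonly node: string) {}

  add(element: string): void {
    const tag = `${this.node}:${++this.counter}`;
    const tags = this.adds.get(element) ?? new Set<string>();
    tags.add(tag);
    this.adds.set(element, tags);
  }

  remove(element: string): void {
    // Tombstone only the tags observed locally.
    this.adds.get(element)?.forEach(tag => this.removed.add(tag));
  }

  has(element: string): boolean {
    const tags = this.adds.get(element);
    if (!tags) return false;
    let live = false;
    tags.forEach(tag => {
      if (!this.removed.has(tag)) live = true;
    });
    return live;
  }

  // State-based merge: union of add-tags and union of tombstones.
  merge(other: ORSetSketch): void {
    other.adds.forEach((tags, element) => {
      const mine = this.adds.get(element) ?? new Set<string>();
      tags.forEach(tag => mine.add(tag));
      this.adds.set(element, mine);
    });
    other.removed.forEach(tag => this.removed.add(tag));
  }
}

// A join is observed by agent-b; agent-b's leave then races a re-join on agent-a.
const alpha = new ORSetSketch("agent-a");
const beta = new ORSetSketch("agent-b");
alpha.add("worker-3");
beta.merge(alpha);
beta.remove("worker-3"); // tombstones the tag beta has seen
alpha.add("worker-3"); // concurrent re-add with a fresh tag
alpha.merge(beta);
beta.merge(alpha);
// alpha.has("worker-3") and beta.has("worker-3") are both true: add wins
```

A leave cancels only the joins it observed, so a join that raced with it survives the merge on every replica.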

HTTP API (20+ endpoints)

  • Task CRUD: /task/create, /task/claim, /task/complete, /task/:id, /tasks
  • Knowledge: /knowledge/contribute, /knowledge/:id, /knowledge?q=...
  • Fleet: /heartbeat, /fleet, /fleet/summary
  • Metrics: /metrics/increment, /metrics
  • Config: /config/set, /config/:key, /config
  • Membership: /membership/join, /membership/leave, /membership
  • Sync: /merge, /export, /import
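
As an illustration of how an agent might call one of these endpoints, here is a hedged client sketch for `POST /metrics/increment`. The body fields `name`, `value`, and `agentId` are the ones the review below shows the handler reading from `ReportMetricRequest`; the JSON content type, the `reportMetric` helper, the `FetchLike` type, and the error handling are assumptions.

```typescript
// Hypothetical client sketch for POST /metrics/increment.
// Assumption: the server accepts a JSON body with these three fields.
interface ReportMetricRequest {
  name: string;
  value: number;
  agentId: string;
}

// Minimal fetch-like signature so the sketch can be exercised without a server.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

async function reportMetric(
  baseUrl: string,
  req: ReportMetricRequest,
  fetchImpl: FetchLike,
): Promise<void> {
  const res = await fetchImpl(`${baseUrl}/metrics/increment`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  // Surface non-2xx responses instead of silently dropping the metric.
  if (!res.ok) throw new Error(`metrics/increment failed: HTTP ${res.status}`);
}
```

With Node 18 or later, the global `fetch` should fit the `FetchLike` parameter; injecting it keeps the sketch testable without a running server.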

Tests: 146 tests, all passing

  • 34 CRDT primitive tests (GCounter, PNCounter, LWWRegister, ORSet, LWWMap)
  • 24 task board tests (full lifecycle, conflict prevention, events, merge)
  • 18 knowledge base tests (CRUD, search, filtering, merge, permissions)
  • 52 fleet component tests (fleet state, metrics, config, membership, store integration)
  • 18 HTTP API tests (all endpoints)
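
Primitive tests for state-based CRDTs typically check that merge is commutative, associative, and idempotent, since those properties are what guarantee that replicas converge. The sketch below shows a GCounter (one slot per node, merged by per-node max), a PNCounter built from two GCounters, and checks of that kind on merged values. It is illustrative only; `GCounterSketch`, `PNCounterSketch`, and `merged` are hypothetical names and are not taken from the PR's test suite.

```typescript
// Hypothetical GCounter sketch: one slot per node, merge = per-node max.
class GCounterSketch {
  private counts = new Map<string, number>();

  increment(node: string, amount = 1): void {
    if (amount < 0) throw new Error("GCounter only grows");
    this.counts.set(node, (this.counts.get(node) ?? 0) + amount);
  }

  value(): number {
    let total = 0;
    this.counts.forEach(n => {
      total += n;
    });
    return total;
  }

  merge(other: GCounterSketch): void {
    other.counts.forEach((n, node) => {
      this.counts.set(node, Math.max(this.counts.get(node) ?? 0, n));
    });
  }

  clone(): GCounterSketch {
    const c = new GCounterSketch();
    this.counts.forEach((n, node) => c.counts.set(node, n));
    return c;
  }
}

// PNCounter: one GCounter for increments, one for decrements.
class PNCounterSketch {
  private p = new GCounterSketch();
  private n = new GCounterSketch();
  increment(node: string, amount = 1): void { this.p.increment(node, amount); }
  decrement(node: string, amount = 1): void { this.n.increment(node, amount); }
  value(): number { return this.p.value() - this.n.value(); }
  merge(other: PNCounterSketch): void {
    this.p.merge(other.p);
    this.n.merge(other.n);
  }
}

// Non-mutating merge helper for the law checks below.
function merged(a: GCounterSketch, b: GCounterSketch): GCounterSketch {
  const out = a.clone();
  out.merge(b);
  return out;
}

const x = new GCounterSketch();
x.increment("agent-1", 3);
const y = new GCounterSketch();
y.increment("agent-2", 2);
const z = new GCounterSketch();
z.increment("agent-1", 5);

// Merge laws, checked on the resulting counter values.
const commutative = merged(x, y).value() === merged(y, x).value();
const associative = merged(merged(x, y), z).value() === merged(x, merged(y, z)).value();
const idempotent = merged(x, x).value() === x.value();
```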

CRDT Types Implemented

| Type | Purpose |
|---|---|
| GCounter | Monotonic metric aggregation |
| PNCounter | Adjustable metric aggregation |
| LWWRegister | Single-value configuration |
| ORSet | Fleet membership tracking |
| LWWMap | Structured configuration store |
| TaskBoard (LWW per task) | Conflict-free task coordination |
| KnowledgeBase (LWW per entry) | Auto-merging knowledge contributions |
| FleetState (LWW + ORSet + GCounter) | Fleet status tracking |

- CRDT primitives: GCounter, PNCounter, LWWRegister, ORSet, LWWMap
- Task board: agents can create/claim/complete tasks without conflicts
- Knowledge base: agents contribute findings that auto-merge via LWW
- Fleet state: who's working on what, last seen times, capabilities
- Metrics aggregator: GCounter for monotonic, PNCounter for adjustable metrics
- Config registry: LWW-Map for fleet-wide configuration values
- Membership registry: OR-Set for fleet membership tracking
- Unified FleetCollabStore with full export/import/merge
- HTTP API with 20+ REST endpoints for agent interaction
- 146 comprehensive tests (all passing)
@github-actions

PR Validation Complete

| Check | Status |
|---|---|
| Title Format | ⚠️ |
| Description | ⚠️ |
| Code Size | |
| Lint | |
| Type Check | |
| Tests | |

@beta-devin-ai-integration (bot) left a comment

Devin Review found 3 potential issues.

7 additional findings are available in Devin Review.

delete(key: string, node: string): void {
  // LWW-Map deletion: set a tombstone register
  const reg = this.entries.get(key);
  const ts = (reg?.timestamp || 0) + 1;

🔴 LWWMap.delete uses stale timestamp for tombstone, causing deletes to lose to older concurrent sets

LWWMap.delete computes the tombstone timestamp as (reg?.timestamp || 0) + 1 instead of using Date.now(). Since LWWMap.set (via LWWRegister.set at crdt-primitives.ts:178) uses Date.now() for its timestamp, the delete's tombstone gets a timestamp based on the original set time, not the deletion time. In a distributed scenario, this means a delete performed at time T=3000 on a key originally set at T=1000 gets tombstone timestamp 1001. A concurrent set from another replica at T=2000 would win during merge (2000 > 1001), even though the delete happened chronologically later. This violates LWW semantics and can cause deleted keys to silently reappear after merging.

Suggested change:
- const ts = (reg?.timestamp || 0) + 1;
+ const ts = Math.max(Date.now(), (reg?.timestamp || 0) + 1);
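
The reviewer's scenario can be replayed with a small LWW-Map sketch. It assumes entries store `{ value, timestamp, node, deleted }` and takes the clock value as a parameter (`now`) instead of calling `Date.now()`, so the timestamps 1000, 2000, and 3000 from the comment can be fed in directly; `LWWMapSketch`, `replay`, and the `useFixedDelete` flag are hypothetical.

```typescript
// Hypothetical LWW-Map sketch with tombstones; not the PR's LWWMap.
interface MapEntrySketch {
  value: string | null;
  timestamp: number;
  node: string;
  deleted: boolean; // true for a tombstone
}

class LWWMapSketch {
  entries = new Map<string, MapEntrySketch>();

  constructor(private readonly node: string, private readonly useFixedDelete: boolean) {}

  set(key: string, value: string, now: number): void {
    this.entries.set(key, { value, timestamp: now, node: this.node, deleted: false });
  }

  delete(key: string, now: number): void {
    const prev = this.entries.get(key)?.timestamp ?? 0;
    // Original version bases the tombstone on the old write; the fix uses the clock.
    const ts = this.useFixedDelete ? Math.max(now, prev + 1) : prev + 1;
    this.entries.set(key, { value: null, timestamp: ts, node: this.node, deleted: true });
  }

  get(key: string): string | null {
    const e = this.entries.get(key);
    return e && !e.deleted ? e.value : null;
  }

  // Per-key LWW merge: later timestamp wins, node ID breaks ties.
  merge(other: LWWMapSketch): void {
    other.entries.forEach((theirs, key) => {
      const mine = this.entries.get(key);
      const theirsWins =
        !mine ||
        theirs.timestamp > mine.timestamp ||
        (theirs.timestamp === mine.timestamp && theirs.node > mine.node);
      if (theirsWins) this.entries.set(key, { ...theirs });
    });
  }
}

// Reviewer's scenario: A sets at 1000, B sets concurrently at 2000, A deletes at 3000.
function replay(useFixedDelete: boolean): string | null {
  const a = new LWWMapSketch("replica-a", useFixedDelete);
  const b = new LWWMapSketch("replica-b", useFixedDelete);
  a.set("k", "from-a", 1000);
  b.set("k", "from-b", 2000);
  a.delete("k", 3000);
  a.merge(b);
  return a.get("k");
}

replay(false); // "from-b": the tombstone (timestamp 1001) loses and the key reappears
replay(true); // null: the tombstone (timestamp 3000) wins
```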

// ---- METRICS ----
if (method === 'POST' && path === '/metrics/increment') {
  const body = JSON.parse(await parseBody(req)) as ReportMetricRequest;
  this.store.metrics.increment(body.name, body.value);

🔴 Metrics API endpoint attributes all metrics to the store's replica instead of the requesting agent

The /metrics/increment handler at api.ts:248 calls this.store.metrics.increment(body.name, body.value), which internally uses this.replicaId (the store owner's agent ID) as the counter node. The body.agentId from the request is completely ignored. This means when agent-2 reports a metric to agent-1's API, the metric is attributed to agent-1, not agent-2. The MetricsAggregator already has a recordIncrement method (metrics.ts:64-71) that accepts an explicit agentId parameter for exactly this purpose.

Suggested change:
- this.store.metrics.increment(body.name, body.value);
+ this.store.metrics.recordIncrement(body.name, body.agentId, body.value);
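
A minimal sketch of the attribution difference, assuming the aggregator keeps one grow-only count per (metric, agent) pair. Only the method names `increment` and `recordIncrement`, and the fact that `increment` uses the store's `replicaId`, come from the review; `MetricsAggregatorSketch`, `breakdown`, and the internal layout are hypothetical.

```typescript
// Hypothetical per-agent metrics aggregator; internals are illustrative.
class MetricsAggregatorSketch {
  // metric name -> (agent ID -> grow-only count)
  private counters = new Map<string, Map<string, number>>();

  constructor(private readonly replicaId: string) {}

  // Attributes the increment to whichever replica owns this store.
  increment(name: string, value = 1): void {
    this.recordIncrement(name, this.replicaId, value);
  }

  // Attributes the increment to an explicit agent, which the API handler needs.
  recordIncrement(name: string, agentId: string, value = 1): void {
    const perAgent = this.counters.get(name) ?? new Map<string, number>();
    perAgent.set(agentId, (perAgent.get(agentId) ?? 0) + value);
    this.counters.set(name, perAgent);
  }

  // Per-agent breakdown for one metric.
  breakdown(name: string): Record<string, number> {
    const out: Record<string, number> = {};
    this.counters.get(name)?.forEach((n, agent) => {
      out[agent] = n;
    });
    return out;
  }
}

// agent-2 reports one completed task to agent-1's API, once per handler variant.
const store = new MetricsAggregatorSketch("agent-1");
store.increment("tasks_completed", 1); // original handler: credited to agent-1
store.recordIncrement("tasks_completed", "agent-2", 1); // fixed handler: credited to agent-2
store.breakdown("tasks_completed"); // { "agent-1": 1, "agent-2": 1 }
```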

Comment on lines +75 to +83
getSnapshot(): FleetSummary {
  const tasks = this.taskBoard.getAllTasks();
  return this.fleetState.getSummary({
    totalTasks: tasks.length,
    pendingTasks: tasks.filter(t => t.status === 'pending' as any).length,
    inProgressTasks: tasks.filter(t => t.status === 'in_progress' as any).length,
    completedTasks: tasks.filter(t => t.status === 'completed' as any).length,
  });
}

🔴 FleetSummary always reports 0 for totalKnowledgeEntries and metricsCount

FleetCollabStore.getSnapshot() delegates to this.fleetState.getSummary(taskInfo) which hardcodes totalKnowledgeEntries: 0 and metricsCount: 0 (fleet-state.ts:163-164). The store has this.knowledgeBase.size() and this.metrics.getMetricNames().length readily available but never passes them. This also affects the keysAffected computation in merge() (crdt-store.ts:67), where the totalKnowledgeEntries delta is always 0, so merging knowledge entries is never reflected in keysAffected.

Prompt for agents
The FleetCollabStore.getSnapshot() method only passes task info to FleetState.getSummary(), but the FleetSummary interface also includes totalKnowledgeEntries and metricsCount which are always returned as 0.

The fix requires changes in two places:
1. In crdt-store.ts getSnapshot(), the knowledge and metrics counts need to be populated. The store has this.knowledgeBase.size() and this.metrics.getMetricNames().length available.
2. Either extend FleetState.getSummary() to accept knowledge/metrics info in its parameter, or have getSnapshot() merge the result from getSummary() with the additional fields directly.

A simple approach: after getting the summary from fleetState.getSummary(taskInfo), override totalKnowledgeEntries and metricsCount:
  const summary = this.fleetState.getSummary(taskInfo);
  summary.totalKnowledgeEntries = this.knowledgeBase.size();
  summary.metricsCount = this.metrics.getMetricNames().length;
  return summary;