Skip to content

Add autonomous agent infrastructure (M1, M3, M5)#517

Draft
kovtcharov wants to merge 100 commits intomainfrom
kalin/autonomous-agent-infra
Draft

Add autonomous agent infrastructure (M1, M3, M5)#517
kovtcharov wants to merge 100 commits intomainfrom
kalin/autonomous-agent-infra

Conversation

@kovtcharov
Copy link
Collaborator

Summary

  • M1 Persistent Memory: SharedAgentState singleton with MemoryDB (session-scoped working memory + FTS5) and KnowledgeDB (cross-session insights, credentials, preferences). MemoryMixin provides 8 agent tools and automatic fact extraction after each query. FTS5 uses AND semantics with bm25 ranking and OR fallback. Insight deduplication at >80% word overlap. Confidence decay for stale insights.
  • M3 Service Integration & Computer Use: WebSearchMixin (Perplexity API + WebClient), ComputerUseMixin (PlaywrightBridge abstraction for learn/replay/list/test browser workflows with self-healing selectors and screenshot capture), ServiceIntegrationMixin (API discovery, encrypted credential management, preference learning via explicit correction and implicit confirmation, decision workflow executor).
  • M5 Scheduled Autonomy: Async timer-based Scheduler with SQLite persistence, natural language interval parsing ("every 6h", "daily", "30m"), full task lifecycle (create/pause/resume/cancel/delete), REST API at /api/schedules/*, and FastAPI server integration with startup/shutdown lifecycle.

Test plan

  • 274 unit tests passing across 9 test files
  • Integration tests for live Playwright browser workflows
  • Integration tests for live Perplexity API (requires API key)
  • Integration tests for cross-session memory persistence
  • M2 (Agent UI MCP Server) integration for scheduler MCP tools

kovtcharov and others added 30 commits March 5, 2026 15:34
New feature: lightweight desktop chat application with FastAPI backend,
SQLite persistence, SSE streaming, and RAG-powered document Q&A — all
running 100% locally on AMD Ryzen AI hardware.

**Backend (Python):**
- FastAPI server (`gaia.chat.ui.server`) on port 4200
- SQLite database with WAL mode (`gaia.chat.ui.database`)
- 14 Pydantic request/response models (`gaia.chat.ui.models`)
- REST API: sessions CRUD, streaming chat (SSE), document library
- Thread-based producer/consumer pattern for real-time streaming
- Document deduplication via SHA-256 hash
- Shared CLI/UI state via common SQLite database

**Frontend (TypeScript/React):**
- Vite + React + TypeScript web UI
- Sidebar with session management, document library modal
- Markdown rendering with syntax-highlighted code blocks
- Privacy indicators ("100% Local", "Your data never leaves this device")
- Responsive design with light/dark theme support
- Electron shell for desktop packaging

**CLI integration:**
- `gaia chat --ui` launches the Chat Web UI server
- `gaia chat --ui-port 8080` for custom port

**Documentation (3 new pages):**
- User guide: `docs/guides/chat-ui.mdx`
- SDK reference: `docs/sdk/sdks/chat-ui.mdx`
- Technical spec: `docs/spec/chat-ui-server.mdx`
- Updated quickstart with Desktop App section
- Updated index, chat guide, chat SDK, deployment, setup pages
- Added navigation entries in `docs/docs.json`

**Tests:**
- Unit tests for database, models, and server
- Electron app and installer tests

**Scripts & CI:**
- Build scripts for Windows/Linux installers
- NPM publish workflow for chat package
- Version bump and release scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nstaller improvements

- Add npm publishability: scoped @amd/gaia-chat package with bin/gaia-chat.mjs CLI entry
- Sync all versions from src/gaia/version.py (single source of truth)
- Inject version into UI at build time via Vite define (Sidebar + Settings)
- Create forge.config.cjs for SemVer conversion (4-part -> 3-part for Squirrel)
- Add self-contained main.js Electron entry (no external framework dependency)
- Lowercase installer filename: gaia-chat-setup.exe
- Update CI workflows to verify version.py and build for Windows + Linux
- Add npm publish workflow triggered by chat-v* tags
- Update release/bump scripts to read from version.py
- Update Jest tests to support external forge config and version.py checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add restrictive default permissions (contents: read) to publish workflow
- Fix path traversal vulnerability in gaia-chat CLI static file server
  by resolving paths and validating they stay within dist directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Resolve and validate file paths in document upload endpoint
- Prevent path traversal in SPA static file serving
- Remove internal error details from user-facing error messages
- Add exc_info logging for better server-side debugging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _validate_file_path() with null byte, absolute path, and file extension checks
- Allowlist document/code extensions to prevent unsafe file type uploads
- Add TestValidateFilePath test class with 10 security validation tests
- Add integration test verifying upload endpoint rejects unsafe extensions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The release management section incorrectly stated the version was
managed via `version.txt`. The actual scripts read from
`src/gaia/version.py` (GAIA's single source of truth).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ations

- Add _sanitize_document_path() that returns a clean Path, isolating
  user input from all downstream filesystem calls
- Add _sanitize_static_path() with relative_to() containment check
  to prevent directory traversal in SPA file serving
- Reject ".." patterns and null bytes before path construction
- Add TestSanitizeDocumentPath and TestSanitizeStaticPath test classes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move path validation into safeLookup() function that rejects
  traversal patterns, null bytes, and non-alphanumeric characters
  before constructing any filesystem paths
- Return safe fallback (index.html) for all invalid paths
- Ensures readFile() only receives paths validated by safeLookup()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix technology description: Vanilla JS -> React + TypeScript (deployment/ui)
- Fix supported file formats to match _ALLOWED_EXTENSIONS allowlist (guides/chat-ui)
- Fix npm package names: @amd/gaia-chat -> @amd-gaia/chat, @gaia/electron -> @amd-gaia/electron
- Add --ui and --ui-port flags to CLI reference with examples (reference/cli)
- Add security helpers to spec function table and update security section
  with accurate host binding info and path validation details (spec/chat-ui-server)
- Enhance quickstart: rename to "Chat UI (Fastest)", add npm one-liner tab,
  update description to emphasize chat-first experience (quickstart)
- Update anchor links in index.mdx and setup.mdx to match new heading
- Add Chat UI to CLAUDE.md project structure, architecture, CLI commands,
  and documentation index

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ements

- Default to dark mode for new users
- Replace generic Bot icon with GAIA robot head mascot (circle)
- Add stop streaming button (replaces send button during generation)
- Add copy-to-clipboard button on all messages (hover to reveal)
- Add scroll-to-bottom FAB when scrolled up in long conversations
- Improve dark mode contrast (sharper bg separation, brighter text)
- Better message visual distinction (user vs assistant backgrounds)
- Fix trash icon overlapping timestamps in sidebar session list
- Rename npm package @amd/gaia-chat to @amd-gaia/chat across all files
- Rename Electron main.js to main.cjs for CommonJS compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add GAIA robot head favicon for browser tab
- Redesign input box: rounded pill shape, elevated background, softer focus
- Add lock icon to privacy footer
- Feature cards now have background fill for dark mode visibility
- Increase dark mode border contrast for sidebar/content divider
- Fix mobile sidebar defaulting to open on small screens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ve integration tests

- Add AgentActivity component for real-time tool/shell execution display
- Add structured frontend logging utility (logger.ts)
- Add SSE handler module for server-side streaming with agent events
- Enhance ChatView, Sidebar, and settings with agent activity support
- Expand chatStore with agent activity state management
- Add API service logging, timing, and stream event types
- Enhance ChatAgent with shell tool support and streaming capabilities
- Add 66 integration tests covering full API lifecycle, document workflows,
  SSE streaming, security, pagination, error paths, unicode, and CORS
- Fix Electron test assertions for refactored store and API patterns
- Add chat-ui-agent-capabilities-plan spec document

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused imports: datetime, Any (models.py), UploadFile, File
  (server.py), json, AsyncMock (test_server.py), pytest (test_models.py),
  argparse, server_main (test_chat_ui_integration.py)
- Fix f-string without interpolation in cli.py
- Add pylint disable comments for false-positive no-member (RAGSDK) and
  interface-required unused-argument (sse_handler)
- Apply black/isort formatting to all chat UI source and test files
- Fix Electron test: delete confirmation text changed to "Delete?"
- All 216 Python tests and 252 Jest tests pass, lint clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend (Python):
- Fix fragile conversation history reconstruction: replace stride-2
  iteration with sequential role pairing via _build_history_pairs()
  so unpaired messages (e.g. after streaming errors) don't misalign
  all subsequent context
- Switch database threading.Lock to RLock to prevent potential
  deadlocks if lock-holding methods are refactored to nest
- Fix touch_session to use _transaction() for consistent rollback-
  on-error behavior matching all other write operations
- Add thread safety (self._lock) to all database read operations
- Wrap _index_document and _get_chat_response in run_in_executor
  to avoid blocking the async event loop
- Add SSE keepalive comments every ~5s to prevent proxy/browser
  connection timeouts
- Restrict CORS from allow_origins=["*"] to specific localhost ports
  plus ngrok regex for tunnel support
- Add input validation on limit/offset query params
- Add max_length=100_000 to ChatRequest.message field
- Fix export endpoint shadowing Python builtin 'format'
- Append error notice to partial streaming responses instead of
  silently swallowing errors

Frontend (TypeScript/React):
- Fix double onDone call in SSE stream by tracking doneReceived flag
- Replace module-level stepIdCounter with useRef to prevent shared
  mutable state across component instances
- Fix rapid session switching race: guard stale message loads using
  useChatStore.getState() check before applying results
- Fix timer leaks in MessageBubble/CodeBlock copy handlers: track
  setTimeout via useRef and clean up on unmount
- Add AbortController cleanup on ChatView unmount/session change
- Fix MobileAccessModal to use centralized api client instead of
  raw fetch() for consistent error handling and logging
- Wrap all localStorage access in try/catch for browser compat
- Add .catch() to clipboard writeText calls
- Sanitize export filename to strip unsafe path characters

Also includes: mobile access tunnel (ngrok), connection status
banner, QR code modal, and UI theme refinements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix claude.yml release notes job: use direct_prompt instead of
  prompt for workflow_run event compatibility
- Simplify settings.local.json: replace verbose permission allowlist
  with wildcard mcp__* and switch MCP server to playwright

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s/webui

Move backend and frontend out of the chat-specific namespace to
reflect that the UI serves as a general-purpose agent interface,
not just chat.

Backend:
- src/gaia/chat/ui/ → src/gaia/ui/ (server, database, models,
  sse_handler, tunnel)
- Update setup.py package declaration
- Update cli.py import and log messages ("Chat UI" → "Agent UI")

Frontend:
- src/gaia/apps/chat/webui/ → src/gaia/apps/webui/
- Rename bin/gaia-chat.mjs → bin/gaia-ui.mjs

Update all references across:
- Unit tests, integration tests, Electron tests
- Documentation (guides, SDK docs, specs, deployment)
- Build scripts (build-chat-installer, release-chat, bump-version)
- CI workflows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update all user-facing strings, install scripts, test assertions,
and frontend components to reflect the GAIA Agent UI identity.
Fixes SettingsModal CLI command (gaia chat ui → gaia chat --ui),
updates npm package references to @amd-gaia/agent-ui, CLI binary
to gaia-ui, and fixes Electron test path for version.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename scripts: build-chat-installer → build-ui-installer,
  install-chat → install-ui, bump-chat-version → bump-ui-version,
  release-chat → release-ui, publish-npm-chat → publish-npm-ui
- Update all self-references and docs to use new script names
- Fix critical bug: rag.index_file() → rag.index_document()
- Fix document IDs not passed to ChatAgent (rag_documents config)
- Regenerate root package-lock.json (remove stale @amd-gaia/chat)
- Update remaining @amd-gaia/chat → @amd-gaia/agent-ui in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical fixes:
- build-ui-installer.ps1: Fix path apps/chat/webui → apps/webui
- tunnel.py: Make httpx import lazy to prevent ImportError
- release-ui.mjs: Fix tag format chat-v* → v* to match CI trigger
- publish-npm-ui.yml: Fix concurrency group npm-publish-chat → ui

Documentation fixes:
- Update all gaia-chat CLI refs → gaia-ui in docs
- Update src/gaia/chat/ui/ paths → src/gaia/ui/ in docs + CLAUDE.md
- Fix service name gaia-chat-ui → gaia-agent-ui in SDK docs
- Fix CORS description to match actual allowlist implementation
- Add sse_handler.py and tunnel.py to module structure listing

Code improvements:
- models.py: Use gaia.version for SystemStatus.version instead of
  hardcoded 0.1.0
- MessageBubble.tsx: Fix error detection string to match actual
  backend error messages
- Update tests to use dynamic version from gaia.version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds fastapi, uvicorn, httpx, and psutil as installable via
pip install -e ".[ui]" so the server can run without manually
installing individual packages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security:
- Fix path prefix attack in PathValidator (project-secrets matched project)

API/Server:
- Fix SSEOutputHandler missing model_id and streaming parameters (TypeError crash)
- Fix missing f-string prefix in app.py stop_server message
- Fix None content crash in openai_server.py message preview
- Fix Claude provider missing max_tokens default (API requirement)

Agents:
- Fix rebuild_system_prompt() dropping mixin prompts (use _compose_system_prompt)
- Fix progress spinner not stopped on LLM errors (ConnectionError + Exception)
- Fix os.makedirs("") crash on bare filenames in file_io.py
- Fix update_gaia_md always reporting "updated" (check existence before write)

Infrastructure:
- Fix truncated Blender MCP responses (single recv → loop)
- Fix np.concatenate crash on empty TTS input
- Fix double None sentinel in TTS streaming error path
- Fix division by zero in eval generators on empty results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The streaming test was failing because max_tokens=20 is insufficient when
Qwen3 models use thinking/reasoning tokens before producing visible content.
Increased token budget to 200, relaxed time assertions for slow CI runners,
and made the chunk count assertion handle the case where all tokens are
consumed by reasoning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused imports (json in tunnel.py, asyncio in test_tunnel.py)
- Add check=False to subprocess.run calls in tunnel.py (pylint W1510)
- Apply black formatting to server.py and tunnel.py
- Add asyncio_mode="auto" to pyproject.toml for pytest-asyncio support
- Update Electron tests to expect 'agent-ui' instead of 'chat' (renamed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arkers

The tunnel unit tests used @pytest.mark.asyncio decorators which required
pytest-asyncio to be properly configured. CI environments may not have the
right asyncio mode configuration. Using asyncio.run() directly makes the
tests work without any pytest plugin dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RAG:
- Fix _llm_based_chunking missing f-string prefix (literal {chunk_size} sent to LLM)
- Fix nonexistent llm_client.completions() call (should be generate())

Eval:
- Fix np.min/np.max crash on empty similarity arrays in eval.py (2 locations)

CLI:
- Fix 3 division-by-zero bugs in transcript/email/document cost averages
- Fix unclosed log_handle in MCP bridge background mode

Agents:
- Replace deprecated datetime.utcnow() with time.monotonic() in shell_tools.py
- Replace deprecated datetime.utcnow() with time.monotonic() in testing.py
- Guard rich import in SilentConsole.display_stats() with RICH_AVAILABLE check

Infrastructure:
- Fix resource leak: store and close log_file in lemonade_client.py terminate_server
- Fix SQL injection in database/agent.py db_schema tool (validate table name)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Database:
- Fix 4 cursor access outside context manager scope in ui/database.py
  (delete_session, add_message, delete_document, detach_document)
  Store rowcount/lastrowid inside the with block before exiting

Routing:
- Fix unsafe [1] index on ROUTING_ANALYSIS_PROMPT.split() — use [-1]
  to prevent IndexError if prompt template is modified

MCP Bridge:
- Fix boundary type confusion: was decoded to str then checked as bytes
  Simplified to single decode→encode chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Apply Black formatting to test_chat_sdk.py and security.py
- Remove unused pytest import from test_tunnel.py
- Suppress pylint unused-argument warning on SSE handler's
  print_final_answer (parameter is part of interface contract)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kovtcharov and others added 3 commits March 13, 2026 01:21
Resolve settings.local.json conflict: keep both branch permissions (mcp__* wildcard + individual entries).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement three milestones of generic autonomous agent infrastructure:

M1 - Persistent Memory: MemoryDB + KnowledgeDB with FTS5 (AND default,
bm25 ranking, OR fallback), insight deduplication, confidence decay,
credential storage, and MemoryMixin with 8 tools and auto-extraction.

M3 - Service Integration & Computer Use: WebSearchMixin (Perplexity API +
WebClient), ComputerUseMixin (PlaywrightBridge, learn/replay/list/test
workflows with self-healing selectors), ServiceIntegrationMixin (API
discovery, credential encryption, preference learning with explicit
correction and implicit confirmation, decision workflow executor).

M5 - Scheduled Autonomy: Async timer-based Scheduler with DB persistence,
interval string parsing, task lifecycle (create/pause/resume/cancel/delete),
REST API endpoints, and FastAPI server integration.

274 unit tests covering all components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added documentation Documentation changes dependencies Dependency updates devops DevOps/infrastructure changes agents Agent system changes jira Jira agent changes talk Talk agent changes chat Chat SDK changes mcp MCP integration changes rag RAG system changes llm LLM backend changes audio Audio (ASR/TTS) changes cli CLI changes eval Evaluation framework changes tests Test changes electron Electron app changes security Security-sensitive changes performance Performance-critical changes labels Mar 13, 2026

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
Copy link
Contributor

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

…t-infra

# Conflicts:
#	.claude/settings.local.json
#	.github/workflows/claude.yml
#	src/gaia/logger.py
- Change default LLM from Qwen3-Coder-30B-A3B to Qwen3.5-35B-A3B across
  all agents, configs, tests, and documentation
- Fix scheduler Create button requiring parse API success to enable
- Add "every minute/second/week" to NL schedule parser
- Fix CodeQL path traversal in files.py, XSS in chat-ui.js,
  regex backtracking in MessageBubble.tsx and EMR server,
  path traversal in eval server.js
…uler UI flash

- Correct model ID from Qwen3.5-35B-A3B-Instruct-GGUF to Qwen3.5-35B-A3B-GGUF
  (matching actual Lemonade model registry)
- Fix broken internal doc link in tray-app-integration.md
- Fix broken external URLs in deployment/ui.mdx, plans/agent-ui.mdx,
  sdk/sdks/agent-ui.mdx, spec/agent-ui-server.mdx, plans/agent-hub.mdx
- Fix scheduler UI flash caused by loading state resetting on every poll
- eval/webapp/server.js: Replace inline path validation with safePath()
  helper for consistent path traversal prevention across all endpoints
- chat-ui.js: Remove unnecessary escapeHTML/unescapeHTML round-trip;
  textContent auto-escapes, eliminating XSS and double-unescape alerts
- dev-server.js: Add path.resolve containment check for static file serving
- docs/server.js: Strengthen redirect sanitizer with URL parsing and
  backslash stripping to prevent open redirect
container.appendChild(el);
} else if (token.type === 'link') {
const el = document.createElement('a');
el.href = token.url;

Check failure

Code scanning / CodeQL

DOM text reinterpreted as HTML High

DOM text
is reinterpreted as HTML without escaping meta-characters.
}
const filePath = safePath(EVALUATIONS_PATH, req.params[0]);
if (!filePath) return res.status(400).json({ error: 'Invalid file path' });
if (!fs.existsSync(filePath)) return res.status(404).json({ error: 'File not found' });

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
filePath = rootPath;
} else {
filePath = safePath(TEST_DATA_PATH, filename);
if (!filePath || !fs.existsSync(filePath)) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
break;
}
const p = safePath(TEST_DATA_PATH, path.join(type, filename));
if (p && fs.existsSync(p)) { metadataPath = p; break; }

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
@kovtcharov kovtcharov added this to the v0.18.0 milestone Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent system changes audio Audio (ASR/TTS) changes chat Chat SDK changes cli CLI changes dependencies Dependency updates devops DevOps/infrastructure changes documentation Documentation changes electron Electron app changes eval Evaluation framework changes jira Jira agent changes llm LLM backend changes mcp MCP integration changes performance Performance-critical changes rag RAG system changes security Security-sensitive changes talk Talk agent changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant