A powerful, modern UI that integrates local and hosted LLMs with intelligent web search and content extraction, SQL, YouTube transcript analysis, HubSpot actions, Python data analysis — and a Codex MCP server for safe code scaffolding — all via the Model Context Protocol (MCP). Features include personalized AI assistance through user profiles and conversation context, provider settings, multi‑provider model picking, streaming chat, persistent conversations, a robust Tasks system, and Scheduled tasks with timezone‑aware timing.
Osoba showcases how to extend both local and hosted models through MCP tool use. It combines locally running LLMs via Ollama with intelligent web search and content extraction, SQL querying, YouTube transcript ingestion, HubSpot business actions, Python-based CSV analysis/visualization — and a Codex Workspace server for code generation inside an isolated workspace. A multi‑provider layer adds OpenAI, Anthropic, Google, OpenRouter, Groq, and SambaNova. Conversations and tasks persist in MongoDB.
The project consists of several key components:
-
Backend (FastAPI): Manages chat logic, model provider routing, MCP service communication (including starting and managing MCP services like web search, SQL, YouTube, HubSpot, Python, Codex), tasks/scheduler, and persistence.
-
Frontend (React): A modern, responsive web interface for users to interact with the chat application.
-
MCP Smart Web Search Server Module: Provides intelligent web search via the Serper.dev API with automatic content extraction from top results. Uses multi-method content extraction (trafilatura, BeautifulSoup) with URL prioritization, robots.txt compliance, and polite crawling practices.
-
MCP SQL Server Module: Read-only querying against a MySQL database (with schema resources and query safety).
-
MCP YouTube Transcript Server: Robustly fetches transcripts using multiple strategies (youtube-transcript-api, Pytube, yt-dlp) with optional proxy support, then persists transcript context to the conversation for follow-ups.
-
MCP HubSpot Business Tools: Create/update marketing emails via HubSpot APIs using OAuth, with a JSON-first prompting flow.
-
MCP Python Data Analysis Server: Comprehensive data analysis toolkit including CSV loading, data profiling, filtering, grouping/aggregation, outlier detection, data type conversion, statistical hypothesis testing, and visualization (base64 images) for complete analytical workflows.
-
MCP Codex Workspace Server: Creates per‑run workspaces and launches the Codex CLI within an isolated directory, persists artifacts (JSONL events, manifest), enforces a configurable output policy, and exposes async run APIs for a streaming‑friendly UX. Gated on a valid OpenAI API key.
-
MongoDB: Stores conversation history and user data.
This architecture demonstrates how MCP enables local models to access external tools and data sources, significantly enhancing their capabilities. The backend starts and supervises all MCP services for a scalable, feature-rich setup.
- Watch the demo on X: https://x.com/redbuilding/status/2010124029936427450
- 🧠 Smart Web Search: Intelligent web search with automatic content extraction from top results. Goes beyond search snippets to fetch and analyze full webpage content using advanced extraction techniques.
- 🔎 URL Prioritization: Smart ranking of search results based on relevance scoring, including title/snippet matching, domain authority, and search position weighting.
- 🤖 Polite Web Crawling: Respects robots.txt, implements rate limiting, and uses proper User-Agent identification for ethical content extraction.
- 👤 User Profile & Context: Configure personal information (role, expertise, projects) and pin conversations for contextual AI assistance. The AI understands your background and can reference previous work for personalized responses.
- 🎯 Goals & Priorities: Define your short-term, medium-term, and long-term goals in a structured document (up to 2000 characters) to help the AI understand your objectives and provide relevant assistance.
- 🔔 Proactive Agent Heartbeat: Enhanced background service with context gathering (semantic memory, git, project files, system health), automated task creation, and file-based configuration (HEARTBEAT.md). Configurable intervals, context sources, and two-way sync between file and UI.
- 📌 Conversation Pinning: Select specific conversations to include as context for future chats, enabling the AI to build upon previous discussions and maintain continuity across sessions.
- 🧠 Semantic Memory: Unlimited conversation storage with intelligent semantic search powered by ChromaDB and nomic-embed-text embeddings. Automatically indexes conversations with 5+ messages, searches by meaning (not keywords), and injects relevant past conversations into new chats. Includes Memory Browser (Ctrl+Shift+M) for searching and managing your conversation history.
- 🧾 AI Chat Summaries (On‑Demand): Generate concise, LLM‑authored summaries for conversations and use them as the context payload for pinned chats (no heuristics). Summaries are user‑triggered, model‑selectable in Settings, and pinning enforces a cap of 5 chats.
- 🎯 Personalized AI Responses: AI adapts its communication style and suggestions based on your profile information and pinned conversation history for more relevant assistance.
- 💾 Persistent Conversations: Chat history is saved in MongoDB, allowing users to resume conversations.
- 🧠 Multi‑Provider LLMs: Ollama (local), plus OpenAI, Anthropic, Google, OpenRouter, Groq, SambaNova — with a Settings screen for API keys and a unified model picker.
- 🔌 MCP integration: Backend manages multiple MCP tools (web, SQL, YouTube, HubSpot, Python) as background services.
- 💻 Modern Web Interface: Built with React for a responsive and interactive user experience.
- 📊 Structured search results: Clean formatting of web search data for optimal context.
- 🌐 Enhanced Content Extraction: Multi-method content extraction using trafilatura and BeautifulSoup with graceful fallbacks for maximum reliability.
- ⚙️ Backend API: FastAPI backend providing robust API endpoints for chat and conversation management.
- 🗃️ SQL Querying Tool: Read-only MySQL querying with schema introspection and retry logic.
- 🔄 Conversation Management: List, rename, and delete conversations.
- 📺 YouTube Transcript Tool: Paste a YouTube URL; transcripts are fetched and stored for multi-turn follow-ups.
- 🧰 HubSpot Tools: OAuth-connect, then create/update marketing emails via guided JSON prompts.
- 🐍 Python Analysis Tool: Upload a CSV and run comprehensive analysis including data profiling, filtering, grouping, outlier detection, statistical testing, type conversion, and visualization — results stream back with images and detailed insights.
- ⚡ Streaming Responses: Frontend renders model output token-by-token and indicators for tool usage.
- 🗓️ Long‑Running Tasks (Plan & Execute): Create autonomous tasks that plan and execute multi‑step workflows, with budgets, retries, and verification.
- 📈 Live Task Progress: Tasks stream progress via SSE; per‑step outputs (tables/images/text) render in the UI.
- 🧩 LLM‑only Steps (No MCP): Tasks can include steps that run directly on Ollama (e.g., summaries/reasoning) without using any MCP tool.
- 🚦 Priority Task Queue: Memory-safe task execution with priority scheduling - scheduled tasks run first, user tasks queue behind them, only one task executes at a time to prevent system overload, especially important for local, memory-constrained systems.
- ✨ Codex Workspace (MCP): Launch Codex to generate/edit files in an isolated workspace; inline run status in chat; artifacts persisted for review; gated on OpenAI key.
- 🗓️ Scheduled Tasks (Timezone‑Aware): Recurring cron or one‑time schedules computed in local timezone with DST safety; auto‑disable after first run for one‑time schedules; “Run now” with model override.
The task system has full access to all MCP tools available in the chat interface, enabling autonomous execution of complex workflows:
Available Tools (33 total):
- Web Search (4 tools): Basic search, smart content extraction, image search, news search
- Python Data Analysis (17 tools): Data loading, inspection, cleaning, transformation, statistical analysis, visualization
- HubSpot Business (2 tools): Create/update marketing emails (requires OAuth)
- Codex Workspace (7 tools): Code generation and workspace management (requires OpenAI API key)
- Database (1 tool): Read-only SQL queries
- YouTube (1 tool): Transcript extraction
- LLM-only (1 tool): Direct LLM generation for reasoning steps
Advanced Task Capabilities:
- Research with smart content extraction (full webpage content, not just snippets)
- Advanced data analysis with outlier detection and statistical hypothesis testing
- Marketing automation with HubSpot integration
- Code generation with fine-grained Codex workspace management
- Multi-step workflows combining search, analysis, and generation
The application uses a priority-based task queue to ensure system stability and prevent memory overload from multiple LLM instances:
- Priority 1 (Highest): Scheduled tasks always run first
- Priority 2 (Standard): User-created tasks queue behind scheduled tasks
- One Task at a Time: Only one task executes simultaneously to prevent memory crashes
- Queue Position: Users receive feedback about their position in the queue
- Prevents Overload: Multiple concurrent LLM instances (e.g., 3x Llama3.1 8B = 24GB) could crash systems with limited RAM
- Safe Execution: Single task execution ensures memory usage stays within system limits
- Automatic Queuing: Tasks automatically queue when another task is running
- Cron-based Scheduling: Uses standard cron expressions for flexible scheduling
- System Requirements: Scheduled tasks only run when the system is awake and the application is running
- Catch-up Execution: Overdue tasks execute immediately when the system resumes
Catch-up Logic (Default):
When the backend starts after being stopped or the computer wakes from sleep, overdue scheduled tasks execute immediately. Tasks delayed by more than 5 minutes are marked with a warning indicator in the UI showing how late they ran (e.g., "
Delay Breakdown: The system distinguishes between two types of delays:
- System Delay (sleep): Time the computer was asleep or backend was stopped
- Queue Delay (queued): Time spent waiting for other tasks to complete
The UI shows the breakdown: "
Running as a Service (Optional):
For guaranteed execution and automatic startup, you can run Osoba backend as a system service. This ensures the backend is always running and ready to execute scheduled tasks on time. Setup scripts are provided for macOS (Launch Agent), Linux (systemd), and Windows (NSSM). See scripts/README.md for detailed setup instructions.
Wake Scheduling (Advanced): On supported systems, you can configure the computer to wake from sleep specifically to run scheduled tasks. This requires administrator/root access and is recommended primarily for desktop machines or when plugged in due to battery impact. Platform-specific instructions:
- macOS: Uses
pmsetto schedule wake events - Linux: Uses RTC wake (hardware support required)
- Windows: Uses Task Scheduler with wake timers
See scripts/README.md for complete wake scheduling setup and troubleshooting.
- Python 3.11+
- Node.js (v18+) and npm/yarn for the frontend
- Ollama installed and running locally
- A Serper.dev API key (free tier available)
- MongoDB instance (local or cloud)
- MySQL server (optional, for the SQL querying tool)
- Internet connection for web searches and package downloads
For Semantic Memory:
- Ollama with
nomic-embed-textmodel:ollama pull nomic-embed-text - ChromaDB and tiktoken (automatically installed via requirements.txt)
Optional (enable additional tools):
- Smart web search:
trafilatura,beautifulsoup4,lxml(automatically installed) - Python data analysis:
pandas,numpy,matplotlib,seaborn - YouTube transcripts:
youtube-transcript-api,pytube,yt-dlp,requests - HubSpot OAuth: valid OAuth app (client ID/secret) and redirect URL
- Codex Workspace: Codex CLI available on PATH (or set
CODEX_BIN), OpenAI API key configured
-
Clone the repository:
git clone https://github.com/redbuilding/osoba.git cd osoba -
Set up Backend:
-
Navigate to the backend directory:
cd backend -
Install Python dependencies:
pip install -r requirements.txt
-
Create a
.envfile in thebackenddirectory. This is where the backend and its managed MCP services will look for environment variables. To securely store provider API keys, you must generate a stable encryption key (Fernet) and setSETTINGS_ENCRYPTION_KEY:- Install:
pip install cryptography - Generate (pick one):
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"- In Python REPL:
from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())
- Paste the key into
.envasSETTINGS_ENCRYPTION_KEY=...
# For encrypting API keys entered into model provider Settings modal SETTINGS_ENCRYPTION_KEY=<paste_generated_fernet_key_here> # For Smart Web Search (server_search.py) SERPER_API_KEY=your_serper_api_key_here # Smart extraction configuration (optional) SMART_EXTRACT_MAX_URLS=3 SMART_EXTRACT_MAX_CHARS_PER_URL=2000 SMART_EXTRACT_MAX_TOTAL_CHARS=5000 SMART_EXTRACT_REQUEST_DELAY=1.0 # For MongoDB (main.py) MONGODB_URI=mongodb://localhost:27017/ MONGODB_DATABASE_NAME=mcp_chat_db # For MySQL Database Querying (server_mysql.py) DB_HOST=localhost DB_USER=your_db_user DB_PASSWORD=your_db_password DB_NAME=your_db_name # Optional: HubSpot OAuth (backend/auth_hubspot.py) HUBSPOT_CLIENT_ID=your_hubspot_client_id HUBSPOT_CLIENT_SECRET=your_hubspot_client_secret HUBSPOT_REDIRECT_URI=http://localhost:8000/auth/hubspot/oauth-callback FRONTEND_URL=http://localhost:5173 # Optional: YouTube transcript server (backend/server_youtube.py) # YTA_PROXY=https://user:pass@host:port # YTA_LOG_LEVEL=INFO # Optional: Backend defaults DEFAULT_OLLAMA_MODEL=llama3.1 OLLAMA_REPEAT_PENALTY=1.15 # Codex MCP debugging state CODEX_DEBUG=false
- Install:
-
Install optional dependencies for additional tools (if you plan to use them):
# Smart web search content extraction (recommended) pip install trafilatura beautifulsoup4 lxml # YouTube transcript tool pip install youtube-transcript-api pytube yt-dlp requests # Python analysis tool pip install pandas numpy matplotlib seaborn scipy
-
-
Set up Frontend:
- Navigate to the frontend directory:
(If you were in
cd ../frontendbackend/, otherwise navigate from project root:cd frontend) - Install Node.js dependencies:
npm install # or # yarn install
- Optional: create
frontend/.envwith a custom API URL (default ishttp://localhost:8000/api):echo 'VITE_API_URL=http://localhost:8000/api' > .env
- Navigate to the frontend directory:
-
Ensure Ollama is installed and a model is available: The application might default to a specific model (e.g.,
llama3.1). Pull your desired model:ollama pull llama3.1 # or your preferred model like llama3, mistral, etc.You can select the model in the UI, or configure a default via
DEFAULT_OLLAMA_MODELin the backend.envfile.
⚠️ Osoba is designed for local, single-user use. The API has no authentication layer. Never expose port 8000 to untrusted networks (e.g., via ngrok, port forwarding, or binding to0.0.0.0).
- Network Exposure: The backend binds to
127.0.0.1by default. Do not change this to0.0.0.0unless you understand the risks. - Encryption Key: Set
SETTINGS_ENCRYPTION_KEYinbackend/.envto a stable Fernet key. Without it, provider API keys are lost on every restart. - MongoDB: The default connection uses no authentication. For sensitive data, enable MongoDB authentication and use a connection string with credentials.
- Provider API Keys: Keys are encrypted at rest with Fernet and never returned by the API. Keep your
.envfile secure and out of version control. - MySQL Queries: Queries are validated as read-only SELECT statements and automatically limited to 1000 rows. Connection and read timeouts prevent runaway queries.
- CSV Uploads: Uploads are capped at 50 MB decoded size, and the in-memory DataFrame store is limited to 10 datasets (oldest evicted first).
-
Ensure Prerequisites are Running:
- Ollama: Must be running.
- MongoDB: Your MongoDB instance must be accessible.
- MySQL Server (if using the SQL tool): Your MySQL server must be running and accessible with the credentials provided in
.env.
-
Start the Backend Server: Navigate to the
backenddirectory and run the FastAPI application:# From the backend directory uvicorn main:app --reload --port 8000The backend API will typically be available at
http://localhost:8000. The FastAPI application automatically starts and manages all MCP services (Web, SQL, YouTube, HubSpot, Python, Codex) as background processes using its lifespan manager. You do not need to run the MCP servers separately. -
Start the Frontend Development Server: Navigate to the
frontenddirectory and run:npm run dev # or # yarn dev
The web interface will typically be accessible at
http://localhost:5173(or another port specified by Vite).
- Open your browser to the frontend URL (e.g.,
http://localhost:5173). - Use the chat interface to send messages; responses stream live with indicators when tools run.
- Click the ✨ Tool Selector to enable one of: Smart Web Search, Database, YouTube, HubSpot, Python, Codex (requires OpenAI configured).
- Use Settings (header) to configure provider API keys and unlock non‑Ollama models and Codex.
- Configure User Profile: In Settings → User Profile, add your role, expertise areas, current projects, and communication preferences for personalized AI assistance.
- Set Your Goals: In Settings → Goals & Priorities, define your short-term, medium-term, and long-term goals (up to 2000 characters). The AI will use these goals to provide contextual assistance and proactive insights.
- Enable Proactive Insights: Configure the heartbeat service in Settings → Goals & Priorities → Heartbeat Settings to receive periodic AI-generated insights about your progress, blockers, and suggestions. Click the bell icon (🔔) in the header to view insights.
- Pin Conversations: Hover over conversations in the sidebar and click the pin button to include them as context for future chats.
- If a conversation lacks a summary, you’ll be prompted to generate one before pinning. The UI blocks while the summary is created, then completes the pin.
- Up to 5 conversations can be pinned. Attempts beyond the cap are blocked and the sidebar shows “X/5 pinned”.
- Summaries are generated on‑demand by your chosen model (see Settings → Summaries) and are stored for reuse; only stored summaries are used for context.
- For YouTube: paste a video URL. The transcript is fetched and saved to the conversation for follow-ups.
- For HubSpot: click “Connect HubSpot” to complete OAuth, then describe the email to create/update.
- For Python: upload a CSV file when prompted; follow-up questions reuse the loaded DataFrame for advanced analysis including filtering, grouping, outlier detection, statistical testing, and visualization.
- Manage conversations using the sidebar (create new, select, rename, delete, pin for context).
- Open the Tasks panel (Tasks button in the header) to:
- Create a new task by entering a high‑level goal.
- Monitor progress (live SSE stream), view step outputs (tables, images, text), and Pause/Resume/Cancel.
- “Promote to Task” from any user chat message to pre‑fill the goal and link the task to the conversation.
- Copy entire task results or individual step outputs using dedicated copy buttons.
- Delete completed, failed, or canceled tasks to clean up the task list.
- The backend plans each task as structured JSON (steps, tool, parameters, success criteria), then executes steps sequentially with:
- Budgets: max wall‑time and max tool calls.
- Per‑step timeouts and capped retries with backoff.
- Output verification against success criteria.
- Completion: When the final step finishes, the task status becomes COMPLETED. On failure/timeouts/budgets exceeded, status is FAILED. A concise summary is posted back into the linked conversation.
- When an OpenAI key is configured, the planner may propose a
codex.runstep for scaffolding/creation goals; otherwise such steps are automatically gated off.
- When an OpenAI key is configured, the planner may propose a
APIs:
- Create task:
POST /api/tasks { goal, conversation_id?, dry_run? } - List tasks:
GET /api/tasks - Task detail:
GET /api/tasks/{id} - Task stream (SSE):
GET /api/tasks/{id}/stream - Pause/Resume/Cancel:
POST /api/tasks/{id}/pause|resume|cancel - Delete task:
DELETE /api/tasks/{id} - Status:
GET /api/statusincludestasks.activecount
LLM‑only steps (no MCP):
- The planner supports
llm.generatesteps that run directly via Ollama, without any MCP server. If apromptis omitted, the step’sinstructionis used. These steps still respect budgets, timeouts, and verification.
The original chat_client.py (terminal) and chat_frontend.py (Gradio) have been removed from the repository as they are no longer compatible with the current FastAPI backend architecture. All functionality is now provided through the modern React frontend.
The Python MCP server provides comprehensive data analysis capabilities through the following tools:
load_csv- Load CSV data from base64 encoded stringsget_head- Display first N rows of DataFrameget_data_info- Comprehensive DataFrame information (dtypes, memory usage, non-null counts)get_descriptive_statistics- Statistical summary for numerical columns
check_missing_values- Identify missing values across columnshandle_missing_values- Handle missing data (drop, fill, interpolate)convert_data_types- Safe data type conversion (datetime, category, numeric)detect_outliers- Outlier detection using IQR or Z-score methods
filter_dataframe- Filter data using pandas query syntax with security validationgroup_and_aggregate- Group by columns and apply aggregation functionsquery_dataframe- Advanced DataFrame querying with new DataFrame creationrename_columns- Rename DataFrame columnsdrop_columns- Remove specified columns
get_correlation_matrix- Compute correlation matrix for numerical columnsget_value_counts- Frequency analysis for categorical columnsperform_hypothesis_test- Statistical hypothesis testing:- Two-sample t-tests
- Pearson correlation tests
- Chi-square tests of independence
create_plot- Generate various plot types:- Scatter plots
- Histograms
- Bar charts
- Box plots
- Returns base64-encoded images for web display
All tools maintain session state through an in-memory DataFrame store, enabling complex multi-step analytical workflows within a single conversation.
The application provides personalized AI assistance through user profiles and conversation context management:
- Personal Information: Configure your role, expertise areas, current projects, and communication preferences
- Contextual Understanding: The AI understands your background and adapts responses accordingly
- Settings Integration: Access via Settings → User Profile for easy configuration
- Selective Pinning: Choose specific conversations to include as context for future chats
- Visual Controls: Pin/unpin conversations directly from the sidebar with intuitive controls
- Automatic Summaries: System automatically generates summaries for pinned conversations when needed
- Hybrid Summary Approach: Captures both user questions and assistant solutions for complete context
- Context Continuity: AI can reference previous problems AND solutions for coherent assistance
- Smart Limits: Context is budgeted to ~2700 characters total (200 profile + up to 2500 across pinned summaries)
- Persistent Storage: Generated summaries are stored and reused to avoid regeneration
The AI combines your profile information with pinned conversation context to provide:
- Relevant Suggestions: Recommendations based on your role and current projects
- Appropriate Complexity: Technical depth matching your expertise level
- Consistent Context: Ability to reference previous work and maintain conversation continuity
- Solution Awareness: Knows what approaches were already tried and can build upon them
- Adaptive Communication: Response style tailored to your preferences
- User Profile: Information about YOU (role, projects, preferences) - helps AI understand who it's talking to
- AI Profile: Configuration of the AI's behavior and personality - defines how the AI should act
- Combined Effect: Creates personalized interactions where the AI knows both who you are and how it should respond
- Automatic Summary Generation: When conversations are pinned, summaries are generated on-demand from the last 5 messages
- Hybrid Summary Format: Includes both user questions and key assistant solutions (e.g., "User question | Solved: pandas, matplotlib")
- Smart Solution Extraction: Identifies code examples, tool recommendations, and solution patterns from assistant responses
- User Isolation: Conversations are filtered by user to prevent cross-user context leakage
- Performance Optimization: Summaries are generated once and stored for reuse across multiple chats
Osoba features an advanced semantic memory system that provides unlimited conversation storage with intelligent semantic search, going beyond the 5-conversation pinning limit.
Automatic Indexing:
- Conversations with 5+ messages are automatically indexed after 10 minutes of inactivity
- Uses nomic-embed-text (via Ollama) for local, privacy-preserving embeddings
- Stores conversation chunks in ChromaDB vector database for semantic search
- Non-blocking background process ensures no impact on chat performance
Semantic Search:
- Search by meaning, not just keywords - finds conceptually similar conversations
- Relevance scoring shows how well results match your query (e.g., "87% match")
- Automatically injects relevant past conversations into new chats
- Only includes conversations above 60% similarity threshold to avoid noise
Memory Browser (Ctrl+Shift+M):
- Full-text semantic search across all indexed conversations
- Preview snippets with relevance scores
- Remove individual conversations from memory
- Keyboard shortcuts for quick access
Unlimited Storage:
- No artificial limits - store thousands of conversations
- Intelligent chunking (512 tokens with 50-token overlap) preserves context
- Automatic cleanup based on retention policies (configurable)
Smart Context Injection:
- Relevant past conversations automatically added to chat context
- Only the most relevant conversations included (not all indexed conversations)
- Respects context window limits while maximizing relevance
Privacy-First:
- Local embeddings via Ollama (no external API calls)
- All data stored locally in ChromaDB
- No conversation data leaves your machine
Manual Save:
- Click "💾 Save to Memory" button (appears for conversations with 5+ messages)
- Immediately indexes the conversation for future reference
Automatic Indexing:
- Conversations automatically indexed after 5+ messages and 10 minutes idle
- Background service checks every 5 minutes for eligible conversations
- No user action required
Searching Memory:
- Press
Ctrl+Shift+Mto open Memory Browser - Enter your search query (e.g., "Python pandas dataframe")
- View results with relevance scores and preview text
- Click to view or remove conversations
Memory Management:
- Access Settings → Semantic Memory for statistics and controls
- View total indexed conversations and storage usage
- Trigger manual auto-index check
- Clear all memory (with confirmation)
Embedding Model:
- nomic-embed-text (768-dimensional embeddings)
- Runs locally via Ollama
- No API costs or rate limits
Vector Database:
- ChromaDB for persistent vector storage
- Efficient similarity search (<100ms for typical queries)
- Automatic persistence to disk
Context Integration:
- Semantic memory injected after profile context
- Formatted as "Relevant past conversations" section
- Includes conversation title, relevance score, and preview text
| Feature | Pinning (5 max) | Semantic Memory |
|---|---|---|
| Capacity | 5 conversations | Unlimited |
| Selection | Manual | Automatic by relevance |
| Search | None | Semantic search |
| Context Efficiency | All 5 included | Only relevant ones |
| Management | Manual pin/unpin | Auto-indexed |
The Enhanced Heartbeat System provides proactive AI assistance through automated insights and task creation. It analyzes your goals, conversations, project state, and system health to suggest actionable next steps.
Context Gathering:
- Semantic Memory: Conversation history, indexed conversations, storage usage
- Git Repository: Current branch, uncommitted files, unpushed commits, recent commits
- Project Files: TODO/FIXME comments, recently modified files
- System Health: Disk usage, service status
Automated Task Creation:
- Insights can automatically create tracked tasks
- Manual conversion via "Create Task" button
- Tasks linked to insights for tracking
File-Based Configuration (Power Users):
- Define heartbeat tasks in
HEARTBEAT.md - Category-based task organization
- Custom schedules (cron or interval format)
- Two-way sync between file and UI
The heartbeat service runs in the background at configurable intervals (default: every 2 hours). During each heartbeat:
- Context Gathering: Collects goals, conversations, tasks, and enhanced context (memory, git, project, system)
- AI Analysis: Sends context to your chosen LLM (default: Haiku for cost-effectiveness at ~$0.01/user/day)
- Insight Generation: AI generates actionable insights, suggestions, or identifies blockers
- Smart Filtering: Uses "HEARTBEAT_OK" pattern to suppress notifications when everything is on track
- Task Creation: Optionally creates tracked tasks from insights
- Notification: Displays insights in the bell icon panel (top-right header)
1. Enable Heartbeat:
- Settings → Proactive Heartbeat
- Toggle "Enable Heartbeat"
- Choose check interval (30m, 1h, 2h, 4h, 6h)
2. Configure Context Sources:
- Enable Semantic Memory (recommended)
- Enable Git Repository (recommended)
- Optionally enable Project Files and System Health
3. Enable Auto-Create Tasks (Optional):
- Toggle "Auto-Create Tasks"
- Insights will automatically create tracked tasks
4. Set Up Goals:
- Settings → Goals & Priorities
- Enter your goals document (up to 2000 characters)
Create HEARTBEAT.md in your project root:
# Heartbeat Tasks
## Memory Management
Schedule: 0 2 * * *
Enabled: true
Prompt: Review semantic memory usage and suggest cleanup if storage exceeds 100MB
Context: memory
## Testing Reminders
Schedule: 0 9 * * 1
Enabled: true
Prompt: Check test coverage and suggest missing tests
Create_Task: true
Context: git,projectSync Options:
- Load from File: Import tasks from HEARTBEAT.md to database
- Save to File: Export configured tasks to HEARTBEAT.md
1. Check Notifications: Click the bell icon (🔔) in the header
- Badge shows unread insight count
- Dropdown panel displays recent insights
2. Review Insights: Each insight includes:
- Title: Quick summary of the insight
- Description: Detailed suggestion or observation
- Timestamp: When the insight was generated
- Create Task Button: Convert insight to tracked task
3. Dismiss Insights: Click "Dismiss" to remove insights you've addressed
4. Manual Trigger: Force an immediate heartbeat check:
curl -X POST "http://localhost:8000/api/heartbeat/trigger?user_id=default"Progress Check:
Title: "Authentication Feature Progress"
Description: "You mentioned completing the auth feature this week. The last conversation shows you're working on password reset. Consider testing edge cases before marking complete."
Blocker Identification:
Title: "Design Mockup Blocker"
Description: "Your goals mention waiting on design mockups. It's been 3 days since you noted this blocker. Consider reaching out to the design team or creating wireframes yourself."
Task Suggestion:
Title: "API Documentation Reminder"
Description: "You have 5 completed tasks related to API endpoints but no documentation tasks. Consider creating a task to document the new endpoints."
HEARTBEAT_OK (No Notification): When everything is on track, the AI responds with "HEARTBEAT_OK" and no notification is created, reducing noise.
- Default Model: Claude Haiku 4.5 (~$0.25 per 1M input tokens)
- Typical Cost: ~$0.01 per user per day with 2-hour intervals
- Context Size: ~1500 tokens (goals + 3 conversations + 5 tasks)
- Daily Heartbeats: 12 checks/day (2-hour intervals) = ~18K tokens = $0.0045
Cost Optimization:
- Use longer intervals (3-4 hours) to reduce frequency
- Disable during off-hours with active hours settings
- Use local Ollama models (free) instead of hosted providers
- Keep goals document concise
- Local Processing: All context stays within your system
- No External Storage: Insights stored only in your MongoDB
- User Isolation: Each user's goals and insights are completely separate
- Opt-Out Anytime: Disable in settings or delete goals document
No Insights Appearing:
- Check heartbeat is enabled in Settings
- Verify you're within active hours
- Ensure goals document is saved
- Check backend logs for errors
Too Many Notifications:
- Increase interval (e.g., 2h → 4h)
- Adjust active hours to work hours only
- Review goals document - be more specific
Insights Not Relevant:
- Update goals document with current priorities
- Add more context about blockers and progress
- Pin relevant conversations for better context
Get Insights:
GET /api/heartbeat/insights?user_id=default&dismissed=falseDismiss Insight:
POST /api/heartbeat/insights/{insight_id}/dismiss?user_id=defaultGet Configuration:
GET /api/heartbeat/config?user_id=defaultUpdate Configuration:
PUT /api/heartbeat/config?user_id=default
Content-Type: application/json
{
"enabled": true,
"interval": "2h",
"active_hours": {
"start": "09:00",
"end": "18:00",
"timezone": "America/New_York"
}
}Manual Trigger:
POST /api/heartbeat/trigger?user_id=defaultThe Smart Web Search MCP server provides advanced web search capabilities that go far beyond basic search snippets:
- Multi-method extraction: Uses trafilatura (primary) with BeautifulSoup fallback for maximum reliability
- Full webpage content: Extracts complete articles, not just search snippets
- Content cleaning: Removes HTML tags, normalizes whitespace, handles entities
- Structure preservation: Maintains headings, paragraphs, and content hierarchy
- Relevance scoring: Ranks URLs based on query term matches in titles and snippets
- Position weighting: Considers search result position and domain authority
- Smart filtering: Prioritizes documentation, tutorials, and authoritative sources
- Robots.txt compliance: Automatically checks and respects robots.txt files
- Rate limiting: Configurable delays between requests (default: 1 second)
- Proper identification: Uses descriptive User-Agent headers
- Timeout handling: Graceful failure recovery for unreachable sites
Configure extraction behavior via environment variables:
SMART_EXTRACT_MAX_URLS=3 # Maximum URLs to extract (default: 3)
SMART_EXTRACT_MAX_CHARS_PER_URL=2000 # Character limit per webpage (default: 2000)
SMART_EXTRACT_MAX_TOTAL_CHARS=5000 # Total character limit (default: 5000)
SMART_EXTRACT_REQUEST_DELAY=1.0 # Delay between requests (default: 1.0s)- 8x more context: Provides significantly more content than basic search snippets
- High success rate: Multi-method extraction ensures reliable content retrieval
- Memory efficient: Character limits prevent context overflow
- Fast processing: Typically 3-5 seconds for 3 URLs with polite delays
The smart search feature is automatically enabled for all web searches - users get enhanced content extraction without any additional configuration or UI changes.
- Supported providers:
ollama(local),openai,anthropic,google(Gemini),openrouter,groq,sambanova. - Open the Settings modal (header → Settings) to add/validate API keys per provider. Keys are validated with a lightweight request (OpenAI uses a small
max_tokens=16check to avoid false negatives). - The Model Picker shows models by provider. Non‑Ollama models only appear once the provider is configured.
- Provider model naming/prefixes (examples):
- Ollama:
llama3.1(no prefix). - OpenAI:
openai/gpt-5.2. - Anthropic:
anthropic/claude-haiku-4-5. - Google:
gemini/gemini-flash-latest. - OpenRouter:
openrouter/meta-llama/llama-3.3-70b-instruct. - Groq:
groq/llama-3.1-8b-instant. - SambaNova:
sambanova/Meta-Llama-3.1-8B-Instruct.
- Ollama:
API endpoints:
- List providers:
GET /api/providers - Provider models:
GET /api/providers/models - Provider status:
GET /api/providers/{provider_id}/status - Save API key:
POST /api/providers/settings { provider, api_key } - Remove API key:
DELETE /api/providers/{provider_id}/settings - Validate (no key returned):
GET /api/providers/{provider_id}/validate - All models (flat list w/provider):
GET /api/models
Notes:
- Codex MCP requires a valid OpenAI API key. The UI gates the Codex tool if OpenAI is not configured.
Encryption of provider API keys (required):
- The backend encrypts provider API keys (at rest) using a Fernet key set via
SETTINGS_ENCRYPTION_KEYinbackend/.env. - Generate once and keep it stable across restarts; changing it invalidates previously saved keys (you’ll need to re‑enter them).
- Generate and set:
pip install cryptographypython -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"- Set in
backend/.env:SETTINGS_ENCRYPTION_KEY=<output>
Code generation runs in an isolated workspace with a reviewable manifest and artifacts.
- UI: Click the ✨ Tool Selector and choose “Codex (Workspace)”. Enter an instruction (e.g., “Scaffold a Vite + React app”). The assistant inserts an inline
codex_runmessage that shows live status and a brief summary when complete. - Backend: A FastMCP server (
backend/server_codex.py) exposes tools to create workspaces, start/poll/cancel runs, read files, and fetch a manifest. Runs are async with a small in‑memory job tracker. Artifacts write tocodex_artifacts/under the workspace. - Security posture: Dedicated workspace per run, isolated HOME under workspace, symlink refusal, realpath containment checks, optional post‑run output policy (allowed extensions, dotfiles allowlist, max files/bytes, binary heuristics). For strongest isolation, run the service in a container/VM with no network.
- Requirements: Codex CLI available on PATH (or set
CODEX_BIN) and a validOPENAI_API_KEY(via Settings or env) — the key is injected only to the Codex subprocess.
Endpoints:
- Create workspace:
POST /api/codex/workspaces { name_hint, keep } - Start run (auto‑creates workspace if missing):
POST /api/codex/runs { workspace_id, instruction, model?, timeout_seconds? } - Get run status:
GET /api/codex/runs/{run_id} - Cancel run:
POST /api/codex/runs/{run_id}/cancel - Get manifest:
GET /api/codex/workspaces/{workspace_id}/manifest - Read file:
GET /api/codex/workspaces/{workspace_id}/file?relative_path=…
Environment (optional, defaults shown):
CODEX_BIN=codex,CODEX_WORKSPACES_DIR=./.codex_workspaces,CODEX_MAX_CONCURRENCY=1,CODEX_RUN_TTL_HOURS=48CODEX_MAX_PROMPT_CHARS=20000,CODEX_MAX_STDOUT_CHARS=200000,CODEX_MAX_STDERR_CHARS=50000CODEX_MAX_OUTPUT_FILES=500,CODEX_MAX_OUTPUT_TOTAL_BYTES=20971520,CODEX_DEBUG=false
Frontend UX notes:
- Codex runs now render inline as a chat message (
type='codex_run') that persists in the transcript. The old floating run card is removed. - The sidebar shows Codex MCP status; it appears green only when the service is ready and OpenAI is configured.
- A File Viewer Modal exists for future use via the manifest/file APIs; it is not auto‑triggered to avoid dead links.
- The User interacts with the React Frontend.
- The Frontend sends requests (chat messages, conversation management) to the FastAPI Backend API.
- For chat messages, the Backend (
main.py) processes the request:- It may interact with an LLM via the provider layer (Ollama by default; configure others in Settings).
- If the user enables a tool (e.g., web search, SQL query) via the UI:
- The backend prepares the necessary context (e.g., fetching database schema for SQL).
- It may prompt the Ollama model to generate a tool-specific input (e.g., a SQL query or HubSpot JSON payload).
- The backend then communicates with the relevant MCP Service Module (managed as a subprocess: search, SQL, YouTube, HubSpot, Python, Codex).
- The MCP Service Module executes the tool (e.g., calls Serper.dev API, queries MySQL).
- Results from the MCP Service Module are returned to the Backend.
- The Backend may re-prompt Ollama with the tool's results to generate a final, informed response.
- The Backend stores/retrieves conversation history from MongoDB.
- The final response is streamed back to the Frontend and displayed to the user.
- Purpose: Pinned conversations contribute short, LLM‑generated summaries to the system prompt so future chats can reference prior work without sending full transcripts.
- Generation:
- On‑demand, user‑initiated (not heuristic). If you pin a chat that has no summary, the app asks to generate one first and gates pinning until done.
- The backend assembles up to ~1000 words of recent conversation (HTML stripped) and prompts the selected model to produce a single‑paragraph summary capped at ~750 characters.
- Stored summaries are reused; automatic regeneration is not performed.
- Limits & Enforcement:
- Hard cap of 5 pinned conversations. The backend enforces the cap and the UI disables pinning at the limit.
- Pin stats are surfaced in the UI (e.g., “X/5 pinned”).
- Model Selection:
- Choose the model under Settings → Summaries → Select Model. This uses the same provider model catalogs as chat.
- If no model is configured for summaries, generation will fail until you select one.
- Context Assembly:
- Only stored summaries are included in the system prompt for context; heuristic/auto summaries are not used.
- The context service budgets space for summaries alongside your profile prompt.
- Maintenance:
- Admin utilities exist to bulk remove summaries and/or unpin all chats for a clean slate. Use with care in production.
APIs (reference):
- Summary settings:
GET /api/summaries/settings,POST /api/summaries/settings { model_name } - Generate summary:
POST /api/summaries/generate { conversation_id } - Pinning:
POST /api/user-context/pin-conversation { conversation_id, pinned }(cap enforced) - Pin stats:
GET /api/user-context/pin-stats→{ count, max }
- Create schedules from the Tasks panel → Scheduled tab.
- Two modes:
- Recurring: cron expression + timezone. Next run is computed from local wall‑clock in the chosen timezone (DST‑aware) and stored as UTC.
- One‑time: set a date/time and timezone; auto‑disables after the first run.
- You can “Run now” and optionally override the model per run.
See TASKS_USER_GUIDE.md for full details.
- Model Providers: Use the header Model Picker to select a provider/model. Configure provider API keys in Settings; non‑Ollama models only appear once configured.
- Ollama Model: Select from available local models (or set
DEFAULT_OLLAMA_MODEL). Models well-suited to coding may work better with SQL (e.g., codestral). - Search Results: Adjust the number of search results processed in
backend/main.py. - Database Schema Context: Modify
MAX_TABLES_FOR_SCHEMA_CONTEXTinbackend/main.pyto control how many tables' schemas are sent to the LLM. - Prompt Engineering: Modify the system prompts sent to Ollama in
backend/main.pyto tailor responses and SQL generation. - Styling: Customize the frontend appearance by modifying CSS files or styles within the React components in
frontend/src/.
- Backend:
cd backend && uvicorn main:app --reload --port 8000 - Frontend:
cd frontend && npm install && npm run dev - In the app:
- Choose a model (header → Model Picker).
- Optional: open Settings to add OpenAI/other provider keys.
- Use the ✨ Tool Selector to run Smart Web Search, Database, YouTube, Python, HubSpot, or Codex.
- Tasks panel supports ad‑hoc and Scheduled tasks.
Prereqs:
- Ollama running for local models (pull a model, e.g.,
ollama pull llama3.1). - MongoDB reachable (for history/tasks).
- For Codex: install Codex CLI and configure an OpenAI API key (via Settings or env) before starting runs.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.