Pixelle Studio

🚀 Define Workflows in Natural Language — Your Zero-Code AI File Expert

📄 Full-Format Docs   🧠 Natural-Language SOPs   🔌 One-Click Tools   🛡️ Rock-Solid Stability   ⚡ Token-Smart


English | 中文


Pixelle Studio is an open-source AI Agent workspace built for expert-level file processing. Simply describe your workflow in natural language to turn daily task SOPs into AI-executable Skills — no complex variable passing or coding knowledge required. Plug in external tools via MCP, and let the Agent reliably generate PDFs, PPTs, Excel files, and more — with three-layer failover for rock-solid stability and progressive loading to save tokens.

💡 Not just a chatbot — teach AI your workflow in plain language, zero-code your way to an expert file processing agent.

Pixelle Studio Main Interface


✨ Key Highlights

📄 Expert File Processing

PDF, Excel, PPT, Word, Markdown, HTML
— Generate, preview & download any format

🧠 Natural-Language SOP Skills

Describe workflows in plain language, zero-code
— Turn anyone into an Agent expert, no dev skills needed

🔌 One-Click Tool Integration

Plug in search, maps, video & more via MCP
— Extend your Agent's reach in seconds

🛡️ Rock-Solid Stability

Three-layer failover + WebSocket heartbeat
— Service never stops, long tasks never drop

⚡ Token-Smart Engine

Progressive skill loading saves 90%+ tokens
— Smart context compression, never overflow

🖥️ Secure Persistent Execution

Built-in PTY terminal with smart security filtering
— Multi-step code execution, variable persistence

🎯 Why Pixelle Studio?

| Feature | Traditional AI Chat Tools | Pixelle Studio |
|---|---|---|
| Document Generation (PDF/PPT/Excel) | ❌ Text-only output | ✅ Generate files with live preview |
| Code Execution | ❌ None, or plugin-dependent | ✅ Built-in persistent terminal, multi-step |
| Custom Skills | ❌ Not supported | ✅ Define Skills in natural language, zero-code |
| External Tools (MCP) | ❌ Closed ecosystem | ✅ Open protocol, plug & play |
| Context Management | ❌ Passive truncation | ✅ Smart compression with rich history reconstruction |
| Multi-Model Failover | ❌ Single model | ✅ Three-layer failover + WebSocket heartbeat |
| Token Consumption | 🔴 Full context loading | 🟢 Progressive on-demand loading |

🖼️ Use Cases

1️⃣ Road Trip Planning

💬 "I'd like to drive from Seattle Airport to Mount Rainier, could you please help me with my itinerary? I'm leaving tomorrow morning."

Agent loads map skills → calls map API → plans the route → generates itinerary document with live preview

Road Trip Planning

Fully bilingual — the same natural language experience works seamlessly in Chinese too 👇

Hangzhou to Beijing Road Trip Planning

2️⃣ Deep Research + PPT Generation

💬 "Design a professional PPT presentation about '2025 AI Technology Trends' with a cover page, a table of contents, 3 content slides with charts, and a summary page. Use a modern, clean design with a professional color scheme."

Agent combines multiple Skills → web search → content scraping → structured analysis → auto-generates PPT

Deep Research + PPT Generation

3️⃣ Excel Data Analysis & Report Generation

💬 "I have a yearly sales dataset. Summarize revenue by product line per quarter, calculate YoY growth rates, highlight anomalies in red, and generate a financial analysis report."

Agent reads raw Excel → data cleaning & structuring → uses native Excel formulas (SUM/VLOOKUP/growth rate formulas, not Python-hardcoded values) for summaries → conditional formatting to flag anomalies → recalc.py verifies zero formula errors → outputs a professional financial report

Why more accurate? Traditional AI tools calculate numbers in Python and paste them into cells — change the data, and everything breaks. Pixelle Studio insists on native Excel formula-driven output, producing "living" spreadsheets — edit the source data, and all summaries, growth rates, and charts update automatically.
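As a sketch of this formula-first approach, here is how a native formula differs from a pasted value when writing a sheet with openpyxl (an assumed library choice for illustration; the README does not name the spreadsheet library Pixelle Studio uses):

```python
# Illustrative sketch only: native Excel formula vs. hardcoded value.
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["Quarter", "Revenue"])
for quarter, revenue in [("Q1", 120), ("Q2", 150), ("Q3", 90), ("Q4", 200)]:
    ws.append([quarter, revenue])

# A hardcoded total goes stale the moment the data changes:
#   ws["B6"] = 560
# A native formula keeps the sheet "alive" — Excel recalculates it:
ws["A6"] = "Total"
ws["B6"] = "=SUM(B2:B5)"
wb.save("sales_summary.xlsx")
```

Because `B6` stores the formula rather than the number, editing any revenue cell in Excel updates the total automatically.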

Excel Data Analysis & Reporting

Financial Analysis Report

4️⃣ HTML Games & Interactive Content

💬 "Build me a Snake game"

Agent writes HTML/CSS/JS → generates a runnable game file → built-in preview for instant play

HTML Snake Game

5️⃣ Create Your Own Skills

Don't just use built-in skills — create your own to teach the Agent your unique workflows:

Skill Editor


🏗️ Architecture

                          ┌──────────────────────────┐
                          │     Frontend (Next.js)   │
                          │  Chat + Skills + Preview │
                          └────────────┬─────────────┘
                                       │ WebSocket
                          ┌────────────▼─────────────┐
                          │    Backend (FastAPI)     │
                          │      SkillAgent Core     │
                          └────────────┬─────────────┘
                                       │
             ┌────────────┬────────────┼────────────┬────────────┐
             ▼            ▼            ▼            ▼            ▼
      ┌─────────────┐┌──────────┐┌───────────┐┌───────────┐┌──────────┐
      │  9 Built-in ││    PTY   ││  Skill    ││   MCP     ││ Context  │
      │  Tools      ││ Terminal ││  System   ││   Tools   ││ Manager  │
      │ read/write  ││Persistent││Progressive││ External  ││  Auto    │
      │ exec/shell  ││ Sessions ││ Loading   ││Integration││Compaction│
      └─────────────┘└──────────┘└───────────┘└───────────┘└──────────┘

Technical Deep Dive

🔋 Progressive Skill Loading — The Secret to Token Efficiency

Unlike traditional approaches that stuff all skills into the System Prompt, we use three-level progressive loading:

| Level | Content | Token Cost | When Loaded |
|---|---|---|---|
| Level 1 | Skill metadata (name + description) | ~50 tokens/skill | Every conversation |
| Level 2 | Full SKILL.md documentation | ~500–2000 tokens | On demand, by the Agent |
| Level 3 | Auxiliary files (scripts/references) | Variable | On demand, by the Agent |

Result: 12 built-in Skills consume only ~600 tokens of metadata, while traditional approaches might require 20,000+ tokens.

🖥️ Persistent Pseudo-Terminal (PTY) — Beyond Code Execution

Built on pexpect, our persistent Shell Sessions go beyond one-shot code execution:

```python
# Variables persist across multiple calls!
shell_exec("import pandas as pd", shell_type="python")
shell_exec("df = pd.DataFrame({'a': [1, 2, 3]})", shell_type="python")
shell_exec("print(df.describe())", shell_type="python")  # df still exists!
```

Advantages:

  • ✅ Variable Persistence — State maintained across calls
  • ✅ Multi-language — Bash / Python / IPython
  • ✅ Smart Security — Blocks dangerous commands while allowing legitimate patterns (e.g. python -c "stmt1; stmt2")
  • ✅ Auto-recovery — Automatic restart on session crash
  • ✅ Auto-cleanup — Idle sessions automatically recycled
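The README says this is built on pexpect; a rough sketch of a persistent session in that style (the wrapper class and its details are assumptions, not the project's actual implementation):

```python
# Sketch: keeping a Python interpreter alive between calls via pexpect.
import os
import pexpect

class PersistentPython:
    def __init__(self):
        # Force the plain REPL and a dumb terminal so output stays parseable.
        env = dict(os.environ, TERM="dumb", PYTHON_BASIC_REPL="1")
        self.child = pexpect.spawn("python3 -i -q", env=env,
                                   encoding="utf-8", timeout=10)
        self.child.expect(">>> ")

    def exec(self, code: str) -> str:
        """Run one statement; variables and imports survive between calls."""
        self.child.sendline(code)
        self.child.expect(">>> ")
        # child.before holds the echoed input plus its output; drop the echo.
        return self.child.before.split("\r\n", 1)[-1].strip()

sess = PersistentPython()
sess.exec("import math")
sess.exec("r = 2")
print(sess.exec("math.pi * r ** 2"))   # earlier variables are still alive
```

Because the child process never exits between calls, interpreter state is the session state — no pickling or re-importing between steps.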

🛡️ Three-Layer Failover + WebSocket Heartbeat — Service That Never Stops

Request failed?
  ├─ Layer 1: Auth Failover     → Switch API Key / Base URL
  ├─ Layer 2: Model Failover    → Switch to fallback model (gpt-4o → gpt-4o-mini → ...)
  └─ Layer 3: Thinking Failover → Downgrade thinking depth

Long-running task?
  └─ WebSocket Heartbeat        → Periodic pings keep the connection alive

Even if the primary model faces rate limits, timeouts, or quota exhaustion, the system automatically switches to backup plans. For long-running tasks like PPT generation, WebSocket heartbeat keeps the connection alive — no more "no signal" drops.
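The escalation order in the diagram can be sketched as nested fallbacks, cheapest escape tried first (the function and exception names below are assumptions, not the project's API):

```python
# Sketch: exhaust Layer 1 (auth) first, then Layer 2 (model),
# then Layer 3 (thinking depth).
class AllLayersFailed(Exception):
    pass

def call_with_failover(prompt, auth_profiles, models, thinking_levels, send):
    last_err = None
    for thinking in thinking_levels:        # Layer 3: varied last
        for model in models:                # Layer 2: varied second
            for auth in auth_profiles:      # Layer 1: varied first
                try:
                    return send(prompt, auth=auth, model=model, thinking=thinking)
                except Exception as err:    # rate limit, timeout, quota, ...
                    last_err = err
    raise AllLayersFailed(str(last_err))
```

A failed call first rotates API keys, then falls back to a cheaper model, and only downgrades thinking depth once every auth/model pair has failed.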

📐 Smart Context Management — Never Overflow

  • Context Window Guard — Real-time token usage monitoring with automatic threshold alerts
  • Auto-Compaction — When context reaches ~70% usage, automatically generates a summary to compress history
  • Rich History Reconstruction — Multi-turn conversations retain tool calls, code execution, and file outputs for coherent context
  • Multi-model Aware — Auto-detects model context window sizes (GPT-4o 128K / Claude 200K / Gemini 1M)
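The compaction trigger described above can be sketched roughly like this (the 70% threshold comes from the text; the function shape and keep-last-4 policy are assumptions):

```python
# Sketch: summarize older history once ~70% of the context window is used.
CONTEXT_WINDOWS = {"gpt-4o": 128_000, "claude": 200_000, "gemini": 1_000_000}
COMPACT_AT = 0.70

def maybe_compact(messages, model, count_tokens, summarize):
    window = CONTEXT_WINDOWS.get(model, 128_000)
    used = sum(count_tokens(m) for m in messages)
    if used / window < COMPACT_AT:
        return messages                       # plenty of room, no change
    # Keep the most recent turns verbatim; compress everything older
    # into a single summary message.
    head, tail = messages[:-4], messages[-4:]
    return [summarize(head)] + tail
```

The summary message would carry the "rich history" (tool calls, code results, file outputs) in condensed form, so later turns still have coherent context.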

🔌 MCP External Tool Integration

Seamlessly connect external tools via the Model Context Protocol open standard:

MCP Configuration

Built-in Skills already support these MCP tools:

| Tool | Function | Use Case |
|---|---|---|
| 🔍 Exa Search | AI-native search engine | Deep research, info gathering |
| 🔍 Bing Search | General web search | Real-time information queries |
| 🗺️ AMap (Gaode) | Route planning, POI search | Travel planning |
| 🌐 Web Fetch | Web content scraping | Data collection |
| 🎬 Social Media Video | Video content parsing | Content creation |

You can also integrate any MCP-compatible tool service!
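For reference, MCP servers are commonly declared with a `mcpServers` JSON block like the one below (illustrative only — Pixelle Studio's exact configuration format may differ; `mcp-server-fetch` is the MCP reference fetch server, used here as an example):

```json
{
  "mcpServers": {
    "web-fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}
```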


🚀 Quick Start

Prerequisites

  • Python 3.10+ and uv (Python package manager)
  • Node.js 20+ and npm
  • Docker & Docker Compose (optional, for containerized deployment)

Option 1: One-Command Start

# 1. Clone the repo
git clone https://github.com/AIDC-AI/Pixelle-Studio.git
cd Pixelle-Studio

# 2. One-command start (auto-installs dependencies on first run)
./start.sh

💡 After starting, open http://localhost:3000, click the ⚙️ Settings button in the top-right corner to configure your API Key, Base URL, and Model.

Option 2: Docker Compose (Recommended for Deployment)

# 1. Clone the repo
git clone https://github.com/AIDC-AI/Pixelle-Studio.git
cd Pixelle-Studio

# 2. Build and start all services
docker compose up -d

# 3. View logs (optional)
docker compose logs -f

Then visit 👉 http://localhost:3000 and configure your API Key in ⚙️ Settings.

📦 Docker Compose Commands Reference
docker compose up -d            # Start all services in background
docker compose up               # Start in foreground (see logs directly)
docker compose down             # Stop all services
docker compose logs -f          # Follow all logs
docker compose logs -f backend  # Follow backend logs only
docker compose up --build       # Rebuild images and start
docker compose ps               # Show running services status

Data Persistence: The following data is persisted through Docker volumes:

  • backend-data — SQLite database
  • backend-scripts — Generated files (PDF/PPT/Excel/HTML etc.)
  • backend-skills — User-defined skills
  • backend-logs — Application logs

To reset all data: docker compose down -v

Option 3: Manual Start

# Backend
cd backend
uv sync             # Install Python dependencies (creates .venv automatically)
npm install         # Install Node.js dependencies (for PPT/document generation skills)
.venv/bin/python3 -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8001

# Frontend (new terminal)
cd frontend
npm install          # Install Node.js dependencies
npm run dev          # Start dev server (port 3000)

Then visit 👉 http://localhost:3000 and configure your API Key in ⚙️ Settings.

Configuration

LLM Settings (API Key, Base URL, Model) are configured per-user through the ⚙️ Settings panel in the web UI — no environment files needed.

Infrastructure variables (only needed for Docker or custom deployments):

| Variable | Description | Default |
|---|---|---|
| FRONTEND_PORT | Frontend port | 3000 |
| BACKEND_PORT | Backend port | 8001 |
| NEXT_PUBLIC_API_BASE | Frontend → Backend API URL | http://localhost:8001/api |
| NEXT_PUBLIC_WS_BASE | Frontend → Backend WebSocket URL | ws://localhost:8001/ws |
| JWT_SECRET | JWT signing secret | Auto-generated |
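When deploying with Docker Compose, these can typically be supplied via a `.env` file next to `docker-compose.yml` (standard Compose behavior; the values below simply restate the defaults):

```
FRONTEND_PORT=3000
BACKEND_PORT=8001
NEXT_PUBLIC_API_BASE=http://localhost:8001/api
NEXT_PUBLIC_WS_BASE=ws://localhost:8001/ws
```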

🛠️ Built-in Tools

Pixelle Studio provides the Agent with 9 ready-to-use tools:

| Tool | Function | Description |
|---|---|---|
| shell_exec | Persistent terminal | Multi-step execution with variable persistence & smart security |
| exec | Command execution | One-shot commands / background tasks |
| read_file | Read files | Supports skill files and user files |
| write_file | Write files | Create scripts / config files |
| edit_file | Edit files | Precise string replacement |
| grep | Search content | Regex support |
| find | Find files | Glob pattern matching |
| ls | List directory | Smart limiting to prevent overflow |
| process | Process management | Monitor / terminate background tasks |

📁 Project Structure

Pixelle-Studio/
├── frontend/                  # Frontend (Next.js 16 + React 19)
│   ├── app/                   # App Router pages
│   ├── components/            # UI Components
│   │   ├── layout/chat/       # Chat interface
│   │   ├── layout/leftPanel/  # Sidebar (Sessions + Skills)
│   │   └── ui/                # Shared UI components
│   ├── hooks/                 # React Hooks
│   ├── lib/                   # API clients
│   ├── types/                 # TypeScript type definitions
│   └── Dockerfile             # Frontend container image
│
├── backend/                   # Backend (Python + FastAPI)
│   ├── app/
│   │   ├── agent.py           # SkillAgent core engine
│   │   ├── tools/             # 9 built-in tools
│   │   ├── context/           # Context management (Guard + Compaction)
│   │   ├── skills/            # Skill loader
│   │   ├── config/            # Failover configuration
│   │   └── routes/            # REST API routes
│   ├── skills/                # Skills library
│   │   ├── default/           # Built-in skills (PDF/PPT/Excel/Search...)
│   │   └── <user_id>/         # User-defined skills
│   ├── scripts/               # Generated file storage
│   └── Dockerfile             # Backend container image
│
├── docker-compose.yml         # Docker Compose orchestration
├── assets/                    # README assets
└── start.sh                   # One-command start script

🧰 Tech Stack

Backend: Python 3.10+ · FastAPI · OpenAI API · WebSocket · SQLAlchemy · pexpect

Frontend: Next.js 16 · React 19 · TypeScript · Tailwind CSS

Infrastructure: SQLite · MCP Protocol · Docker Compose


🤝 Contributing

We welcome all contributions, whether bug reports, feature suggestions, or code submissions!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the Apache License 2.0.


⭐ If this project helps you, please give us a Star!

GitHub · Report Bug · Request Feature
