Agent runtime that lets LLMs think, not just execute.
Most agent frameworks treat LLMs as unreliable workers that need a manager watching every step. They force rigid formats -- ReAct loops, JSON command schemas, mechanical retries -- and dump entire tool catalogs into every prompt. The result is high ceremony, low capability. The framework spends its complexity constraining the LLM instead of unleashing it.
Arcana is an operating system for LLM agents, not a pipeline. The LLM decides strategy; the runtime provides services -- budget enforcement, tool dispatch, trace recording, context management. The framework never interprets LLM output: raw facts (TurnFacts) and runtime assessment (TurnAssessment) are kept visibly separate. Eight design principles and four prohibitions, codified in a Constitution, govern every line of code.
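The TurnFacts / TurnAssessment split can be sketched with plain dataclasses. This is illustrative only -- the field names below are assumptions, not Arcana's actual types:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TurnFacts:
    # Raw, uninterpreted observations from one turn: what the model said,
    # which tools were called. (Field names are hypothetical.)
    model_output: str
    tool_calls: list = field(default_factory=list)

@dataclass(frozen=True)
class TurnAssessment:
    # The runtime's own judgment about the turn, kept separate from the facts.
    confidence: float
    should_continue: bool

facts = TurnFacts(model_output="The answer is 644.")
assessment = TurnAssessment(confidence=0.9, should_continue=False)
# The runtime never rewrites facts based on its assessment; they travel side by side.
```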
```bash
pip install arcana-agent
```

```python
import asyncio
import arcana

async def main():
    result = await arcana.run("Summarize this article", api_key="sk-xxx")
    print(result.output)
    print(f"Cost: ${result.cost_usd:.4f} | Tokens: {result.tokens_used}")

asyncio.run(main())
```

Use a different provider:
```python
import asyncio, arcana

# OpenAI
result = asyncio.run(arcana.run("Hello", provider="openai", api_key="sk-proj-xxx"))

# Anthropic (pip install arcana-agent[anthropic])
result = asyncio.run(arcana.run(
    "Hello",
    provider="anthropic",
    model="claude-sonnet-4-20250514",
    api_key="sk-ant-xxx",
))
```

Tools declare when and why, not just how: the LLM reasons about whether to call a tool, not merely how to call it.
```python
@arcana.tool(
    when_to_use="When you need to do math calculations",
    what_to_expect="Returns the numeric result as a string",
    failure_meaning="The expression was malformed",
)
def calc(expression: str) -> str:
    # Demo only: eval on untrusted input is unsafe in production.
    return str(eval(expression))
```
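`eval` keeps the example short, but evaluating model-generated input this way is unsafe. A stricter arithmetic-only evaluator built on the standard-library `ast` module might look like this (a sketch, not part of Arcana):

```python
import ast
import operator

# Whitelist of arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calc(expression: str) -> str:
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))

# safe_calc("15 * 37 + 89") -> "644"
```

A production tool would swap the `eval` body above for something like this.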
```python
result = await arcana.run("What is 15 * 37 + 89?", tools=[calc], api_key="sk-xxx")
```

Create a `Runtime` once at startup and use it across your entire application. It holds providers, tools, budget, and trace as long-lived resources.
```python
runtime = arcana.Runtime(
    providers={"deepseek": "sk-xxx", "openai": "sk-proj-xxx"},
    tools=[calc, web_search],
    budget=arcana.Budget(max_cost_usd=5.0),
    trace=True,
)

result = await runtime.run("Analyze recent trends in quantum computing")
```

Multi-turn sessions with persistent history, shared budget, and context compression:
```python
async with runtime.chat() as c:
    r = await c.send("What are the main themes in this dataset?")
    r = await c.send("Expand on the second theme")
    print(c.total_cost_usd)
```

The LLM can ask clarifying questions mid-execution. If no handler is provided, it proceeds with best judgment -- interaction is a capability, not a dependency.
```python
result = await runtime.run(
    "Book a restaurant for dinner",
    input_handler=lambda q: input(f"Agent asks: {q}\n> "),
)
```

Return validated Pydantic instances instead of raw text:
```python
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    key_points: list[str]
    sentiment: str

result = await arcana.run(
    "Summarize this article",
    response_format=Summary,
    api_key="sk-xxx",
)
print(result.parsed.title)       # str
print(result.parsed.key_points)  # list[str]
```

`result.parsed` is always `BaseModel | None` -- never a raw dict. It is `None` when no `response_format` is set or when parsing fails. `result.output` contains the same parsed model when successful, or the raw text when parsing fails.
Pass images alongside text. URLs, local file paths, and data URIs all work.
```python
result = await arcana.run(
    "Describe what you see in this image",
    images=["https://example.com/photo.jpg"],
    provider="openai",
    api_key="sk-proj-xxx",
)
```

Sequential steps with optional parallel branches. Each step's output flows as context to the next:
```python
result = await runtime.chain([
    arcana.ChainStep(name="research", goal="Find key facts about quantum computing"),
    [  # parallel branch
        arcana.ChainStep(name="summary", goal="Write a concise summary"),
        arcana.ChainStep(name="critique", goal="Identify gaps and biases",
                         budget=arcana.Budget(max_cost_usd=0.50)),
    ],
    arcana.ChainStep(name="final", goal="Integrate summary and critique into a report"),
])
print(result.steps["final"])
```

Process many independent tasks concurrently with automatic throttling:
```python
results = await runtime.run_batch([
    {"goal": "Summarize article 1", "response_format": Summary},
    {"goal": "Summarize article 2", "response_format": Summary},
    {"goal": "Summarize article 3", "response_format": Summary},
], concurrency=10)

print(f"{results.succeeded}/{len(results.results)} succeeded")
print(f"Total cost: ${results.total_cost_usd:.4f}")
```

The runtime provides communication and budget; agents decide strategy -- no forced hierarchy. Two modes: "shared" (all agents see everything) and "session" (independent contexts, where messages arrive as user messages).
```python
result = await runtime.team(
    "Design a landing page for an AI product",
    agents=[
        arcana.AgentConfig(name="designer", prompt="You are a senior UX designer."),
        arcana.AgentConfig(name="copywriter", prompt="You are a conversion copywriter."),
        arcana.AgentConfig(name="critic", prompt="You find weaknesses and suggest improvements."),
    ],
    max_rounds=3,
    mode="shared",  # or "session" for independent contexts
)
```

Isolate budget for a subset of runs without affecting the global runtime budget:
```python
async with runtime.budget_scope(max_cost_usd=0.50) as scoped:
    r1 = await scoped.run("Classify this document")
    r2 = await scoped.run("Extract key entities")
    print(f"Scoped cost: ${scoped.budget_used_usd:.4f}")
```

For workflows that need explicit state machines -- available when you need it, never forced:
```python
from arcana import StateGraph, START, END

graph = runtime.graph(state_schema=MyState)
graph.add_node("research", research_fn)
graph.add_node("write", write_fn)
graph.add_edge(START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", END)

app = graph.compile()
result = await app.ainvoke(initial_state)
```

| | LangChain | CrewAI | AutoGPT | Arcana |
|---|---|---|---|---|
| LLM autonomy | Framework-driven chains | Role-locked agents | Fully autonomous, no guardrails | LLM decides strategy within runtime boundaries |
| Token efficiency | Full context every call | Full prompt per agent | Unbounded context growth | Working-set discipline -- only what this step needs |
| Thinking signals | Ignored | Ignored | Ignored | Runtime listens to thinking for confidence, never constrains |
| Tool management | All tools in every prompt | Per-agent tool sets | All tools always | Dynamic per-turn exposure with affordances |
| User interaction | Not built in | Not built in | Not built in | ask_user built-in, graceful fallback if no handler |
| Pipelines | Chain + glue code | Sequential tasks | No pipelines | chain() with auto context passing + parallel branches |
| Default path | Chain/graph required | Crew required | Agent loop always | Direct answer when possible, agent loop when needed |
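The "dynamic per-turn exposure" row can be illustrated with a toy relevance filter: each tool's `when_to_use` text is its affordance, and only tools relevant to the current goal are surfaced for the turn. The keyword-overlap heuristic below is purely illustrative, not Arcana's actual selection logic:

```python
# Each tool carries its affordance text; only tools whose description
# overlaps the current goal are exposed this turn (toy heuristic).
TOOLS = {
    "calc": "when you need to do math calculations",
    "web_search": "when you need fresh information from the web",
}

def expose_for_turn(goal: str, min_overlap: int = 1) -> list[str]:
    goal_words = set(goal.lower().split())
    exposed = []
    for name, when_to_use in TOOLS.items():
        overlap = goal_words & set(when_to_use.lower().split())
        if len(overlap) >= min_overlap:
            exposed.append(name)
    return exposed
```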
DeepSeek | OpenAI | Anthropic | Google Gemini | Kimi (Moonshot) | GLM (Zhipu) | MiniMax | Ollama
All providers use a single OpenAI-compatible adapter. Adding a new provider is one function call.
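Because everything goes through one OpenAI-compatible adapter, a provider is essentially a name plus a base URL. A sketch of the idea (the registry and `register_provider` below are illustrative, not Arcana's actual API; the URLs are the providers' published OpenAI-compatible endpoints):

```python
# Illustrative provider registry for a single OpenAI-compatible adapter.
PROVIDERS: dict[str, str] = {
    "openai": "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com",
    "ollama": "http://localhost:11434/v1",
}

def register_provider(name: str, base_url: str) -> None:
    # One call: every provider shares the same chat-completions adapter,
    # so registering one only requires its endpoint.
    PROVIDERS[name] = base_url

register_provider("moonshot", "https://api.moonshot.cn/v1")
```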
| Guide | Description |
|---|---|
| Quick Start | Installation through deployment |
| Configuration | Full configuration reference |
| Providers | Provider setup and fallback chains |
| API Reference | Public API documentation |
| Architecture | System design and internals |
| Constitution | Design principles and prohibitions |
| Examples | Runnable code examples |
| Changelog | Release history |
```bash
pip install arcana-agent                 # Core (DeepSeek, OpenAI)
pip install arcana-agent[anthropic]      # + Claude support
pip install arcana-agent[gemini]         # + Gemini support
pip install arcana-agent[all-providers]  # All providers
pip install arcana-agent[ui]             # + Trace Web UI
```

Or with uv:
```bash
uv add arcana-agent
uv add arcana-agent --extra all-providers
```

Requires Python 3.11+.
MIT