Python bindings for Cactus Engine via FFI. Auto-installed when you run `source ./setup`.
```bash
# Setup environment
source ./setup

# Build shared library for Python
cactus build --python

# Download models
cactus download LiquidAI/LFM2-VL-450M
cactus download openai/whisper-small

# Optional: set your Cactus Cloud API key for automatic cloud fallback
cactus auth
```

```python
from cactus import cactus_init, cactus_complete, cactus_destroy
import json

model = cactus_init("weights/lfm2-vl-450m")
messages = [{"role": "user", "content": "What is 2+2?"}]
response = json.loads(cactus_complete(model, messages))
print(response["response"])
cactus_destroy(model)
```

`cactus_init` initializes a model and returns its handle.
| Parameter | Type | Description |
|---|---|---|
| `model_path` | str | Path to model weights directory |
| `corpus_dir` | str | Optional path to RAG corpus directory for document Q&A |
```python
model = cactus_init("weights/lfm2-vl-450m")
rag_model = cactus_init("weights/lfm2-rag", corpus_dir="./documents")
```

`cactus_complete` runs a chat completion. Returns a JSON string with the response and metrics.
| Parameter | Type | Description |
|---|---|---|
| `model` | handle | Model handle from `cactus_init` |
| `messages` | list \| str | List of message dicts or JSON string |
| `tools` | list | Optional tool definitions for function calling |
| `temperature` | float | Sampling temperature |
| `top_p` | float | Top-p sampling |
| `top_k` | int | Top-k sampling |
| `max_tokens` | int | Maximum tokens to generate |
| `stop_sequences` | list | Stop sequences |
| `include_stop_sequences` | bool | Include matched stop sequences in output (default: False) |
| `force_tools` | bool | Constrain output to tool call format |
| `tool_rag_top_k` | int | Select top-k relevant tools via Tool RAG (default: 2; 0 = use all tools) |
| `confidence_threshold` | float | Minimum confidence for local generation (default: 0.7; triggers `cloud_handoff` when below) |
| `callback` | fn | Streaming callback `fn(token, token_id, user_data)` |
```python
# Basic completion
messages = [{"role": "user", "content": "Hello!"}]
response = cactus_complete(model, messages, max_tokens=100)
print(json.loads(response)["response"])
```

```python
# With tools
tools = [{
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
    }
}]
response = cactus_complete(model, messages, tools=tools)
```
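Tool calls come back in the `function_calls` field of the response JSON rather than as free text, so your application is responsible for dispatching them. A minimal dispatch sketch; the `get_weather` implementation and the hand-written `response` are illustrative stand-ins, and the per-call `name`/`arguments` keys are an assumption about the entry shape:

```python
import json

# Illustrative local implementation of the declared tool
def get_weather(location):
    return f"Sunny in {location}"

TOOL_IMPLS = {"get_weather": get_weather}

# Stand-in for the JSON string returned by cactus_complete(model, messages, tools=tools)
response = json.dumps({
    "success": True,
    "response": "",
    "function_calls": [
        {"name": "get_weather", "arguments": {"location": "Paris"}}
    ],
})

result = json.loads(response)
for call in result["function_calls"]:
    impl = TOOL_IMPLS.get(call["name"])
    if impl:
        print(impl(**call["arguments"]))
```

Looking up implementations in a dict keeps dispatch safe when the model names a tool you did not register.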
```python
# Streaming
def on_token(token, token_id, user_data):
    print(token, end="", flush=True)

cactus_complete(model, messages, callback=on_token)
```

Response format (all fields always present):
```json
{
  "success": true,
  "error": null,
  "cloud_handoff": false,
  "response": "Hello! How can I help?",
  "function_calls": [],
  "confidence": 0.85,
  "time_to_first_token_ms": 45.2,
  "total_time_ms": 163.7,
  "prefill_tps": 619.5,
  "decode_tps": 168.4,
  "ram_usage_mb": 245.67,
  "prefill_tokens": 28,
  "decode_tokens": 50,
  "total_tokens": 78
}
```

Cloud handoff response (when the model detects low confidence):
```json
{
  "success": false,
  "error": null,
  "cloud_handoff": true,
  "response": null,
  "function_calls": [],
  "confidence": 0.18,
  "time_to_first_token_ms": 45.2,
  "total_time_ms": 45.2,
  "prefill_tps": 619.5,
  "decode_tps": 0.0,
  "ram_usage_mb": 245.67,
  "prefill_tokens": 28,
  "decode_tokens": 0,
  "total_tokens": 28
}
```

When `cloud_handoff` is true, the model's confidence dropped below `confidence_threshold` (default: 0.7) and it recommends deferring to a cloud-based model for better results. Handle this in your application:
```python
result = json.loads(cactus_complete(model, messages))
if result["cloud_handoff"]:
    # Defer to a cloud API (e.g., OpenAI, Anthropic)
    response = call_cloud_api(messages)
else:
    response = result["response"]
```

`cactus_transcribe` transcribes audio using a Whisper model. Returns a JSON string.
| Parameter | Type | Description |
|---|---|---|
| `model` | handle | Whisper model handle |
| `audio_path` | str | Path to audio file (WAV) |
| `prompt` | str | Whisper prompt for language/task |
```python
whisper = cactus_init("weights/whisper-small")
prompt = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
response = cactus_transcribe(whisper, "audio.wav", prompt=prompt)
print(json.loads(response)["response"])
cactus_destroy(whisper)
```

`cactus_embed` returns text embeddings as a list of floats.
| Parameter | Type | Description |
|---|---|---|
| `model` | handle | Model handle |
| `text` | str | Text to embed |
| `normalize` | bool | L2-normalize embeddings (default: False) |
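A returned embedding is just a vector, so comparing texts is up to the caller. A cosine-similarity helper in plain Python (the vectors are stand-ins for real `cactus_embed` outputs); note that for vectors obtained with `normalize=True`, cosine similarity reduces to a plain dot product:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-ins for cactus_embed(model, text) outputs
a = [0.1, 0.3, 0.5]
b = [0.1, 0.29, 0.52]
print(f"similarity: {cosine(a, b):.3f}")
```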
```python
embedding = cactus_embed(model, "Hello world")
print(f"Dimension: {len(embedding)}")
```

`cactus_image_embed` returns image embeddings from a VLM as a list of floats.

```python
embedding = cactus_image_embed(model, "image.png")
```

`cactus_audio_embed` returns audio embeddings from a Whisper model as a list of floats.

```python
embedding = cactus_audio_embed(whisper, "audio.wav")
```

`cactus_reset` resets model state (clears the KV cache). Call it between unrelated conversations.

```python
cactus_reset(model)
```

`cactus_stop` stops an ongoing generation (useful with streaming callbacks).

```python
cactus_stop(model)
```

`cactus_destroy` frees model memory. Always call it when you are done with a handle.

```python
cactus_destroy(model)
```

`cactus_get_last_error` returns the last error message, or None if there was no error.
```python
error = cactus_get_last_error()
if error:
    print(f"Error: {error}")
```

`cactus_tokenize` tokenizes text. Returns a list of token IDs.

```python
tokens = cactus_tokenize(model, "Hello world")
print(tokens)  # [1234, 5678, ...]
```

`cactus_rag_query` queries the RAG corpus for relevant text chunks. Requires a model initialized with `corpus_dir`.
| Parameter | Type | Description |
|---|---|---|
| `model` | handle | Model handle (must have `corpus_dir` set) |
| `query` | str | Query text |
| `top_k` | int | Number of chunks to retrieve (default: 5) |
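Retrieved chunks are typically stitched into the completion prompt so the model answers from the corpus. A sketch of that wiring; `build_rag_prompt` is a hypothetical helper and the example chunk stands in for real `cactus_rag_query` output:

```python
def build_rag_prompt(question, chunks):
    # Concatenate retrieved chunk texts into a single grounded user message
    context = "\n\n".join(c["text"] for c in chunks)
    return [{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }]

# Stand-in for cactus_rag_query(model, question, top_k=3)
chunks = [{"score": 0.91, "text": "Machine learning is a subfield of AI."}]
messages = build_rag_prompt("What is machine learning?", chunks)
# messages can now be passed to cactus_complete(model, messages)
```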
```python
model = cactus_init("weights/lfm2-rag", corpus_dir="./documents")
chunks = cactus_rag_query(model, "What is machine learning?", top_k=3)
for chunk in chunks:
    print(f"Score: {chunk['score']:.2f} - {chunk['text'][:100]}...")
```

Pass images in the messages for vision-language models:

```python
vlm = cactus_init("weights/lfm2-vl-450m")
messages = [{
    "role": "user",
    "content": "Describe this image",
    "images": ["path/to/image.png"]
}]
response = cactus_complete(vlm, messages)
print(json.loads(response)["response"])
```

See `python/example.py` for a complete example covering:
- Text completion
- Text/image/audio embeddings
- Vision (VLM)
- Speech transcription
```bash
python python/example.py
```