A generic, multi-threaded, and ergonomic Rust crate for Multi-Armed Bandit (MAB) algorithms—designed for engineers who need extensibility, custom reward modeling, and a clear path to contextual bandits.
- Support key MAB algorithms:
  - MVP: Epsilon-Greedy, UCB1, Thompson Sampling
  - Future: LinUCB, etc.
- Built-in multi-threading for computational bottlenecks (e.g., regret calculation, reward evaluation).
- Clean, extensible abstraction for stateless and contextual bandits, using Rust traits and generics.
- User-defined reward types and flexible action/context typing via trait bounds.
- Target audience: ML engineers & backend engineers running real-time services in Rust.
- Not a general-purpose RL framework (e.g., no DQN, A3C, etc.)
- Not a rich visualization tool
```rust
// Action: Represents an arm/action in the bandit problem
pub trait Action: Clone + Eq + Hash + Send + Sync + 'static {
    type ValueType;
    fn id(&self) -> usize;
    fn name(&self) -> String { ... }
    fn value(&self) -> Self::ValueType;
}

// Reward: Represents the reward signal
pub trait Reward: Clone + Send + Sync + 'static {
    fn value(&self) -> f64;
}

// Context: Represents contextual information (for contextual bandits)
pub trait Context: Clone + Send + Sync + 'static {
    type DimType: ndarray::Dimension;
    fn to_ndarray(&self) -> ndarray::Array<f64, Self::DimType>;
}

// BanditPolicy: Core trait for all bandit algorithms
pub trait BanditPolicy<A, R, C>: Send + Sync + 'static
where
    A: Action,
    R: Reward,
    C: Context,
{
    fn choose_action(&self, context: &C) -> A;
    fn update(&mut self, context: &C, action: &A, reward: &R);
    fn reset(&mut self);
}

// Environment: Simulated environment for running experiments
pub trait Environment<A, R, C>: Send + Sync + 'static { ... }
```

- Extensibility: Implement these traits for your own types to plug into the framework (see the sketch after this list).
- Action/Reward/Context: Highly generic, must satisfy trait bounds above.
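For example, a minimal `Action`/`Reward` implementation might look like the following. This is a sketch only: the `ArmId` and `ClickReward` types are hypothetical, shown purely to illustrate the trait bounds above.

```rust
use octopus::traits::entities::{Action, Reward};

// Hypothetical arm type: one arm per content variant.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct ArmId {
    pub id: usize,
    pub label: &'static str,
}

impl Action for ArmId {
    type ValueType = &'static str;

    fn id(&self) -> usize {
        self.id
    }

    fn name(&self) -> String {
        self.label.to_string()
    }

    fn value(&self) -> Self::ValueType {
        self.label
    }
}

// Hypothetical binary reward: 1.0 for a click, 0.0 otherwise.
#[derive(Clone)]
pub struct ClickReward(pub bool);

impl Reward for ClickReward {
    fn value(&self) -> f64 {
        if self.0 { 1.0 } else { 0.0 }
    }
}
```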
- `EpsilonGreedyPolicy` parameters: `epsilon: f64` and the initial actions
- Tracks the average reward and pull count per action (see the decision-rule sketch after this list)
- Generic over action, reward, and context types
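The core decision rule can be sketched as follows. This is illustrative only: the actual `EpsilonGreedyPolicy` internals may differ, and the snippet assumes the `rand` crate.

```rust
use rand::Rng;

// Illustrative decision rule: with probability `epsilon` explore a random
// arm, otherwise exploit the arm with the highest running average reward.
fn choose_index(epsilon: f64, avg_rewards: &[f64]) -> usize {
    let mut rng = rand::thread_rng();
    if rng.gen::<f64>() < epsilon {
        // Explore: pick an arm uniformly at random.
        rng.gen_range(0..avg_rewards.len())
    } else {
        // Exploit: pick the arm with the best estimate so far.
        avg_rewards
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap_or(0)
    }
}

// Incremental running-average update: new_avg = old_avg + (r - old_avg) / n.
fn update_average(avg: &mut f64, count: &mut u64, reward: f64) {
    *count += 1;
    *avg += (reward - *avg) / *count as f64;
}
```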
- The `Simulator` struct orchestrates the interaction between a bandit policy and an environment (see the loop sketch below).
- Collects cumulative rewards, regret, and per-step metrics via `SimulationResults`.
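Conceptually, the orchestration loop looks roughly like this. It is a sketch against the traits above, not the crate's actual implementation: the environment is stood in for by a closure here, because the `Environment` trait body is elided, and the `BanditPolicy` import path is not shown in this README.

```rust
use octopus::traits::entities::{Action, Context, Reward};
// (BanditPolicy import path omitted; it is not shown in this README.)

// Sketch of the policy/environment interaction per step.
fn run_sketch<A: Action, R: Reward, C: Context, P: BanditPolicy<A, R, C>>(
    policy: &mut P,
    context: &C,
    mut observe: impl FnMut(&A) -> R, // stand-in for the environment
    steps: usize,
) -> f64 {
    let mut cumulative_reward = 0.0;
    for _ in 0..steps {
        let action = policy.choose_action(context); // pick an arm
        let reward = observe(&action);              // environment feedback
        cumulative_reward += reward.value();        // record metrics
        policy.update(context, &action, &reward);   // learn from feedback
    }
    cumulative_reward
}
```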
Example:

```rust
use octopus::algorithms::epsilon_greedy::EpsilonGreedyPolicy;
use octopus::simulation::simulator::Simulator;
use octopus::traits::entities::{Action, Reward, Context, DummyContext};

// Define your own Action, Reward, and Environment types implementing the required traits
// ...

let actions = vec![/* your actions here */];
let policy = EpsilonGreedyPolicy::new(0.1, &actions).unwrap();
let environment = /* your environment here */;
let mut simulator = Simulator::new(policy, environment);
let results = simulator.run(1000, &actions);
println!("Cumulative reward: {}", results.cumulative_reward);
```

- Internal parallelism (e.g., for regret calculation) uses `rayon` and thread-safe primitives (`Mutex`); see the sketch after this list.
- User-facing API is single-threaded for simplicity; internal operations are parallelized where beneficial.
- No explicit builder pattern or thread-safe wrappers in the current API.
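As an illustration, a per-step regret aggregation with `rayon` might look like this (a sketch, not the crate's actual internals):

```rust
use rayon::prelude::*;

// Sketch: per-step regret is the gap between the optimal arm's expected
// reward and the reward actually received; summed in parallel with rayon.
fn total_regret(optimal_mean: f64, step_rewards: &[f64]) -> f64 {
    step_rewards.par_iter().map(|r| optimal_mean - r).sum()
}
```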
- Uses `ndarray` for context features
- Uses `rayon` for parallelism
- Error handling via `thiserror`
- (Planned) Optional `serde` for serialization
- Add more algorithms: `UCB1`, `LinUCB`, etc.
- Benchmark suite comparing with Python implementations
- Async/streaming reward update support
- Optional logging/tracing integration
- Unit tests for all algorithms and simulation logic (see the sketch after this list)
- Integration tests with simulated reward distributions
- Stress tests for multi-threaded scenarios
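A unit test in this spirit, reusing the hypothetical `choose_index` helper from the epsilon-greedy sketch above:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn greedy_picks_best_known_arm() {
        // With epsilon = 0.0 the policy always exploits, so the arm with
        // the highest running average reward must be chosen.
        let avg_rewards = vec![0.1, 0.9, 0.3];
        assert_eq!(choose_index(0.0, &avg_rewards), 1);
    }
}
```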