# RL-Library

A comprehensive, modular C++ library for implementing and experimenting with Reinforcement Learning (RL) algorithms. Built with clean architecture principles, it separates core mathematics, RL algorithms, and simulation environments.
## Project Structure

```
RL-Library/
├── Core/
│   ├── MathUtils.h        # Mathematical foundations (Unit 1)
│   └── MathUtils.cpp      # Implementation
├── Algorithms/
│   ├── QLearning.h        # Q-Learning algorithm
│   ├── QLearning.cpp      # Q-Learning implementation
│   ├── SARSA.h            # SARSA algorithm
│   └── SARSA.cpp          # SARSA implementation
├── Env/
│   ├── GridWorld.h        # GridWorld environment
│   └── GridWorld.cpp      # GridWorld implementation
├── RL_Lib.h               # Main library header
├── main.cpp               # Driver/example code
├── CMakeLists.txt         # Build configuration
└── README.md              # This file
```
## Core Math (MathUtils)

Implements foundational mathematical operations for machine learning:

- **Activation Functions**
  - Sigmoid: σ(x) = 1 / (1 + e^(-x))
  - Tanh: tanh(x)
  - ReLU: max(0, x)
- **Evaluation Metrics**
  - Mean Squared Error (MSE): Σ(target - pred)² / n
  - Mean Absolute Error (MAE): Σ|target - pred| / n
- **Vector Operations**
  - Dot product
  - Vector addition
  - Scalar multiplication
- **Matrix Operations**
  - Matrix multiplication
  - Matrix transpose
  - Matrix-vector multiplication
## Q-Learning

An off-policy temporal-difference algorithm that learns the optimal action-value function.

**Update Rule:**

```
Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)]
```

**Key Parameters:**

- α (alpha): Learning rate (0.1)
- γ (gamma): Discount factor (0.99)
- ε (epsilon): Exploration rate (0.1), used by the epsilon-greedy strategy

**Features:**

- Epsilon-greedy action selection
- Q-table management
- Automatic state initialization
## SARSA

An on-policy temporal-difference algorithm that learns the value of the policy being followed.

**Update Rule:**

```
Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') - Q(s,a)]
```

**Key Differences from Q-Learning:**

- Uses the actual next action taken (a') instead of the greedy maximum
- More conservative (on-policy) learning
## GridWorld Environment

A simple grid-based environment for RL agents to navigate.

**Features:**

- Configurable grid dimensions (default: 5×5)
- Agent starts at (0, 0)
- Customizable goal position
- 4 actions: UP, DOWN, LEFT, RIGHT
- Reward: +1.0 for reaching the goal, -0.01 per step
- Boundary constraints (the agent cannot leave the grid)

**State Representation:**

Linear state index: `state = y * width + x`
## Requirements

- C++17 or higher
- CMake 3.10+
- A standard C++ compiler (g++, clang, MSVC)
## Building and Running

```bash
# Clone the repository
git clone https://github.com/manjushwarkhairkar/ML-Library.git
cd ML-Library

# Create build directory
mkdir build
cd build

# Configure and build
cmake ..
make

# Run the program
./rl_main
```

Expected output:

```
========================================
RL Library v1.0.0
========================================
[1] Initializing GridWorld Environment...
Grid Size: 5x5
Total States: 25
Goal Position: (4, 4)
[2] Initializing Q-Learning Agent...
Learning Rate (alpha): 0.1
Discount Factor (gamma): 0.99
Exploration Rate (epsilon): 0.1
[3] Starting Training...
Episode 10 | Reward: -0.38
Episode 20 | Reward: -0.23
...
Episode 100 | Reward: 0.89
[4] Training Complete!
========================================
```
## Usage Examples

**Q-Learning on GridWorld:**

```cpp
#include "RL_Lib.h"

using namespace RLLib;

int main() {
    // Initialize environment
    GridWorld env(5, 5, 4, 4);

    // Initialize agent
    QLearner agent(0.1, 0.99, 0.1);

    // Training loop
    for (int episode = 0; episode < 100; ++episode) {
        env.reset();
        int state = env.getState();
        for (int step = 0; step < 50; ++step) {
            int action = agent.chooseAction(state, GridWorld::NUM_ACTIONS);
            double reward = env.step(action);
            int next_state = env.getState();
            agent.update(state, action, reward, next_state, GridWorld::NUM_ACTIONS);
            state = next_state;
            if (env.isTerminal()) break;
        }
    }
    return 0;
}
```

**Core math utilities:**

```cpp
#include "Core/MathUtils.h"

using namespace RLLib;

int main() {
    // Vector operations
    std::vector<double> a = {1.0, 2.0, 3.0};
    std::vector<double> b = {4.0, 5.0, 6.0};
    double dot = dot_product(a, b);
    auto sum = vector_add(a, b);

    // Activation functions
    double sig = sigmoid(0.5);
    double act = relu(-2.0);

    // Matrix operations
    Matrix m1(3, 2);
    Matrix m2(2, 3);
    Matrix result = matrix_multiply(m1, m2);
    return 0;
}
```

## API Reference

**QLearner:**

```cpp
// Constructor
QLearner(double alpha = 0.1, double gamma = 0.9, double epsilon = 0.1);

// Initialize state in Q-table
void initializeState(int state, int numActions);

// Select action using epsilon-greedy strategy
int chooseAction(int state, int numActions);

// Update Q-value using the Q-Learning rule
void update(int state, int action, double reward, int nextState, int numActions);

// Get Q-value for state-action pair
double getQValue(int state, int action);

// Get maximum Q-value for a state
double getMaxQValue(int state);
```

**GridWorld:**

```cpp
// Constructor
GridWorld(int width = 5, int height = 5, int goalX = 4, int goalY = 4);

// Get current state index
int getState() const;

// Reset environment
void reset();

// Execute action, return reward
double step(int action);

// Check if at goal
bool isTerminal() const;

// Get positions
std::pair<int, int> getAgentPosition() const;
std::pair<int, int> getGoalPosition() const;
```

**MathUtils:**

```cpp
// Activation functions
double sigmoid(double x);
double tanh_activation(double x);
double relu(double x);

// Evaluation metrics
double calculate_mse(const std::vector<double>& target, const std::vector<double>& pred);
double calculate_mae(const std::vector<double>& target, const std::vector<double>& pred);

// Vector operations
double dot_product(const std::vector<double>& a, const std::vector<double>& b);
std::vector<double> vector_add(const std::vector<double>& a, const std::vector<double>& b);
std::vector<double> scalar_multiply(const std::vector<double>& a, double scalar);

// Matrix operations
Matrix matrix_multiply(const Matrix& a, const Matrix& b);
Matrix matrix_transpose(const Matrix& a);
std::vector<double> matrix_vector_multiply(const Matrix& mat, const std::vector<double>& vec);
```

## Performance

- Time Complexity: O(1) average per Q-Learning update
- Space Complexity: O(n·m) for n states and m actions
- Scalability: Efficient for tabular methods; consider function approximation for larger state spaces
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## References

- Sutton & Barto, *Reinforcement Learning: An Introduction* (2nd Edition)
- Bellman equation for MDPs
- Temporal-difference learning
## Roadmap

- Policy Gradient Methods (REINFORCE, Actor-Critic)
- Deep Q-Networks (DQN)
- Experience Replay and Target Networks
- Multiple environments (CartPole, Mountain Car)
- Visualization tools
- GPU acceleration support
- Python bindings
---

**Version:** 1.0.0
**Last Updated:** 2024
**Maintainer:** Manjushwar Khairkar