RL Library - A Professional C++ Reinforcement Learning Framework

A comprehensive, modular C++ library for implementing and experimenting with Reinforcement Learning (RL) algorithms. Built with clean architecture principles, separating core mathematics, RL algorithms, and simulation environments.

Project Structure

RL-Library/
├── Core/
│   ├── MathUtils.h          # Mathematical foundations (Unit 1)
│   └── MathUtils.cpp        # Implementation
├── Algorithms/
│   ├── QLearning.h          # Q-Learning algorithm
│   ├── QLearning.cpp        # Q-Learning implementation
│   ├── SARSA.h              # SARSA algorithm
│   └── SARSA.cpp            # SARSA implementation
├── Env/
│   ├── GridWorld.h          # GridWorld environment
│   └── GridWorld.cpp        # GridWorld implementation
├── RL_Lib.h                 # Main library header
├── main.cpp                 # Driver/example code
├── CMakeLists.txt           # Build configuration
└── README.md                # This file

Features

1. Core Mathematics Module (Core/MathUtils.h)

Implements foundational mathematical operations for machine learning:

  • Activation Functions

    • Sigmoid: σ(x) = 1 / (1 + e^(-x))
    • Tanh: tanh(x)
    • ReLU: max(0, x)
  • Evaluation Metrics

    • Mean Squared Error (MSE): Σ(target - pred)² / n
    • Mean Absolute Error (MAE): Σ|target - pred| / n
  • Vector Operations

    • Dot product
    • Vector addition
    • Scalar multiplication
  • Matrix Operations

    • Matrix multiplication
    • Matrix transpose
    • Matrix-vector multiplication

2. Algorithms Module (Algorithms/)

Q-Learning (QLearning.h/cpp)

An off-policy temporal difference algorithm that learns the optimal action-value function.

Update Rule:

Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)]

Key Parameters:

  • α (alpha): Learning rate (0.1)
  • γ (gamma): Discount factor (0.99)
  • ε (epsilon): Exploration rate (0.1) - Epsilon-greedy strategy

Features:

  • Epsilon-greedy action selection
  • Q-table management
  • Automatic state initialization

SARSA (SARSA.h/cpp)

An on-policy temporal difference algorithm that learns the value of the policy being followed.

Update Rule:

Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') - Q(s,a)]

Key Difference from Q-Learning:

  • Uses the actual next action taken (a') instead of the max
  • More conservative (on-policy) learning

3. Environment Module (Env/)

GridWorld (GridWorld.h/cpp)

A simple grid-based environment for RL agents to navigate.

Features:

  • Configurable grid dimensions (default: 5×5)
  • Agent starts at (0, 0)
  • Customizable goal position
  • 4 actions: UP, DOWN, LEFT, RIGHT
  • Reward: +1.0 for reaching goal, -0.01 per step
  • Boundary constraints (agent cannot leave grid)

State Representation:

  • Linear state index: state = y * width + x

Getting Started

Prerequisites

  • C++17 or higher
  • CMake 3.10+
  • Standard C++ compiler (g++, clang, MSVC)

Building the Project

# Clone the repository
git clone https://github.com/Manjushwarofficial/RL-Library.git
cd RL-Library

# Create build directory
mkdir build
cd build

# Configure and build
cmake ..
make

# Run the program
./rl_main

Example Output

========================================
RL Library v1.0.0
========================================

[1] Initializing GridWorld Environment...
    Grid Size: 5x5
    Total States: 25
    Goal Position: (4, 4)

[2] Initializing Q-Learning Agent...
    Learning Rate (alpha): 0.1
    Discount Factor (gamma): 0.99
    Exploration Rate (epsilon): 0.1

[3] Starting Training...
    Episode  10 | Reward: -0.38
    Episode  20 | Reward: -0.23
    ...
    Episode 100 | Reward: 0.89

[4] Training Complete!
========================================

Usage Examples

1. Using Q-Learning with GridWorld

#include "RL_Lib.h"
using namespace RLLib;

int main() {
    // Initialize environment
    GridWorld env(5, 5, 4, 4);
    
    // Initialize agent
    QLearner agent(0.1, 0.99, 0.1);
    
    // Training loop
    for (int episode = 0; episode < 100; ++episode) {
        env.reset();
        int state = env.getState();
        
        for (int step = 0; step < 50; ++step) {
            int action = agent.chooseAction(state, GridWorld::NUM_ACTIONS);
            double reward = env.step(action);
            int next_state = env.getState();
            
            agent.update(state, action, reward, next_state, GridWorld::NUM_ACTIONS);
            
            state = next_state;
            if (env.isTerminal()) break;
        }
    }
    return 0;
}

2. Using MathUtils

#include "Core/MathUtils.h"
using namespace RLLib;

int main() {
    // Vector operations
    std::vector<double> a = {1.0, 2.0, 3.0};
    std::vector<double> b = {4.0, 5.0, 6.0};
    
    double dot = dot_product(a, b);
    auto sum = vector_add(a, b);
    
    // Activation functions
    double sig = sigmoid(0.5);
    double act = relu(-2.0);
    
    // Matrix operations
    Matrix m1(3, 2);
    Matrix m2(2, 3);
    Matrix result = matrix_multiply(m1, m2);
    
    return 0;
}

API Reference

QLearner Class

// Constructor
QLearner(double alpha = 0.1, double gamma = 0.9, double epsilon = 0.1);

// Initialize state in Q-table
void initializeState(int state, int numActions);

// Select action using epsilon-greedy strategy
int chooseAction(int state, int numActions);

// Update Q-value using Q-Learning rule
void update(int state, int action, double reward, int nextState, int numActions);

// Get Q-value for state-action pair
double getQValue(int state, int action);

// Get maximum Q-value for a state
double getMaxQValue(int state);

GridWorld Class

// Constructor
GridWorld(int width = 5, int height = 5, int goalX = 4, int goalY = 4);

// Get current state index
int getState() const;

// Reset environment
void reset();

// Execute action, return reward
double step(int action);

// Check if at goal
bool isTerminal() const;

// Get positions
std::pair<int, int> getAgentPosition() const;
std::pair<int, int> getGoalPosition() const;

MathUtils Functions

// Activation functions
double sigmoid(double x);
double tanh_activation(double x);
double relu(double x);

// Evaluation metrics
double calculate_mse(const std::vector<double>& target, const std::vector<double>& pred);
double calculate_mae(const std::vector<double>& target, const std::vector<double>& pred);

// Vector operations
double dot_product(const std::vector<double>& a, const std::vector<double>& b);
std::vector<double> vector_add(const std::vector<double>& a, const std::vector<double>& b);
std::vector<double> scalar_multiply(const std::vector<double>& a, double scalar);

// Matrix operations
Matrix matrix_multiply(const Matrix& a, const Matrix& b);
Matrix matrix_transpose(const Matrix& a);
std::vector<double> matrix_vector_multiply(const Matrix& mat, const std::vector<double>& vec);

Performance Considerations

  • Time Complexity: O(m) per Q-Learning update for m actions (one max over the next state's Q-values; Q-table lookups are O(1) on average)
  • Space Complexity: O(n·m) for n states and m actions
  • Scalability: efficient for tabular methods; consider function approximation for larger state spaces

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

References

  • Sutton & Barto: "Reinforcement Learning: An Introduction" (2nd Edition)
  • Bellman Equation for MDPs
  • Temporal Difference Learning

Future Enhancements

  • Policy Gradient Methods (REINFORCE, Actor-Critic)
  • Deep Q-Networks (DQN)
  • Experience Replay and Target Networks
  • Multiple environments (CartPole, Mountain Car)
  • Visualization tools
  • GPU acceleration support
  • Python bindings

Version: 1.0.0
Last Updated: 2024
Maintainer: Manjushwar Khairkar
