A browser-based demonstration of reinforcement learning algorithms training a virtual drone to navigate indoor environments with obstacle avoidance. The entire training process happens live in your browser using TensorFlow.js.
demo: https://rldrone.vercel.app/
This project showcases major reinforcement learning algorithms applied to autonomous drone navigation. A virtual drone equipped with 6 directional sensors learns to:
- Navigate to goal positions in complex 3D environments
- Avoid obstacles using proximity sensors
- Optimize flight paths through reinforcement learning
- Adapt behavior based on reward feedback
The entire training pipeline runs in real-time in your browser, making RL concepts accessible and visualizable without requiring specialized hardware or cloud computing.
Here's how the neural network policy is implemented using TensorFlow.js:
```typescript
export class RLPolicyTF {
  model: tf.Sequential;

  constructor(num_states: number, num_actions: number, network_size: number) {
    this.model = tf.sequential();

    // Input layer: 9D state (3D goal direction + 6 sensor readings)
    this.model.add(tf.layers.dense({
      units: network_size,       // e.g., 256 neurons
      inputShape: [num_states],  // 9 inputs
      activation: "relu"
    }));

    // Hidden layer
    this.model.add(tf.layers.dense({
      units: network_size / 2,   // e.g., 128 neurons
      activation: "relu"
    }));

    // Output layer: action probabilities
    this.model.add(tf.layers.dense({
      units: num_actions,        // 7 discrete actions
      activation: "softmax"      // probability distribution over actions
    }));
  }

  // Forward pass for action selection
  forwardForInference(state: number[]): number[] {
    const stateTensor = tf.tensor2d([state], [1, state.length]);
    const actionProbs = this.model.predict(stateTensor) as tf.Tensor;
    // dataSync() returns a TypedArray, so convert it to a plain array
    const result = Array.from(actionProbs.dataSync());

    // Clean up tensors to prevent memory leaks
    stateTensor.dispose();
    actionProbs.dispose();

    return result; // e.g. [0.1, 0.05, 0.3, 0.2, 0.15, 0.1, 0.1]
  }
}
```
- **REINFORCE** - Basic policy gradient method
  - Direct policy optimization using Monte Carlo returns
  - Simple but effective for discrete action spaces
- **A2C (Advantage Actor-Critic)** - Default algorithm
  - Combines policy gradients with value function estimation
  - Reduces variance using advantage estimation
  - Separate actor (policy) and critic (value) networks
- **PPO (Proximal Policy Optimization)**
  - State-of-the-art policy gradient method
  - Prevents destructive policy updates with a clipped surrogate objective
  - More stable training than vanilla policy gradients
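As a rough sketch (not the project's exact implementation), the per-sample ingredients of the three algorithms can be written in plain TypeScript; all function names here are illustrative:

```typescript
// REINFORCE: discounted Monte Carlo return, G_t = r_t + gamma * G_{t+1}
function discountedReturns(rewards: number[], gamma: number): number[] {
  const returns = new Array<number>(rewards.length);
  let running = 0;
  // Walk backwards through the episode accumulating the discounted sum
  for (let t = rewards.length - 1; t >= 0; t--) {
    running = rewards[t] + gamma * running;
    returns[t] = running;
  }
  return returns;
}

// A2C: advantage = return - critic's value estimate (reduces gradient variance)
function advantage(ret: number, valueEstimate: number): number {
  return ret - valueEstimate;
}

// PPO: clipped surrogate objective for a single sample.
// ratio = pi_new(a|s) / pi_old(a|s); epsilon is the clip range (e.g. 0.2).
function ppoObjective(ratio: number, adv: number, epsilon = 0.2): number {
  const clipped = Math.min(Math.max(ratio, 1 - epsilon), 1 + epsilon);
  // Taking the min makes large policy updates unprofitable
  return Math.min(ratio * adv, clipped * adv);
}
```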
- Policy Network (Actor): Multi-layer neural network with softmax output for action probability distribution
- Value Network (Critic): Estimates state values for advantage calculation
- Input Features: 9-dimensional state space including:
  - 3D directional vector to goal
  - 6 proximity sensor readings (left, right, front, back, above, below)
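A hypothetical sketch of how this 9D state vector could be assembled (the function name and signature are illustrative; the real logic lives in `DroneEnv.ts`):

```typescript
// Build the 9D state: normalized 3D direction to the goal + 6 sensor readings
function buildState(
  dronePos: [number, number, number],
  goalPos: [number, number, number],
  sensors: number[] // 6 distances: left, right, front, back, above, below
): number[] {
  const dx = goalPos[0] - dronePos[0];
  const dy = goalPos[1] - dronePos[1];
  const dz = goalPos[2] - dronePos[2];
  const len = Math.sqrt(dx * dx + dy * dy + dz * dz) || 1; // avoid divide-by-zero at the goal
  return [dx / len, dy / len, dz / len, ...sensors];
}
```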
The drone is equipped with 6 directional proximity sensors that provide distance measurements to nearby obstacles:
- Directional Coverage: 360° horizontal + vertical coverage
- Sensor Range: Configurable maximum detection distance
- Real-time Feedback: Continuous sensor updates during flight
- Goal Achievement: Positive reward for reaching target positions
- Obstacle Avoidance: Penalties for proximity to obstacles and collisions
- Direction Incentives: Rewards for moving toward the goal
- Distance Penalties: Small penalties to encourage efficient paths
- Dynamic Obstacles: Randomly generated obstacle layouts
- Bounded Space: Contained 3D flight area with walls
- Real-time Visualization: Live 3D rendering of drone, sensors, and environment
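The reward components listed above can be sketched as a single function. This is an illustrative approximation with made-up coefficients, not the actual shaping in `DroneEnv.ts`:

```typescript
// Hypothetical reward combining goal bonus, collision penalty,
// progress incentive, proximity penalty, and a per-step cost.
function computeReward(
  distToGoal: number,
  prevDistToGoal: number,
  minSensorDist: number,
  collided: boolean,
  reachedGoal: boolean
): number {
  if (reachedGoal) return 10;               // goal achievement bonus
  if (collided) return -10;                 // collision penalty
  let reward = -0.01;                       // small step penalty encourages efficient paths
  reward += prevDistToGoal - distToGoal;    // positive when moving toward the goal
  if (minSensorDist < 0.5) reward -= 0.1;   // penalize flying close to obstacles
  return reward;
}
```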
- Real-time 3D Visualization: Watch the drone learn to navigate in real-time
- Live Metrics Dashboard: Track training progress with real-time charts
  - Total reward per episode
  - Policy, value, and entropy losses
  - Training convergence metrics
- Configurable Parameters: Adjust hyperparameters on the fly
  - Learning rates
  - Network architectures
  - Training batch sizes
  - Algorithm selection
- No Installation Required: Everything runs in your web browser
- GPU Acceleration: Leverages WebGL for fast neural network training
- Model Persistence: Save and load trained models locally
- Real-time Performance: Interactive framerates during training
- Algorithm Switching: Compare different RL algorithms
- Hyperparameter Tuning: Extensive configuration options
- Training Visualization: Sensor readings, reward signals, and loss curves
- Model Export: Download trained weights for analysis
- Node.js 18+ and npm/yarn
- Modern web browser with WebGL support
- 4GB+ RAM recommended for training
```shell
# Clone the repository
git clone <repository-url>
cd rldrone

# Install dependencies
npm install
# or
yarn install

# Start the development server
npm run dev
# or
yarn dev
```

Open http://localhost:3000 to see the application.
- Load the Application: Navigate to the drone training page
- Configure Settings: Adjust training parameters in the settings panel
- Start Training: Click "Train From Scratch" to begin
- Watch and Learn: Observe the drone learning to navigate in real-time
- Analyze Results: Monitor training metrics and performance charts
- Next.js 15.4.4 - React framework for the web application
- TensorFlow.js 3.7.0 - In-browser machine learning and neural networks
- Three.js + React Three Fiber - 3D visualization and rendering
- TypeScript - Type-safe development
- @react-three/fiber - React renderer for Three.js
- @react-three/drei - Useful helpers for 3D development
- WebGL - Hardware-accelerated 3D graphics
- @tensorflow/tfjs-backend-webgl - GPU acceleration via WebGL
- @tensorflow/tfjs-backend-cpu - CPU fallback for training
- @tensorflow/tfjs-backend-wasm - WebAssembly backend for performance
- Goal Direction Vector (3D): Normalized direction from drone to goal
- Sensor Readings (6D): Distance measurements from each directional sensor
- Move Forward/Backward
- Move Left/Right
- Move Up/Down
- Stay in place
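One way to realize these 7 discrete actions is a lookup table of velocity deltas. This is a hypothetical sketch; the axis convention (x: right, y: up, z: forward) and step size are assumptions, not the project's actual mapping:

```typescript
// One velocity delta per discrete action index
const ACTION_DELTAS: [number, number, number][] = [
  [0, 0, 1],   // move forward
  [0, 0, -1],  // move backward
  [-1, 0, 0],  // move left
  [1, 0, 0],   // move right
  [0, 1, 0],   // move up
  [0, -1, 0],  // move down
  [0, 0, 0],   // stay in place
];

// Apply the chosen action to the drone's position
function applyAction(
  pos: [number, number, number],
  action: number,
  step = 0.1
): [number, number, number] {
  const d = ACTION_DELTAS[action];
  return [pos[0] + d[0] * step, pos[1] + d[1] * step, pos[2] + d[2] * step];
}
```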
- Learning Rate: 1e-5 to 1e-3
- Network Sizes: 64 to 512 neurons
- Batch Sizes: 512 to 4096 samples
- Discount Factor: 0.9 to 0.99
- Episode Length: 1000 to 10000 steps
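A settings object covering these ranges might look like the following. The values shown are illustrative picks from within the documented ranges; the actual defaults live in `Drone.model.ts`:

```typescript
// Hypothetical defaults, one per documented tuning range
const defaultSettings = {
  learningRate: 1e-4,    // range: 1e-5 to 1e-3
  networkSize: 256,      // range: 64 to 512 neurons
  batchSize: 2048,       // range: 512 to 4096 samples
  discountFactor: 0.99,  // range: 0.9 to 0.99
  episodeLength: 5000,   // range: 1000 to 10000 steps
};
```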
This project demonstrates:
- RL Algorithm Comparison: Side-by-side performance of different approaches
- Hyperparameter Sensitivity: How settings affect learning
- Exploration vs Exploitation: Balance between trying new actions and exploiting known good ones
- Neural Network Training: Real-time visualization of gradient descent
- Sensor Fusion: Combining multiple sensor inputs for decision making
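The exploration-vs-exploitation trade-off above comes down to how an action is picked from the policy's softmax output. A minimal sketch (illustrative names, not the project's code): sampling from the distribution explores, while taking the argmax exploits:

```typescript
// Pick an action from a softmax probability vector.
// explore=true samples stochastically; explore=false is greedy.
function sampleAction(probs: number[], explore = true): number {
  if (!explore) {
    return probs.indexOf(Math.max(...probs)); // exploit: most probable action
  }
  // Explore: inverse-CDF sampling over the discrete distribution
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}
```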
```
app/
├── page.tsx                   # Landing page with project overview
├── layout.tsx                 # Root layout and global styles
├── page.utils.tsx             # Shared utilities (mobile detection, etc.)
├── DronePageClient.tsx        # Main drone training page client component
├── globals.css                # Global CSS styles
│
├── drone/                     # Core drone RL implementation
│   ├── Drone.model.ts         # TypeScript interfaces and default settings
│   │
│   ├── RL/                    # Reinforcement Learning algorithms
│   │   ├── DroneEnv.ts        # Environment simulation (state, actions, rewards)
│   │   ├── DroneTrainer.ts    # Main training loop and episode management
│   │   ├── RLPolicyTF.ts      # Policy network (actor) implementation
│   │   ├── ValuePolicyTF.ts   # Value network (critic) implementation
│   │   └── useDroneTrainer.ts # React hook for trainer lifecycle
│   │
│   ├── Components/            # React UI components
│   │   ├── DronePage.tsx      # Main 3D training interface
│   │   ├── DroneTrainerControlPanel.tsx # Training controls and settings
│   │   ├── DroneSettings.tsx  # Hyperparameter configuration
│   │   ├── IntroModal.tsx     # Welcome tutorial modal
│   │   ├── SimpleChart.tsx    # Real-time loss/reward charts
│   │   ├── SimpleBarChart.tsx # Bar chart component
│   │   ├── TooltipOverlay.tsx # Interactive help tooltips
│   │   └── UpdatingWeightsOverlay.tsx # Training status indicator
│   │
│   ├── Display3D/             # 3D visualization components
│   │   ├── DroneDisplay.tsx   # 3D drone and sensor rendering
│   │   ├── EnvironmentDisplay.tsx # 3D obstacles and environment
│   │   └── EdgesOnlyBox.tsx   # Wireframe box component
│   │
│   ├── hooks/                 # Custom React hooks
│   │   ├── useDroneDisplay.tsx # 3D scene management
│   │   └── useGraphs.ts       # Chart data and sensor visualization
│   │
│   ├── utils/                 # Utility functions
│   │   ├── FiberUtils.tsx     # Three.js/React-Three-Fiber helpers
│   │   ├── rl.utils.ts        # RL-specific utility functions
│   │   └── useGizmos.tsx      # 3D debugging and visualization helpers
│   │
│   └── tooltipTips.ts         # Help text and tutorial content
│
└── ablation/                  # Ablation study for hyperparameter testing
    ├── page.tsx               # Ablation study page
    └── AblationPageClient.tsx # Headless training for parameter optimization
```
- `DroneEnv.ts`: Implements the Markov Decision Process
  - State space: 9D (3D goal direction + 6 sensor readings)
  - Action space: 7 discrete actions (6 directions + stay)
  - Reward function: goal achievement, obstacle avoidance, efficiency
- `DroneTrainer.ts`: Training orchestration
  - Episode management and environment resets
  - Experience collection and batch processing
  - Algorithm switching (REINFORCE, A2C, PPO)
  - Real-time metrics tracking
- `RLPolicyTF.ts` & `ValuePolicyTF.ts`: Neural networks
  - TensorFlow.js implementation for in-browser training
  - Policy network: state → action probabilities
  - Value network: state → expected return estimate
  - GPU-accelerated via the WebGL backend
- `DronePage.tsx`: Main 3D training environment
  - Three.js scene setup with camera controls
  - Real-time drone and sensor visualization
  - Integration of the training loop with 3D rendering
- `DroneTrainerControlPanel.tsx`: Training controls
  - Start/stop training controls
  - Real-time metric displays
  - Algorithm and hyperparameter selection
- `IntroModal.tsx`: Interactive tutorial
  - 4-slide introduction with videos
  - Explains RL concepts and interface usage
- `DroneDisplay.tsx`: Drone and sensor rendering
  - 3D drone model with directional sensors
  - Real-time, color-coded sensor value visualization
  - Dynamic sensor line rendering to show obstacle detection
- `EnvironmentDisplay.tsx`: World rendering
  - Procedural obstacle generation
  - Goal position visualization
  - Environment boundaries and collision detection
- `AblationPageClient.tsx`: Automated hyperparameter testing
  - Headless training for systematic parameter evaluation
  - Statistical analysis of training performance
  - Export functionality for research data
```
User Input → DroneTrainerControlPanel → DroneTrainer → DroneEnv
     ↓                  ↓                    ↓
Settings/Config    RL Algorithms        State/Reward
     ↓                  ↓                    ↓
Neural Networks ← Experience Buffer ← Action Selection ← Sensors
     ↓                  ↓                    ↓
Model Updates      Batch Training      3D Visualization
     ↓                  ↓                    ↓
Performance Charts ← Metrics Collection ← Real-time Rendering
```
- TensorFlow.js Integration: All neural network operations use TensorFlow.js for browser-native training
- Three.js Integration: React-Three-Fiber provides declarative 3D scene management
- Real-time Updates: Training loop synchronizes with 3D rendering loop for live visualization
- State Management: React hooks manage training state, UI state, and 3D scene state
- Performance Optimization: WebGL backend for GPU acceleration, requestAnimationFrame for smooth rendering
This project is perfect for:
- RL Researchers: Experimenting with new algorithms
- Students: Learning RL concepts through visualization
- Developers: Adding new features or environments
- Educators: Teaching autonomous systems concepts
- Adding New Algorithms: Extend `DroneTrainer.ts` and implement in the `RL/` directory
- UI Components: Follow React/TypeScript patterns in the `Components/` directory
- 3D Features: Use React-Three-Fiber patterns in the `Display3D/` directory
- Performance: Leverage WebGL for computationally intensive operations
This project is licensed under the MIT License - see the LICENSE file for details.
- TensorFlow.js Documentation
- Reinforcement Learning: An Introduction
- Three.js Documentation
- React Three Fiber