A browser-based demonstration of reinforcement learning algorithms training a virtual drone to navigate indoor environments with obstacle avoidance. The entire training process happens live in your browser using TensorFlow.js.
demo: https://rldrone.vercel.app/
This project showcases major reinforcement learning algorithms applied to autonomous drone navigation. A virtual drone equipped with 6 directional sensors learns to:
- Navigate to goal positions in complex 3D environments
- Avoid obstacles using proximity sensors
- Optimize flight paths through reinforcement learning
- Adapt behavior based on reward feedback
The entire training pipeline runs in real-time in your browser, making RL concepts accessible and visualizable without requiring specialized hardware or cloud computing.
Here's how the neural network policy is implemented using TensorFlow.js:
```typescript
export class RLPolicyTF {
  model: tf.Sequential;

  constructor(num_states: number, num_actions: number, network_size: number) {
    this.model = tf.sequential();

    // Input layer: 9D state (3D goal direction + 6 sensor readings)
    this.model.add(tf.layers.dense({
      units: network_size,       // e.g., 256 neurons
      inputShape: [num_states],  // 9 inputs
      activation: "relu"
    }));

    // Hidden layer
    this.model.add(tf.layers.dense({
      units: network_size / 2,   // e.g., 128 neurons
      activation: "relu"
    }));

    // Output layer: action probabilities
    this.model.add(tf.layers.dense({
      units: num_actions,        // 7 discrete actions
      activation: "softmax"      // probability distribution over actions
    }));
  }

  // Forward pass for action selection
  forwardForInference(state: number[]): number[] {
    const stateTensor = tf.tensor2d([state], [1, state.length]);
    const actionProbs = this.model.predict(stateTensor) as tf.Tensor;
    // dataSync() returns a TypedArray, so convert it to a plain array
    const result = Array.from(actionProbs.dataSync());

    // Clean up tensors to prevent memory leaks
    stateTensor.dispose();
    actionProbs.dispose();

    return result; // e.g. [0.1, 0.05, 0.3, 0.2, 0.15, 0.1, 0.1]
  }
}
```
- **REINFORCE** - Basic policy gradient method
  - Direct policy optimization using Monte Carlo returns
  - Simple but effective for discrete action spaces
- **A2C (Advantage Actor-Critic)** - Default algorithm
  - Combines policy gradients with value function estimation
  - Reduces variance using advantage estimation
  - Separate actor (policy) and critic (value) networks
- **PPO (Proximal Policy Optimization)**
  - State-of-the-art policy gradient method
  - Prevents destructive policy updates with a clipped surrogate objective
  - More stable training than vanilla policy gradients
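As a rough sketch (not the project's exact implementation), the per-sample ingredients of the three algorithms can be written in plain TypeScript; all function names here are illustrative:

```typescript
// REINFORCE: discounted Monte Carlo return, G_t = r_t + gamma * G_{t+1}
function discountedReturns(rewards: number[], gamma: number): number[] {
  const returns = new Array<number>(rewards.length);
  let running = 0;
  // Walk backwards through the episode accumulating the discounted sum
  for (let t = rewards.length - 1; t >= 0; t--) {
    running = rewards[t] + gamma * running;
    returns[t] = running;
  }
  return returns;
}

// A2C: advantage = return - critic's value estimate (reduces gradient variance)
function advantage(ret: number, valueEstimate: number): number {
  return ret - valueEstimate;
}

// PPO: clipped surrogate objective for a single sample.
// ratio = pi_new(a|s) / pi_old(a|s); epsilon is the clip range (e.g. 0.2).
function ppoObjective(ratio: number, adv: number, epsilon = 0.2): number {
  const clipped = Math.min(Math.max(ratio, 1 - epsilon), 1 + epsilon);
  // Taking the min makes large policy updates unprofitable
  return Math.min(ratio * adv, clipped * adv);
}
```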
- Policy Network (Actor): Multi-layer neural network with softmax output for action probability distribution
- Value Network (Critic): Estimates state values for advantage calculation
- Input Features: 9-dimensional state space including:
  - 3D directional vector to goal
  - 6 proximity sensor readings (left, right, front, back, above, below)
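A hypothetical sketch of how this 9D state vector could be assembled (the function name and signature are illustrative; the real logic lives in `DroneEnv.ts`):

```typescript
// Build the 9D state: normalized 3D direction to the goal + 6 sensor readings
function buildState(
  dronePos: [number, number, number],
  goalPos: [number, number, number],
  sensors: number[] // 6 distances: left, right, front, back, above, below
): number[] {
  const dx = goalPos[0] - dronePos[0];
  const dy = goalPos[1] - dronePos[1];
  const dz = goalPos[2] - dronePos[2];
  const len = Math.sqrt(dx * dx + dy * dy + dz * dz) || 1; // avoid divide-by-zero at the goal
  return [dx / len, dy / len, dz / len, ...sensors];
}
```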
The drone is equipped with 6 directional proximity sensors that provide distance measurements to nearby obstacles:
- Directional Coverage: 360° horizontal + vertical coverage
- Sensor Range: Configurable maximum detection distance
- Real-time Feedback: Continuous sensor updates during flight
- Goal Achievement: Positive reward for reaching target positions
- Obstacle Avoidance: Penalties for proximity to obstacles and collisions
- Direction Incentives: Rewards for moving toward the goal
- Distance Penalties: Small penalties to encourage efficient paths
- Dynamic Obstacles: Randomly generated obstacle layouts
- Bounded Space: Contained 3D flight area with walls
- Real-time Visualization: Live 3D rendering of drone, sensors, and environment
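The reward components listed above can be sketched as a single function. This is an illustrative approximation with made-up coefficients, not the actual shaping in `DroneEnv.ts`:

```typescript
// Hypothetical reward combining goal bonus, collision penalty,
// progress incentive, proximity penalty, and a per-step cost.
function computeReward(
  distToGoal: number,
  prevDistToGoal: number,
  minSensorDist: number,
  collided: boolean,
  reachedGoal: boolean
): number {
  if (reachedGoal) return 10;               // goal achievement bonus
  if (collided) return -10;                 // collision penalty
  let reward = -0.01;                       // small step penalty encourages efficient paths
  reward += prevDistToGoal - distToGoal;    // positive when moving toward the goal
  if (minSensorDist < 0.5) reward -= 0.1;   // penalize flying close to obstacles
  return reward;
}
```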
- Real-time 3D Visualization: Watch the drone learn to navigate in real-time
- Live Metrics Dashboard: Track training progress with real-time charts
  - Total reward per episode
  - Policy, value, and entropy losses
  - Training convergence metrics
- Configurable Parameters: Adjust hyperparameters on the fly
  - Learning rates
  - Network architectures
  - Training batch sizes
  - Algorithm selection
- No Installation Required: Everything runs in your web browser
- GPU Acceleration: Leverages WebGL for fast neural network training
- Model Persistence: Save and load trained models locally
- Real-time Performance: Interactive framerates during training
- Algorithm Switching: Compare different RL algorithms
- Hyperparameter Tuning: Extensive configuration options
- Training Visualization: Sensor readings, reward signals, and loss curves
- Model Export: Download trained weights for analysis
- Node.js 18+ and npm/yarn
- Modern web browser with WebGL support
- 4GB+ RAM recommended for training
```shell
# Clone the repository
git clone <repository-url>
cd rldrone

# Install dependencies
npm install
# or
yarn install

# Start the development server
npm run dev
# or
yarn dev
```

Open http://localhost:3000 to see the application.
- Load the Application: Navigate to the drone training page
- Configure Settings: Adjust training parameters in the settings panel
- Start Training: Click "Train From Scratch" to begin
- Watch and Learn: Observe the drone learning to navigate in real-time
- Analyze Results: Monitor training metrics and performance charts
- Next.js 15.4.4 - React framework for the web application
- TensorFlow.js 3.7.0 - In-browser machine learning and neural networks
- Three.js + React Three Fiber - 3D visualization and rendering
- TypeScript - Type-safe development
- @react-three/fiber - React renderer for Three.js
- @react-three/drei - Useful helpers for 3D development
- WebGL - Hardware-accelerated 3D graphics
- @tensorflow/tfjs-backend-webgl - GPU acceleration via WebGL
- @tensorflow/tfjs-backend-cpu - CPU fallback for training
- @tensorflow/tfjs-backend-wasm - WebAssembly backend for performance
- Goal Direction Vector (3D): Normalized direction from drone to goal
- Sensor Readings (6D): Distance measurements from each directional sensor
- Move Forward/Backward
- Move Left/Right
- Move Up/Down
- Stay in place
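One way to realize these 7 discrete actions is a lookup table of velocity deltas. This is a hypothetical sketch; the axis convention (x: right, y: up, z: forward) and step size are assumptions, not the project's actual mapping:

```typescript
// One velocity delta per discrete action index
const ACTION_DELTAS: [number, number, number][] = [
  [0, 0, 1],   // move forward
  [0, 0, -1],  // move backward
  [-1, 0, 0],  // move left
  [1, 0, 0],   // move right
  [0, 1, 0],   // move up
  [0, -1, 0],  // move down
  [0, 0, 0],   // stay in place
];

// Apply the chosen action to the drone's position
function applyAction(
  pos: [number, number, number],
  action: number,
  step = 0.1
): [number, number, number] {
  const d = ACTION_DELTAS[action];
  return [pos[0] + d[0] * step, pos[1] + d[1] * step, pos[2] + d[2] * step];
}
```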
- Learning Rate: 1e-5 to 1e-3
- Network Sizes: 64 to 512 neurons
- Batch Sizes: 512 to 4096 samples
- Discount Factor: 0.9 to 0.99
- Episode Length: 1000 to 10000 steps
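A settings object covering these ranges might look like the following. The values shown are illustrative picks from within the documented ranges; the actual defaults live in `Drone.model.ts`:

```typescript
// Hypothetical defaults, one per documented tuning range
const defaultSettings = {
  learningRate: 1e-4,    // range: 1e-5 to 1e-3
  networkSize: 256,      // range: 64 to 512 neurons
  batchSize: 2048,       // range: 512 to 4096 samples
  discountFactor: 0.99,  // range: 0.9 to 0.99
  episodeLength: 5000,   // range: 1000 to 10000 steps
};
```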
This project demonstrates:
- RL Algorithm Comparison: Side-by-side performance of different approaches
- Hyperparameter Sensitivity: How settings affect learning
- Exploration vs Exploitation: Balance between trying new actions and exploiting known good ones
- Neural Network Training: Real-time visualization of gradient descent
- Sensor Fusion: Combining multiple sensor inputs for decision making
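The exploration-vs-exploitation trade-off above comes down to how an action is picked from the policy's softmax output. A minimal sketch (illustrative names, not the project's code): sampling from the distribution explores, while taking the argmax exploits:

```typescript
// Pick an action from a softmax probability vector.
// explore=true samples stochastically; explore=false is greedy.
function sampleAction(probs: number[], explore = true): number {
  if (!explore) {
    return probs.indexOf(Math.max(...probs)); // exploit: most probable action
  }
  // Explore: inverse-CDF sampling over the discrete distribution
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}
```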
```
app/
├── page.tsx                   # Landing page with project overview
├── layout.tsx                 # Root layout and global styles
├── page.utils.tsx             # Shared utilities (mobile detection, etc.)
├── DronePageClient.tsx        # Main drone training page client component
├── globals.css                # Global CSS styles
│
├── drone/                     # Core drone RL implementation
│   ├── Drone.model.ts         # TypeScript interfaces and default settings
│   │
│   ├── RL/                    # Reinforcement Learning algorithms
│   │   ├── DroneEnv.ts        # Environment simulation (state, actions, rewards)
│   │   ├── DroneTrainer.ts    # Main training loop and episode management
│   │   ├── RLPolicyTF.ts      # Policy network (actor) implementation
│   │   ├── ValuePolicyTF.ts   # Value network (critic) implementation
│   │   └── useDroneTrainer.ts # React hook for trainer lifecycle
│   │
│   ├── Components/            # React UI components
│   │   ├── DronePage.tsx      # Main 3D training interface
│   │   ├── DroneTrainerControlPanel.tsx # Training controls and settings
│   │   ├── DroneSettings.tsx  # Hyperparameter configuration
│   │   ├── IntroModal.tsx     # Welcome tutorial modal
│   │   ├── SimpleChart.tsx    # Real-time loss/reward charts
│   │   ├── SimpleBarChart.tsx # Bar chart component
│   │   ├── TooltipOverlay.tsx # Interactive help tooltips
│   │   └── UpdatingWeightsOverlay.tsx # Training status indicator
│   │
│   ├── Display3D/             # 3D visualization components
│   │   ├── DroneDisplay.tsx   # 3D drone and sensor rendering
│   │   ├── EnvironmentDisplay.tsx # 3D obstacles and environment
│   │   └── EdgesOnlyBox.tsx   # Wireframe box component
│   │
│   ├── hooks/                 # Custom React hooks
│   │   ├── useDroneDisplay.tsx # 3D scene management
│   │   └── useGraphs.ts       # Chart data and sensor visualization
│   │
│   ├── utils/                 # Utility functions
│   │   ├── FiberUtils.tsx     # Three.js/React-Three-Fiber helpers
│   │   ├── rl.utils.ts        # RL-specific utility functions
│   │   └── useGizmos.tsx      # 3D debugging and visualization helpers
│   │
│   └── tooltipTips.ts         # Help text and tutorial content
│
└── ablation/                  # Ablation study for hyperparameter testing
    ├── page.tsx               # Ablation study page
    └── AblationPageClient.tsx # Headless training for parameter optimization
```
- `DroneEnv.ts`: Implements the Markov Decision Process
  - State space: 9D (3D goal direction + 6 sensor readings)
  - Action space: 7 discrete actions (6 directions + stay)
  - Reward function: goal achievement, obstacle avoidance, efficiency
- `DroneTrainer.ts`: Training orchestration
  - Episode management and environment resets
  - Experience collection and batch processing
  - Algorithm switching (REINFORCE, A2C, PPO)
  - Real-time metrics tracking
- `RLPolicyTF.ts` & `ValuePolicyTF.ts`: Neural networks
  - TensorFlow.js implementation for in-browser training
  - Policy network: state → action probabilities
  - Value network: state → expected return estimate
  - GPU-accelerated via the WebGL backend
- `DronePage.tsx`: Main 3D training environment
  - Three.js scene setup with camera controls
  - Real-time drone and sensor visualization
  - Integration of the training loop with 3D rendering
- `DroneTrainerControlPanel.tsx`: Training controls
  - Start/stop training controls
  - Real-time metric displays
  - Algorithm and hyperparameter selection
- `IntroModal.tsx`: Interactive tutorial
  - 4-slide introduction with videos
  - Explains RL concepts and interface usage
- `DroneDisplay.tsx`: Drone and sensor rendering
  - 3D drone model with directional sensors
  - Real-time, color-coded sensor value visualization
  - Dynamic sensor line rendering to show obstacle detection
- `EnvironmentDisplay.tsx`: World rendering
  - Procedural obstacle generation
  - Goal position visualization
  - Environment boundaries and collision detection
- `AblationPageClient.tsx`: Automated hyperparameter testing
  - Headless training for systematic parameter evaluation
  - Statistical analysis of training performance
  - Export functionality for research data
```
User Input → DroneTrainerControlPanel → DroneTrainer → DroneEnv
     ↓                  ↓                    ↓
Settings/Config    RL Algorithms        State/Reward
     ↓                  ↓                    ↓
Neural Networks ← Experience Buffer ← Action Selection ← Sensors
     ↓                  ↓                    ↓
Model Updates      Batch Training      3D Visualization
     ↓                  ↓                    ↓
Performance Charts ← Metrics Collection ← Real-time Rendering
```
- TensorFlow.js Integration: All neural network operations use TensorFlow.js for browser-native training
- Three.js Integration: React-Three-Fiber provides declarative 3D scene management
- Real-time Updates: Training loop synchronizes with 3D rendering loop for live visualization
- State Management: React hooks manage training state, UI state, and 3D scene state
- Performance Optimization: WebGL backend for GPU acceleration, requestAnimationFrame for smooth rendering
This project is perfect for:
- RL Researchers: Experimenting with new algorithms
- Students: Learning RL concepts through visualization
- Developers: Adding new features or environments
- Educators: Teaching autonomous systems concepts
- Adding New Algorithms: Extend `DroneTrainer.ts` and implement in the `RL/` directory
- UI Components: Follow React/TypeScript patterns in the `Components/` directory
- 3D Features: Use React-Three-Fiber patterns in the `Display3D/` directory
- Performance: Leverage WebGL for computationally intensive operations
This project is licensed under the MIT License - see the LICENSE file for details.
- TensorFlow.js Documentation
- Reinforcement Learning: An Introduction
- Three.js Documentation
- React Three Fiber