Authors: Daniel Namaki, Niccolò Settimelli
Course: Symbolic and Evolutionary Artificial Intelligence
Academic Year: 2024/2025 – University of Pisa
This project investigates non-standard reinforcement learning (RL) methods based on lexicographic reward prioritization, evaluated in the classic LunarLander-v2 environment. Instead of optimizing a single scalar reward, our agents optimize a vector of rewards under strict priorities:
- ✅ Survival (avoid crashing)
- 🎯 Landing quality (upright, centered touchdown)
- ⛽ Fuel efficiency
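The core idea of lexicographic prioritization is that a higher-priority objective is only traded off when the lower-priority ones cannot break a tie. A minimal sketch of such a comparison (the function name `lex_better` and the tolerance are illustrative, not taken from this codebase):

```python
from typing import Tuple

# A reward vector ordered by priority: (survival, landing quality, fuel efficiency).
RewardVector = Tuple[float, float, float]

def lex_better(a: RewardVector, b: RewardVector, tol: float = 1e-6) -> bool:
    """Return True if `a` lexicographically dominates `b`.

    Components are compared in priority order; a lower-priority component
    only matters when all higher-priority components are (near-)equal.
    """
    for ai, bi in zip(a, b):
        if ai > bi + tol:
            return True
        if ai < bi - tol:
            return False
    return False  # all components equal within tolerance
```

For example, an action profile with better landing quality is preferred even at a much higher fuel cost, as long as survival is equal: `lex_better((1.0, 0.8, -5.0), (1.0, 0.5, 0.0))` is `True`.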
We implement and compare:
- Potential-Based Survival Shaping
- Cone-Aware Survival Shaping
- Curriculum Learning with Prioritized Replay
- Standard DQN Baselines
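As background for the shaping variants above, potential-based shaping adds a term of the form F(s, s') = γΦ(s') − Φ(s) to the environment reward, which preserves the set of optimal policies. A generic sketch (the potential `phi` below is an illustrative assumption over the first four LunarLander state components — position and velocity — not this project's exact shaping function):

```python
import math

GAMMA = 0.99  # discount factor, assumed

def phi(state) -> float:
    """Hypothetical potential: penalize distance from the pad and descent speed."""
    x, y, _vx, vy = state[0], state[1], state[2], state[3]
    return -math.sqrt(x * x + y * y) - abs(vy)

def shaped_reward(r: float, s, s_next, done: bool) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).

    The potential of a terminal state is taken as zero, so the shaping
    bonus telescopes to a constant over any complete episode.
    """
    phi_next = 0.0 if done else GAMMA * phi(s_next)
    return r + phi_next - phi(s)
```

The cone-aware and curriculum variants build different training signals on top of the same environment; see `doc_seai_f01.pdf` for the exact formulations used here.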
Repository layout:

```
2025_SEAI_F01/
├── models/                              # Saved model checkpoints
├── networks/                            # LexQNetwork & standard Q-network code
├── v_cone/                              # Cone-aware shaping agent
├── v_potential_shaping/                 # Potential-based shaping agent
├── v_prioritized_curriculum_learning/   # Curriculum + prioritized replay agent
├── v_standard/                          # Standard & prioritized DQN agents
├── requirements.txt                     # Python dependencies
├── doc_seai_f01.pdf                     # Full project report
└── README.md                            # This overview
```