This project implements a Deep Q-Network (DQN) reinforcement learning agent that learns to play the classic Snake game. The implementation features:
- Deep Q-Learning with experience replay and target networks
- Real-time visualization of the game, neural network weights, and training statistics
- CUDA acceleration for fast training on NVIDIA GPUs
- Interactive parameter tuning during training
- Modern C++17 implementation with PyTorch C++ API
- CPU: 64-bit x86 processor (Intel/AMD)
- GPU: NVIDIA CUDA-compatible GPU (optional but recommended for faster training)
- RAM: Minimum 4GB, 8GB+ recommended
- Storage: 2GB free disk space
- Operating System: Ubuntu 20.04+ / Debian 10+ / Other Linux distributions
- Compiler: GCC 9+ or Clang 10+ with C++17 support
- CMake: Version 3.22 or higher
- CUDA Toolkit: Version 11.0+ (if using GPU acceleration)
- Git: For cloning repositories
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential cmake git pkg-config
sudo apt install -y libgl1-mesa-dev libglu1-mesa-dev libx11-dev libxrandr-dev libxinerama-dev libxcursor-dev libxi-dev
sudo apt install -y libvulkan-dev vulkan-tools libglvnd-dev

Visit the NVIDIA CUDA Toolkit Archive and download the appropriate version for your system.
# Download CUDA 11.8 (adjust URL for your system)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
# Make installer executable
chmod +x cuda_11.8.0_520.61.05_linux.run
# Run installer (accept terms and select only CUDA Toolkit)
sudo ./cuda_11.8.0_520.61.05_linux.run

# Add CUDA to PATH
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# Reload environment
source ~/.bashrc
# Verify CUDA installation
nvcc --version

cd ~
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh

echo 'export VCPKG_ROOT=~/vcpkg' >> ~/.bashrc
echo 'export PATH=$VCPKG_ROOT:$PATH' >> ~/.bashrc
source ~/.bashrc

cd ~/vcpkg
./vcpkg install sdl3:x64-linux
./vcpkg install glm:x64-linux

./vcpkg list
# You should see:
# sdl3:x64-linux
# glm:x64-linux

cd /home/moinshaikh/CLionProjects/ReinforcementSnake
# Download LibTorch (CUDA 11.8 version - adjust if using different CUDA version)
wget https://download.pytorch.org/libtorch/cu118/libtorch-shared-with-deps-latest.zip
# Extract LibTorch
unzip libtorch-shared-with-deps-latest.zip
rm libtorch-shared-with-deps-latest.zip

# Download CPU-only version
wget https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-latest.zip
# Extract
unzip libtorch-shared-with-deps-latest.zip
rm libtorch-shared-with-deps-latest.zip

# Create vcpkg toolchain file reference
echo 'set(CMAKE_TOOLCHAIN_FILE "$ENV{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" CACHE STRING "")' >> vcpkg-toolchain.cmake

cd /home/moinshaikh/CLionProjects/ReinforcementSnake
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=~/vcpkg/scripts/buildsystems/vcpkg.cmake -DCMAKE_BUILD_TYPE=Release

# Use all available CPU cores for faster compilation
make -j$(nproc)

# Solution: Reinstall SDL3 with vcpkg
cd ~/vcpkg
./vcpkg remove sdl3:x64-linux
./vcpkg install sdl3:x64-linux

# Solution: Check CUDA installation and paths
which nvcc
ls /usr/local/cuda/lib64/
echo $LD_LIBRARY_PATH

# Solution: Verify LibTorch directory structure
ls -la libtorch/
ls -la libtorch/lib/
ls -la libtorch/include/

# Solution: Clear CMake cache and reconfigure
cd build
rm -rf *
cmake .. -DCMAKE_TOOLCHAIN_FILE=~/vcpkg/scripts/buildsystems/vcpkg.cmake -DCMAKE_BUILD_TYPE=Release

# Solution: Ensure GCC version supports C++17
g++ --version
# If version < 9, update GCC:
sudo apt install gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90

cd /home/moinshaikh/CLionProjects/ReinforcementSnake/build
./ReinforcementSnake

During training, you can use these keyboard controls:
- ↑/↓: Select parameter to adjust
- ←/→: Adjust selected parameter value
- R: Reset all parameters to defaults
- Space: Reset exploration rate (epsilon=1)
- Ctrl+C: Force immediate rendering (doesn't stop training)
- Game Speed: Use +/- keys to adjust rendering FPS (5-120)
- Training Speed: Adjust train_speed parameter (1=slow, 100=fast)
ReinforcementSnake/
├── CMakeLists.txt # CMake configuration
├── main.cpp # Application entry point
├── libtorch/ # PyTorch C++ library
├── src/
│ ├── SnakeAI.hpp # AI implementation header
│ ├── SnakeAI.cpp # AI implementation
│ └── Utils.h # Constants and utilities
├── build/ # Build output directory
└── README.md # This file
| Library | Version | Purpose | Installation Method |
|---|---|---|---|
| SDL3 | Latest | Graphics rendering & window management | vcpkg |
| GLM | Latest | OpenGL mathematics | vcpkg |
| PyTorch | Latest | Deep learning framework | Manual download |
| CUDA Toolkit | 11.0+ | GPU acceleration (optional) | NVIDIA installer |
| CMake | 3.22+ | Build system | apt |
| GCC/Clang | 9+/10+ | C++17 compiler | apt |
# Check compiler
g++ --version
# Check CMake
cmake --version
# Check CUDA (if installed)
nvcc --version
nvidia-smi
# Check vcpkg packages
~/vcpkg/vcpkg list
# Check LibTorch
ls -la /home/moinshaikh/CLionProjects/ReinforcementSnake/libtorch/
# Check built executable
ls -la /home/moinshaikh/CLionProjects/ReinforcementSnake/build/ReinforcementSnake

cd /home/moinshaikh/CLionProjects/ReinforcementSnake/build
./ReinforcementSnake --help  # Should start the training interface

After successful installation:
- Run Training: Start with a few hundred episodes to test
- Monitor Progress: Watch the score and epsilon graphs
- Adjust Parameters: Use keyboard controls to tune hyperparameters
- Save Models: Extend the code to save trained models
- Experiment: Try different network architectures or reward functions
For issues related to:
- vcpkg: vcpkg GitHub Issues
- PyTorch: PyTorch Forums
- SDL3: SDL Discord/Forums
- CUDA: NVIDIA Developer Forums
This installation guide covers all necessary dependencies and steps to get the Reinforcement Learning Snake project running on Linux systems.
The Reinforcement Learning Snake project implements a sophisticated Deep Q-Network (DQN) agent that learns to play Snake through reinforcement learning. This document details the architecture, algorithms, and implementation choices.
Input Layer: 16 neurons (state representation)
↓
Hidden Layer 1: 128 neurons (ReLU activation)
↓
Hidden Layer 2: 128 neurons (ReLU activation)
↓
Output Layer: 4 neurons (Q-values for actions)
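
A minimal LibTorch C++ sketch of such a network is shown below. The module and member names (QNetworkImpl, fc1, fc2, out) are illustrative, not taken from the project sources:

```cpp
#include <torch/torch.h>

// Illustrative 16-128-128-4 MLP matching the layer sizes described above.
struct QNetworkImpl : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, out{nullptr};

    QNetworkImpl() {
        fc1 = register_module("fc1", torch::nn::Linear(16, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        out = register_module("out", torch::nn::Linear(128, 4));
    }

    // Two ReLU hidden layers, raw Q-values on the output (one per action).
    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return out->forward(x);
    }
};
TORCH_MODULE(QNetwork);
```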
The agent observes the game state through a carefully designed 16-dimensional feature vector:
- Danger Indicators (4 dimensions): Binary flags for immediate threats
  - state[0]: Danger straight ahead
  - state[1]: Danger to the right
  - state[2]: Danger to the left
  - state[3]: Danger behind
- Food Direction (4 dimensions): One-hot encoding of food direction
  - state[4]: Food is up
  - state[5]: Food is down
  - state[6]: Food is left
  - state[7]: Food is right
- Distance to Food (2 dimensions): Normalized coordinates
  - state[8]: Normalized x-distance to food
  - state[9]: Normalized y-distance to food
- Current Direction (4 dimensions): One-hot encoding of the snake's movement direction
  - state[10]: Moving up
  - state[11]: Moving down
  - state[12]: Moving left
  - state[13]: Moving right
- Game Context (2 dimensions):
  - state[14]: Snake length normalized by grid area
  - state[15]: Steps without food, normalized by 100
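
As a rough illustration of how such a feature vector could be assembled, the helper below packs precomputed game-logic values into a tensor. The function signature and parameter names are hypothetical, not the project's actual API:

```cpp
#include <array>
#include <torch/torch.h>

// Hypothetical helper: packs the 16 features described above into a 1-D float tensor.
torch::Tensor build_state(const std::array<bool, 4>& danger,      // straight, right, left, behind
                          const std::array<float, 4>& food_dir,   // up, down, left, right (one-hot)
                          float dx_norm, float dy_norm,           // normalized distance to food
                          const std::array<float, 4>& move_dir,   // up, down, left, right (one-hot)
                          float length_norm, float hunger_norm) { // length / grid area, steps / 100
    std::array<float, 16> f{};
    for (int i = 0; i < 4; ++i) f[i]      = danger[i] ? 1.0f : 0.0f;
    for (int i = 0; i < 4; ++i) f[4 + i]  = food_dir[i];
    f[8] = dx_norm;
    f[9] = dy_norm;
    for (int i = 0; i < 4; ++i) f[10 + i] = move_dir[i];
    f[14] = length_norm;
    f[15] = hunger_norm;
    // from_blob does not own the data, so clone into a tensor that does.
    return torch::from_blob(f.data(), {16}, torch::kFloat32).clone();
}
```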
The DQN algorithm approximates the optimal action-value function Q*(s,a) using the Bellman equation:
Q*(s,a) = E[R_t + γ * max_a' Q*(s_{t+1}, a') | s_t = s, a_t = a]
Where:
- R_t is the immediate reward
- γ ∈ [0,1] is the discount factor
- s_t, a_t are the current state and action
- s_{t+1}, a' are the next state and optimal next action
The network minimizes the temporal difference error:
L(θ) = E[(R_t + γ * max_a' Q(s_{t+1}, a'; θ^-) - Q(s_t, a_t; θ))^2]
Where:
- θ are the current network parameters
- θ^- are the target network parameters (updated periodically)
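
One optimization step could be written roughly as follows. This is a sketch, assuming policy_net and target_net are QNetwork instances like the one sketched earlier, optimizer is a torch::optim optimizer, and states, actions (int64 indices), rewards, next_states, and dones (0/1 floats) are batched tensors already on the training device:

```cpp
// Q(s,a; θ) for the actions actually taken in the batch.
torch::Tensor q_values = policy_net->forward(states)
                             .gather(1, actions.unsqueeze(1))
                             .squeeze(1);

// max_a' Q(s',a'; θ^-) from the target network, without tracking gradients.
torch::Tensor next_q;
{
    torch::NoGradGuard no_grad;
    next_q = std::get<0>(target_net->forward(next_states).max(1));
}

// TD target: r + γ * max_a' Q(s',a') for non-terminal transitions.
torch::Tensor target = rewards + gamma * next_q * (1 - dones);

// Mean squared TD error, then one gradient descent step.
torch::Tensor loss = torch::mse_loss(q_values, target);
optimizer.zero_grad();
loss.backward();
optimizer.step();
```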
for each episode:
    reset environment
    get initial state
    while not terminal:
        select action via ε-greedy policy
        execute action, observe reward and next_state
        store experience (s, a, r, s', done) in replay buffer
        if replay buffer has enough experiences:
            sample random minibatch
            perform gradient descent step
        if step % target_update_frequency == 0:
            update target network parameters
    decay exploration rate ε

- Buffer Size: 50,000 experiences
- Sampling: Random minibatch of 128 experiences
- Purpose: Break temporal correlations, improve sample efficiency
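
A simple buffer along these lines would satisfy the behavior described above; the class and member names are illustrative, not the project's actual types:

```cpp
#include <deque>
#include <random>
#include <vector>

struct Experience {
    std::vector<float> state, next_state;
    int action;
    float reward;
    bool done;
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(size_t capacity) : capacity_(capacity) {}

    // Oldest experiences are dropped once the buffer is full.
    void push(Experience e) {
        if (buffer_.size() >= capacity_) buffer_.pop_front();
        buffer_.push_back(std::move(e));
    }

    // Uniform random minibatch to break temporal correlations.
    std::vector<Experience> sample(size_t batch_size) {
        std::uniform_int_distribution<size_t> pick(0, buffer_.size() - 1);
        std::vector<Experience> batch;
        for (size_t i = 0; i < batch_size; ++i) batch.push_back(buffer_[pick(rng_)]);
        return batch;
    }

    size_t size() const { return buffer_.size(); }

private:
    size_t capacity_;
    std::deque<Experience> buffer_;
    std::mt19937 rng_{std::random_device{}()};
};
```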
- Update Frequency: Every 50 training steps
- Purpose: Provide stable targets for TD-learning
- Mechanism: Copy weights from main network to target network
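
Synchronizing the target network is a hard copy of the main network's parameters. With LibTorch modules it can be done roughly as follows (a sketch, assuming both networks share the same architecture):

```cpp
// Hard update: copy every parameter from the main network into the target network.
void sync_target(torch::nn::Module& policy_net, torch::nn::Module& target_net) {
    torch::NoGradGuard no_grad;                  // plain copy, no autograd tracking
    auto src = policy_net.named_parameters(true);
    auto dst = target_net.named_parameters(true);
    for (auto& item : dst) {
        item.value().copy_(src[item.key()]);     // overwrite target weights in place
    }
}
```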
The reward function shapes the agent's behavior:
float reward = 0.0f;
if (food_eaten) {
reward += 10.0f; // Primary reward
} else if (moved_closer_to_food) {
reward += 0.1f; // Shaping reward
} else if (moved_away_from_food) {
reward -= 0.15f; // Small penalty
}
if (game_over) {
reward -= 10.0f; // Strong penalty for death
}

action = {
random_action, with probability ε
argmax_a Q(s,a), with probability 1-ε
}

- Start: ε = 1.0 (100% exploration)
- Decay: ε ← ε * 0.998 per episode
- Minimum: ε = 0.01 (1% exploration)
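
In code, the ε-greedy choice reduces to a single branch per step. A sketch, assuming a QNetwork like the one outlined earlier and a seeded std::mt19937 engine:

```cpp
#include <random>
#include <torch/torch.h>

// ε-greedy action selection: explore with probability ε, otherwise act greedily.
int select_action(QNetwork& net, const torch::Tensor& state, float epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<float> coin(0.0f, 1.0f);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<int> random_action(0, 3);   // 4 possible moves
        return random_action(rng);
    }
    torch::NoGradGuard no_grad;                                   // inference only
    torch::Tensor q = net->forward(state.unsqueeze(0));           // add batch dimension
    return static_cast<int>(q.argmax(1).item<int64_t>());         // action with highest Q-value
}
```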
- Grid Size: 12×12 cells
- Cell Size: 40×40 pixels
- Total Game Area: 500×500 pixels
std::deque<Point> snake; // Front = head, Back = tail
Dir dir = Dir::RIGHT;         // Current movement direction

- Wall Collision: Snake head outside grid bounds
- Self Collision: Head intersects with body segments
- Timeout: Too many steps without eating food
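
A collision check in this representation is a handful of comparisons. The sketch below assumes a Point type with x/y members, as used in the snake deque above, and takes the grid size as a parameter:

```cpp
#include <deque>

// Hypothetical collision test against walls and the snake's own body.
bool is_collision(const Point& head, const std::deque<Point>& snake, int grid_size) {
    // Wall collision: head left the grid bounds.
    if (head.x < 0 || head.x >= grid_size || head.y < 0 || head.y >= grid_size)
        return true;
    // Self collision: head overlaps any body segment (index 0 is the head itself).
    for (size_t i = 1; i < snake.size(); ++i) {
        if (snake[i].x == head.x && snake[i].y == head.y)
            return true;
    }
    return false;
}
```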
- Game Board (500×500px): Main game visualization
- Statistics Panel (700×500px): Training graphs and parameters
- Network Weights (400×400px): Static network visualization
- Network Activity (400×400px): Real-time forward pass visualization
- 5×7 pixel characters for all ASCII values
- No external font dependencies
- Efficient SDL rendering
- Neural Network Weights: Color-coded connections (red=positive, green=negative)
- Training Graphs: Score history, average scores, epsilon decay
- Live Network Activity: Neuron activations during forward pass
- Parameter Display: Current hyperparameter values with adjustment hints
- Learning Rate (0.00001 to 0.1): Adam optimizer step size
- Gamma (0.5 to 0.999): Discount factor for future rewards
- Epsilon Decay (0.9 to 0.9999): Exploration rate decay
- Batch Size (16 to 512): Mini-batch size
- Replay Buffer Size (1000 to 100000): Experience storage
- Reward Food (1.0 to 100.0): Food eating reward
- Reward Closer (0.0 to 2.0): Moving closer reward
- Penalty Away (-2.0 to 0.0): Moving away penalty
- Penalty Death (-100.0 to -1.0): Death penalty
- Train Speed (1 to 100): Training acceleration factor
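
One straightforward way to support this kind of runtime tuning is to keep every hyperparameter in a plain struct that the keyboard handler mutates and the trainer reads each step. The struct below is only an illustration: values come from this document where stated and are placeholders otherwise.

```cpp
// Illustrative bundle of runtime-adjustable hyperparameters.
struct HyperParams {
    float learning_rate = 1e-3f;     // Adam step size (placeholder default)
    float gamma         = 0.99f;     // discount factor (placeholder default)
    float eps_decay     = 0.998f;    // per-episode exploration decay
    int   batch_size    = 128;       // minibatch size
    int   buffer_size   = 50000;     // replay buffer capacity
    float reward_food   = 10.0f;     // reward for eating food
    float reward_closer = 0.1f;      // shaping reward for approaching food
    float penalty_away  = -0.15f;    // penalty for moving away from food
    float penalty_death = -10.0f;    // penalty on game over
    int   train_speed   = 1;         // 1 = slow, 100 = fast (placeholder default)
};
```

Because the trainer rereads these values on every step, keyboard adjustments take effect immediately without recompilation.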
- ↑/↓ Arrows: Select parameter
- ←/→ Arrows: Adjust selected parameter
- R Key: Reset all to defaults
- Space: Reset exploration (ε=1)
- +/- Keys: Adjust rendering FPS
- GPU Libraries: libtorch_cuda.so, libc10_cuda.so
- CUDA Runtime: libcudart.so
- NVRTC Compiler: libnvrtc.so
- Automatic GPU Selection: Falls back to CPU if CUDA unavailable
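
The automatic fallback amounts to a single device query at startup; a minimal LibTorch sketch:

```cpp
// Pick CUDA when available, otherwise fall back to the CPU.
torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                   : torch::Device(torch::kCPU);

// Move the networks (and later, every batch tensor) to the chosen device.
policy_net->to(device);
target_net->to(device);
```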
- Experience Replay: Circular buffer with automatic overflow handling
- Tensor Operations: PyTorch automatic memory management
- SDL Resources: Proper cleanup in destructor
- Render Skipping: Adjust train_speed to skip expensive rendering
- FPS Control: Adjustable game_speed for visualization
- Batch Processing: Efficient mini-batch training
src/
├── SnakeAI.hpp # Main AI class declaration (621 lines)
├── SnakeAI.cpp # AI implementation
└── Utils.h # Constants, structures, font data (577 lines)
- PyTorch C++: Deep learning framework
- SDL3: Graphics and window management
- GLM: Mathematics library
- CUDA: GPU acceleration (optional)
- CMake 3.22+: Build configuration
- vcpkg: Package management
- GCC 9+/Clang 10+: C++17 compilation
- SIGINT Handler: Non-destructive interruption for forced rendering
- Graceful Shutdown: Proper resource cleanup
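
A non-destructive SIGINT handler typically just sets a flag that the main loop polls; a minimal sketch (the flag name is illustrative):

```cpp
#include <csignal>

// Set by the SIGINT handler, polled by the training loop to force an immediate render.
static volatile std::sig_atomic_t g_force_render = 0;

void handle_sigint(int) {
    g_force_render = 1;   // async-signal-safe: only write the flag
}

// At startup:
//   std::signal(SIGINT, handle_sigint);
// In the training loop:
//   if (g_force_render) { /* render one frame */ g_force_render = 0; }
// Ctrl+C therefore triggers rendering without terminating training.
```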
- Float32: Single precision for neural networks
- Normalized Values: All state features normalized to [0,1] or [-1,1]
- Stable Training: Target networks prevent divergence
- Modular Design: Easy to modify network architecture
- Parameter System: Runtime adjustment without recompilation
- Visualization Framework: Adaptable to different games
This architecture document provides a comprehensive overview of the Reinforcement Learning Snake implementation, covering the mathematical foundations, algorithmic details, and engineering choices.