This project implements a Deep Q-Network (DQN) reinforcement learning agent that learns to play the classic Snake game. The implementation features:
- Deep Q-Learning with experience replay and target networks
- Real-time visualization of the game, neural network weights, and training statistics
- CUDA acceleration for fast training on NVIDIA GPUs
- Interactive parameter tuning during training
- Modern C++17 implementation with PyTorch C++ API
- CPU: 64-bit x86 processor (Intel/AMD)
- GPU: NVIDIA CUDA-compatible GPU (optional but recommended for faster training)
- RAM: Minimum 4GB, 8GB+ recommended
- Storage: 2GB free disk space
- Operating System: Ubuntu 20.04+ / Debian 10+ / Other Linux distributions
- Compiler: GCC 9+ or Clang 10+ with C++17 support
- CMake: Version 3.22 or higher
- CUDA Toolkit: Version 11.0+ (if using GPU acceleration)
- Git: For cloning repositories
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential cmake git pkg-config
sudo apt install -y libgl1-mesa-dev libglu1-mesa-dev libx11-dev libxrandr-dev libxinerama-dev libxcursor-dev libxi-dev
sudo apt install -y libvulkan-dev vulkan-tools libglvnd-dev

Visit the NVIDIA CUDA Toolkit Archive and download the appropriate version for your system.
# Download CUDA 11.8 (adjust URL for your system)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
# Make installer executable
chmod +x cuda_11.8.0_520.61.05_linux.run
# Run installer (accept terms and select only CUDA Toolkit)
sudo ./cuda_11.8.0_520.61.05_linux.run

# Add CUDA to PATH
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# Reload environment
source ~/.bashrc
# Verify CUDA installation
nvcc --version

cd ~
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh

echo 'export VCPKG_ROOT=~/vcpkg' >> ~/.bashrc
echo 'export PATH=$VCPKG_ROOT:$PATH' >> ~/.bashrc
source ~/.bashrc

cd ~/vcpkg
./vcpkg install sdl3:x64-linux
./vcpkg install glm:x64-linux

./vcpkg list
# You should see:
# sdl3:x64-linux
# glm:x64-linux

cd /home/moinshaikh/CLionProjects/ReinforcementSnake
# Download LibTorch (CUDA 11.8 version - adjust if using different CUDA version)
wget https://download.pytorch.org/libtorch/cu118/libtorch-shared-with-deps-latest.zip
# Extract LibTorch
unzip libtorch-shared-with-deps-latest.zip
rm libtorch-shared-with-deps-latest.zip

# Download CPU-only version
wget https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-latest.zip
# Extract
unzip libtorch-shared-with-deps-latest.zip
rm libtorch-shared-with-deps-latest.zip

# Create vcpkg toolchain file reference
echo 'set(CMAKE_TOOLCHAIN_FILE "$ENV{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" CACHE STRING "")' >> vcpkg-toolchain.cmake

cd /home/moinshaikh/CLionProjects/ReinforcementSnake
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=~/vcpkg/scripts/buildsystems/vcpkg.cmake -DCMAKE_BUILD_TYPE=Release

# Use all available CPU cores for faster compilation
make -j$(nproc)

# Solution: Reinstall SDL3 with vcpkg
cd ~/vcpkg
./vcpkg remove sdl3:x64-linux
./vcpkg install sdl3:x64-linux

# Solution: Check CUDA installation and paths
which nvcc
ls /usr/local/cuda/lib64/
echo $LD_LIBRARY_PATH

# Solution: Verify LibTorch directory structure
ls -la libtorch/
ls -la libtorch/lib/
ls -la libtorch/include/

# Solution: Clear CMake cache and reconfigure
cd build
rm -rf *
cmake .. -DCMAKE_TOOLCHAIN_FILE=~/vcpkg/scripts/buildsystems/vcpkg.cmake -DCMAKE_BUILD_TYPE=Release

# Solution: Ensure GCC version supports C++17
g++ --version
# If version < 9, update GCC:
sudo apt install gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 90

cd /home/moinshaikh/CLionProjects/ReinforcementSnake/build
./ReinforcementSnake

During training, you can use these keyboard controls:
- ↑/↓: Select parameter to adjust
- ←/→: Adjust selected parameter value
- R: Reset all parameters to defaults
- Space: Reset exploration rate (epsilon=1)
- Ctrl+C: Force immediate rendering (doesn't stop training)
- Game Speed: Use +/- keys to adjust rendering FPS (5-120)
- Training Speed: Adjust train_speed parameter (1=slow, 100=fast)
ReinforcementSnake/
├── CMakeLists.txt # CMake configuration
├── main.cpp # Application entry point
├── libtorch/ # PyTorch C++ library
├── src/
│ ├── SnakeAI.hpp # AI implementation header
│ ├── SnakeAI.cpp # AI implementation
│ └── Utils.h # Constants and utilities
├── build/ # Build output directory
└── README.md # This file
| Library | Version | Purpose | Installation Method |
|---|---|---|---|
| SDL3 | Latest | Graphics rendering & window management | vcpkg |
| GLM | Latest | OpenGL mathematics | vcpkg |
| PyTorch | Latest | Deep learning framework | Manual download |
| CUDA Toolkit | 11.0+ | GPU acceleration (optional) | NVIDIA installer |
| CMake | 3.22+ | Build system | apt |
| GCC/Clang | 9+/10+ | C++17 compiler | apt |
# Check compiler
g++ --version
# Check CMake
cmake --version
# Check CUDA (if installed)
nvcc --version
nvidia-smi
# Check vcpkg packages
~/vcpkg/vcpkg list
# Check LibTorch
ls -la /home/moinshaikh/CLionProjects/ReinforcementSnake/libtorch/
# Check built executable
ls -la /home/moinshaikh/CLionProjects/ReinforcementSnake/build/ReinforcementSnake

cd /home/moinshaikh/CLionProjects/ReinforcementSnake/build
./ReinforcementSnake --help  # Should start the training interface

After successful installation:
- Run Training: Start with a few hundred episodes to test
- Monitor Progress: Watch the score and epsilon graphs
- Adjust Parameters: Use keyboard controls to tune hyperparameters
- Save Models: Extend the code to save trained models
- Experiment: Try different network architectures or reward functions
For issues related to:
- vcpkg: vcpkg GitHub Issues
- PyTorch: PyTorch Forums
- SDL3: SDL Discord/Forums
- CUDA: NVIDIA Developer Forums
This installation guide covers all necessary dependencies and steps to get the Reinforcement Learning Snake project running on Linux systems.
The Reinforcement Learning Snake project implements a sophisticated Deep Q-Network (DQN) agent that learns to play Snake through reinforcement learning. This document details the architecture, algorithms, and implementation choices.
Input Layer: 16 neurons (state representation)
↓
Hidden Layer 1: 128 neurons (ReLU activation)
↓
Hidden Layer 2: 128 neurons (ReLU activation)
↓
Output Layer: 4 neurons (Q-values for actions)
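
A minimal LibTorch C++ sketch of such a network is shown below. The module and member names (QNetworkImpl, fc1, fc2, out) are illustrative, not taken from the project sources:

```cpp
#include <torch/torch.h>

// Illustrative 16-128-128-4 MLP matching the layer sizes described above.
struct QNetworkImpl : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, out{nullptr};

    QNetworkImpl() {
        fc1 = register_module("fc1", torch::nn::Linear(16, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        out = register_module("out", torch::nn::Linear(128, 4));
    }

    // Two ReLU hidden layers, raw Q-values on the output (one per action).
    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return out->forward(x);
    }
};
TORCH_MODULE(QNetwork);
```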
The agent observes the game state through a carefully designed 16-dimensional feature vector:
- Danger Indicators (4 dimensions): Binary flags for immediate threats
  - state[0]: Danger straight ahead
  - state[1]: Danger to the right
  - state[2]: Danger to the left
  - state[3]: Danger behind
- Food Direction (4 dimensions): One-hot encoding of food direction
  - state[4]: Food is up
  - state[5]: Food is down
  - state[6]: Food is left
  - state[7]: Food is right
- Distance to Food (2 dimensions): Normalized coordinates
  - state[8]: Normalized x-distance to food
  - state[9]: Normalized y-distance to food
- Current Direction (4 dimensions): One-hot encoding of the snake's movement direction
  - state[10]: Moving up
  - state[11]: Moving down
  - state[12]: Moving left
  - state[13]: Moving right
- Game Context (2 dimensions):
  - state[14]: Snake length normalized by grid area
  - state[15]: Steps without food, normalized by 100
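
As a rough illustration of how such a feature vector could be assembled, the helper below packs precomputed game-logic values into a tensor. The function signature and parameter names are hypothetical, not the project's actual API:

```cpp
#include <array>
#include <torch/torch.h>

// Hypothetical helper: packs the 16 features described above into a 1-D float tensor.
torch::Tensor build_state(const std::array<bool, 4>& danger,      // straight, right, left, behind
                          const std::array<float, 4>& food_dir,   // up, down, left, right (one-hot)
                          float dx_norm, float dy_norm,           // normalized distance to food
                          const std::array<float, 4>& move_dir,   // up, down, left, right (one-hot)
                          float length_norm, float hunger_norm) { // length / grid area, steps / 100
    std::array<float, 16> f{};
    for (int i = 0; i < 4; ++i) f[i]      = danger[i] ? 1.0f : 0.0f;
    for (int i = 0; i < 4; ++i) f[4 + i]  = food_dir[i];
    f[8] = dx_norm;
    f[9] = dy_norm;
    for (int i = 0; i < 4; ++i) f[10 + i] = move_dir[i];
    f[14] = length_norm;
    f[15] = hunger_norm;
    // from_blob does not own the data, so clone into a tensor that does.
    return torch::from_blob(f.data(), {16}, torch::kFloat32).clone();
}
```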
The DQN algorithm approximates the optimal action-value function Q*(s,a) using the Bellman equation:
Q*(s,a) = E[R_t + γ * max_a' Q*(s_{t+1}, a') | s_t = s, a_t = a]
Where:
- R_t is the immediate reward
- γ ∈ [0,1] is the discount factor
- s_t, a_t are the current state and action
- s_{t+1}, a' are the next state and optimal next action
The network minimizes the temporal difference error:
L(θ) = E[(R_t + γ * max_a' Q(s_{t+1}, a'; θ^-) - Q(s_t, a_t; θ))^2]
Where:
- θ are the current network parameters
- θ^- are the target network parameters (updated periodically)
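
One optimization step could be written roughly as follows. This is a sketch, assuming policy_net and target_net are QNetwork instances like the one sketched earlier, optimizer is a torch::optim optimizer, and states, actions (int64 indices), rewards, next_states, and dones (0/1 floats) are batched tensors already on the training device:

```cpp
// Q(s,a; θ) for the actions actually taken in the batch.
torch::Tensor q_values = policy_net->forward(states)
                             .gather(1, actions.unsqueeze(1))
                             .squeeze(1);

// max_a' Q(s',a'; θ^-) from the target network, without tracking gradients.
torch::Tensor next_q;
{
    torch::NoGradGuard no_grad;
    next_q = std::get<0>(target_net->forward(next_states).max(1));
}

// TD target: r + γ * max_a' Q(s',a') for non-terminal transitions.
torch::Tensor target = rewards + gamma * next_q * (1 - dones);

// Mean squared TD error, then one gradient descent step.
torch::Tensor loss = torch::mse_loss(q_values, target);
optimizer.zero_grad();
loss.backward();
optimizer.step();
```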
for each episode:
    reset environment
    get initial state
    while not terminal:
        select action via ε-greedy policy
        execute action, observe reward and next_state
        store experience (s, a, r, s', done) in replay buffer
        if replay buffer has enough experiences:
            sample random minibatch
            perform gradient descent step
        if step % target_update_frequency == 0:
            update target network parameters
    decay exploration rate ε

- Buffer Size: 50,000 experiences
- Sampling: Random minibatch of 128 experiences
- Purpose: Break temporal correlations, improve sample efficiency
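
A simple buffer along these lines would satisfy the behavior described above; the class and member names are illustrative, not the project's actual types:

```cpp
#include <deque>
#include <random>
#include <vector>

struct Experience {
    std::vector<float> state, next_state;
    int action;
    float reward;
    bool done;
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(size_t capacity) : capacity_(capacity) {}

    // Oldest experiences are dropped once the buffer is full.
    void push(Experience e) {
        if (buffer_.size() >= capacity_) buffer_.pop_front();
        buffer_.push_back(std::move(e));
    }

    // Uniform random minibatch to break temporal correlations.
    std::vector<Experience> sample(size_t batch_size) {
        std::uniform_int_distribution<size_t> pick(0, buffer_.size() - 1);
        std::vector<Experience> batch;
        for (size_t i = 0; i < batch_size; ++i) batch.push_back(buffer_[pick(rng_)]);
        return batch;
    }

    size_t size() const { return buffer_.size(); }

private:
    size_t capacity_;
    std::deque<Experience> buffer_;
    std::mt19937 rng_{std::random_device{}()};
};
```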
- Update Frequency: Every 50 training steps
- Purpose: Provide stable targets for TD-learning
- Mechanism: Copy weights from main network to target network
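
Synchronizing the target network is a hard copy of the main network's parameters. With LibTorch modules it can be done roughly as follows (a sketch, assuming both networks share the same architecture):

```cpp
// Hard update: copy every parameter from the main network into the target network.
void sync_target(torch::nn::Module& policy_net, torch::nn::Module& target_net) {
    torch::NoGradGuard no_grad;                  // plain copy, no autograd tracking
    auto src = policy_net.named_parameters(true);
    auto dst = target_net.named_parameters(true);
    for (auto& item : dst) {
        item.value().copy_(src[item.key()]);     // overwrite target weights in place
    }
}
```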
The reward function shapes the agent's behavior:
float reward = 0.0f;
if (food_eaten) {
reward += 10.0f; // Primary reward
} else if (moved_closer_to_food) {
reward += 0.1f; // Shaping reward
} else if (moved_away_from_food) {
reward -= 0.15f; // Small penalty
}
if (game_over) {
reward -= 10.0f; // Strong penalty for death
}

action = {
random_action, with probability ε
argmax_a Q(s,a), with probability 1-ε
}

- Start: ε = 1.0 (100% exploration)
- Decay: ε ← ε * 0.998 per episode
- Minimum: ε = 0.01 (1% exploration)
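
In code, the ε-greedy choice reduces to a single branch per step. A sketch, assuming a QNetwork like the one outlined earlier and a seeded std::mt19937 engine:

```cpp
#include <random>
#include <torch/torch.h>

// ε-greedy action selection: explore with probability ε, otherwise act greedily.
int select_action(QNetwork& net, const torch::Tensor& state, float epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<float> coin(0.0f, 1.0f);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<int> random_action(0, 3);   // 4 possible moves
        return random_action(rng);
    }
    torch::NoGradGuard no_grad;                                   // inference only
    torch::Tensor q = net->forward(state.unsqueeze(0));           // add batch dimension
    return static_cast<int>(q.argmax(1).item<int64_t>());         // action with highest Q-value
}
```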
- Grid Size: 12×12 cells
- Cell Size: 40×40 pixels
- Total Game Area: 500×500 pixels
std::deque<Point> snake; // Front = head, Back = tail
Dir dir = Dir::RIGHT;         // Current movement direction

- Wall Collision: Snake head outside grid bounds
- Self Collision: Head intersects with body segments
- Timeout: Too many steps without eating food
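
A collision check in this representation is a handful of comparisons. The sketch below assumes a Point type with x/y members, as used in the snake deque above, and takes the grid size as a parameter:

```cpp
#include <deque>

// Hypothetical collision test against walls and the snake's own body.
bool is_collision(const Point& head, const std::deque<Point>& snake, int grid_size) {
    // Wall collision: head left the grid bounds.
    if (head.x < 0 || head.x >= grid_size || head.y < 0 || head.y >= grid_size)
        return true;
    // Self collision: head overlaps any body segment (index 0 is the head itself).
    for (size_t i = 1; i < snake.size(); ++i) {
        if (snake[i].x == head.x && snake[i].y == head.y)
            return true;
    }
    return false;
}
```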
- Game Board (500×500px): Main game visualization
- Statistics Panel (700×500px): Training graphs and parameters
- Network Weights (400×400px): Static network visualization
- Network Activity (400×400px): Real-time forward pass visualization
- 5×7 pixel characters for all ASCII values
- No external font dependencies
- Efficient SDL rendering
- Neural Network Weights: Color-coded connections (red=positive, green=negative)
- Training Graphs: Score history, average scores, epsilon decay
- Live Network Activity: Neuron activations during forward pass
- Parameter Display: Current hyperparameter values with adjustment hints
- Learning Rate (0.00001 to 0.1): Adam optimizer step size
- Gamma (0.5 to 0.999): Discount factor for future rewards
- Epsilon Decay (0.9 to 0.9999): Exploration rate decay
- Batch Size (16 to 512): Mini-batch size
- Replay Buffer Size (1000 to 100000): Experience storage
- Reward Food (1.0 to 100.0): Food eating reward
- Reward Closer (0.0 to 2.0): Moving closer reward
- Penalty Away (-2.0 to 0.0): Moving away penalty
- Penalty Death (-100.0 to -1.0): Death penalty
- Train Speed (1 to 100): Training acceleration factor
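
One straightforward way to support this kind of runtime tuning is to keep every hyperparameter in a plain struct that the keyboard handler mutates and the trainer reads each step. The struct below is only an illustration: values come from this document where stated and are placeholders otherwise.

```cpp
// Illustrative bundle of runtime-adjustable hyperparameters.
struct HyperParams {
    float learning_rate = 1e-3f;     // Adam step size (placeholder default)
    float gamma         = 0.99f;     // discount factor (placeholder default)
    float eps_decay     = 0.998f;    // per-episode exploration decay
    int   batch_size    = 128;       // minibatch size
    int   buffer_size   = 50000;     // replay buffer capacity
    float reward_food   = 10.0f;     // reward for eating food
    float reward_closer = 0.1f;      // shaping reward for approaching food
    float penalty_away  = -0.15f;    // penalty for moving away from food
    float penalty_death = -10.0f;    // penalty on game over
    int   train_speed   = 1;         // 1 = slow, 100 = fast (placeholder default)
};
```

Because the trainer rereads these values on every step, keyboard adjustments take effect immediately without recompilation.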
- ↑/↓ Arrows: Select parameter
- ←/→ Arrows: Adjust selected parameter
- R Key: Reset all to defaults
- Space: Reset exploration (ε=1)
- +/- Keys: Adjust rendering FPS
- GPU Libraries: libtorch_cuda.so, libc10_cuda.so
- CUDA Runtime: libcudart.so
- NVRTC Compiler: libnvrtc.so
- Automatic GPU Selection: Falls back to CPU if CUDA unavailable
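
The automatic fallback amounts to a single device query at startup; a minimal LibTorch sketch:

```cpp
// Pick CUDA when available, otherwise fall back to the CPU.
torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                   : torch::Device(torch::kCPU);

// Move the networks (and later, every batch tensor) to the chosen device.
policy_net->to(device);
target_net->to(device);
```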
- Experience Replay: Circular buffer with automatic overflow handling
- Tensor Operations: PyTorch automatic memory management
- SDL Resources: Proper cleanup in destructor
- Render Skipping: Adjust train_speed to skip expensive rendering
- FPS Control: Adjustable game_speed for visualization
- Batch Processing: Efficient mini-batch training
src/
├── SnakeAI.hpp # Main AI class declaration (621 lines)
├── SnakeAI.cpp # AI implementation
└── Utils.h # Constants, structures, font data (577 lines)
- PyTorch C++: Deep learning framework
- SDL3: Graphics and window management
- GLM: Mathematics library
- CUDA: GPU acceleration (optional)
- CMake 3.22+: Build configuration
- vcpkg: Package management
- GCC 9+/Clang 10+: C++17 compilation
- SIGINT Handler: Non-destructive interruption for forced rendering
- Graceful Shutdown: Proper resource cleanup
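
A non-destructive SIGINT handler typically just sets a flag that the main loop polls; a minimal sketch (the flag name is illustrative):

```cpp
#include <csignal>

// Set by the SIGINT handler, polled by the training loop to force an immediate render.
static volatile std::sig_atomic_t g_force_render = 0;

void handle_sigint(int) {
    g_force_render = 1;   // async-signal-safe: only write the flag
}

// At startup:
//   std::signal(SIGINT, handle_sigint);
// In the training loop:
//   if (g_force_render) { /* render one frame */ g_force_render = 0; }
// Ctrl+C therefore triggers rendering without terminating training.
```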
- Float32: Single precision for neural networks
- Normalized Values: All state features normalized to [0,1] or [-1,1]
- Stable Training: Target networks prevent divergence
- Modular Design: Easy to modify network architecture
- Parameter System: Runtime adjustment without recompilation
- Visualization Framework: Adaptable to different games
This architecture document provides a comprehensive overview of the Reinforcement Learning Snake implementation, covering the mathematical foundations, algorithmic details, and engineering choices.