From 4d6ac81ed0375b2ccab6c40cc6166d8eddbdf957 Mon Sep 17 00:00:00 2001
From: Valtteri Valo
Date: Tue, 17 Feb 2026 22:16:27 +0200
Subject: [PATCH] fix division by zero in anneal_beta when total_timesteps < batch_size

When total_timesteps is smaller than one batch (total_agents * horizon),
total_epochs = total_timesteps / batch_size evaluates to 0. The
anneal_beta formula then computes current_epoch / total_epochs = 0 / 0,
which is NaN in IEEE 754. This NaN propagates through priority replay
weights (mb_prio) into the loss, silently producing NaN for all losses.

cosine_annealing() already guards against this (line 714: if T == 0
return lr_base), but the anneal_beta computation on the next line does
not. Clamping total_epochs to at least 1 fixes both.
---
 pufferlib/extensions/pufferlib.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pufferlib/extensions/pufferlib.cpp b/pufferlib/extensions/pufferlib.cpp
index fadd2858e..675f4d75d 100644
--- a/pufferlib/extensions/pufferlib.cpp
+++ b/pufferlib/extensions/pufferlib.cpp
@@ -366,6 +366,7 @@ void train_impl(PuffeRL& pufferl) {
     Muon* muon = pufferl.muon;
 
     int total_epochs = hypers.total_timesteps / batch_size;
+    if (total_epochs < 1) total_epochs = 1;
 
     if (anneal_lr) {
         float lr_min = hypers.min_lr_ratio * hypers.lr;
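
Reviewer note: the failure mode described in the commit message can be reproduced in isolation with the minimal sketch below. It is not the code in pufferlib.cpp. The helper names `clamped_total_epochs` and `anneal_beta` are hypothetical, and the linear beta schedule is an assumption, because the patch does not show the actual anneal_beta formula. The sketch also assumes batch_size > 0 and that the division is carried out in floating point, which is what the NaN described in the message implies.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper mirroring the patched line: the integer division
// truncates to 0 when total_timesteps < batch_size, so clamp to >= 1.
// Assumes batch_size > 0.
int clamped_total_epochs(long long total_timesteps, long long batch_size) {
    long long epochs = total_timesteps / batch_size;
    return static_cast<int>(std::max<long long>(1, epochs));
}

// Assumed linear schedule from beta0 toward 1.0 (illustration only).
// With total_epochs == 0 and current_epoch == 0 the float division is
// 0.0f / 0.0f, which yields NaN under IEEE 754 arithmetic.
float anneal_beta(float beta0, int current_epoch, int total_epochs) {
    float progress = static_cast<float>(current_epoch) /
                     static_cast<float>(total_epochs);
    return beta0 + (1.0f - beta0) * progress;
}
```

Calling `anneal_beta(0.6f, 0, 0)` returns NaN, while `anneal_beta(0.6f, 0, clamped_total_epochs(100, 4096))` returns the starting beta unchanged. Clamping at the point where total_epochs is computed, as the patch does, covers every later use of the value without adding a guard at each call site.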