Pr1 reward #40

Merged
rbx merged 2 commits into FairRootGroup:master from enlorenz:pr1_reward
Mar 24, 2026
Conversation

@enlorenz (Contributor)

No description provided.

rbx and others added 2 commits March 24, 2026 12:52
Adds per-core power tracking alongside the existing flat node-level model,
without changing the reward signal. The flat model charges COST_USED_MW for
any used node regardless of utilization; the proportional model scales the
compute portion linearly from COST_IDLE_MW (0 cores) to COST_USED_MW (96
cores), i.e. COST_IDLE + (COST_USED - COST_IDLE) * (cores_used / CORES_PER_NODE).
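The flat and proportional models described above can be sketched as follows. This is an illustrative sketch only: the constant values are placeholders, not the project's actual config, and the function names are hypothetical.

```python
# Illustrative constants; the real values come from the project's config.
COST_IDLE_MW = 0.5     # power draw of an idle (powered-on) node, MW
COST_USED_MW = 1.0     # power draw of a fully loaded node, MW
CORES_PER_NODE = 96

def flat_node_power_mw(nodes_on: int) -> float:
    """Flat model: every powered node is charged the full COST_USED_MW,
    regardless of how many of its cores are in use."""
    return COST_USED_MW * nodes_on

def proportional_node_power_mw(nodes_on: int, total_cores_used: int) -> float:
    """Proportional model: idle base per powered node, plus a compute
    portion that scales linearly with total core utilization."""
    delta = COST_USED_MW - COST_IDLE_MW
    return COST_IDLE_MW * nodes_on + delta * (total_cores_used / CORES_PER_NODE)
```

Under these placeholder constants, two fully idle nodes cost 1.0 MW proportionally versus 2.0 MW flat, and the two models agree only when every powered node is fully loaded.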

New metrics (eval log + TensorBoard):
- PropPower (MWh): agent/base/base_off under proportional model
- metrics/prop_power_mwh, baseline_prop_power_mwh, baseline_off_prop_power_mwh
- metrics/savings_prop_power_vs_baseline_off (MWh gap vs baseline_off)
- metrics/savings_prop_cost_vs_baseline_off (€ gap vs baseline_off, directly
  comparable to metrics/savings_off)

All proportional values are computed post-hoc from already-tracked per-step
lists (episode_on_nodes, episode_used_cores, episode_price_stats), so no
per-step changes to power_cost() or power_consumption_mwh() were needed.
- introduce new reward helpers that use per-node core utilization instead of node-count proxies
- compute efficiency from delivered core-hours per MWh with smooth target-based normalization
- compute price timing reward from total used cores so it depends on useful work, not packing shape
- keep legacy reward functions untouched and switch calculate() to the new terms when node state is available
- add regression tests covering packing-sensitive efficiency and packing-invariant price reward
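The efficiency term in the bullets above (delivered core-hours per MWh with smooth target-based normalization) might look roughly like the following sketch. The target value, the tanh shaping, and the function name are all assumptions for illustration; the repository's actual normalization may differ.

```python
import math

# Hypothetical target: core-hours delivered per MWh considered "good".
EFFICIENCY_TARGET = 96.0

def efficiency_reward(core_hours: float, energy_mwh: float) -> float:
    """Smoothly map delivered core-hours per MWh into [0, 1),
    approaching 1 as efficiency approaches or exceeds the target."""
    if energy_mwh <= 0.0:
        return 0.0
    efficiency = core_hours / energy_mwh
    return math.tanh(efficiency / EFFICIENCY_TARGET)
```

Because the score depends only on total delivered core-hours and total energy, it is sensitive to wasted idle capacity but, combined with a proportional power model, far less sensitive to how jobs are packed across nodes, which is what the regression tests above are checking.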

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

The pull request introduces proportional (per-core) power and cost metrics throughout the codebase. New metric calculations are added to the metrics tracker, logged in callbacks during training, and appended to episode summary output. Additionally, the reward calculation system is extended with utilization-aware efficiency rewards and cores-based price reward functions.

Changes

Cohort / File(s) / Summary

- Metrics Computation (src/metrics_tracker.py): Extended record_episode_completion() to compute per-core proportional power and cost metrics for agent, baseline, and "baseline off" variants using imported config values (COST_IDLE_MW, COST_USED_MW, CORES_PER_NODE, MAX_NODES). New metrics added to the returned episode data dict.
- Metrics Logging and Display (src/callbacks.py, src/evaluation_summary.py): Extended imports in callbacks to include the new config constants. Added a metrics logging block in _on_step that computes and records proportional power and savings metrics. Extended the episode summary output in evaluation_summary to include a formatted PropPower field with agent and baseline values.
- Reward Calculation (src/reward_calculation.py): Added utility function _used_cores_from_state() to derive used cores from per-node state. Introduced new reward functions: _reward_energy_efficiency_utilization_normalized() for efficiency scoring via delivered core-hours per MWh, and _reward_price_utilization(), a cores-volume-scaled variant replacing the node-count-based price reward. Updated calculate() control flow to use the new utilization-aware functions when state arrays are provided.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • RewardCalculator #6: Refactors RewardCalculator and modifies reward function implementations in reward_calculation.py, directly related to the new reward functions introduced in this PR.
  • Pr1 price reward #31: Modifies reward_calculation.py's price and efficiency reward logic to use alternative usage measurement approaches, overlapping with the cores-based reward functions added here.
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 inconclusive)

- Title check: ❓ Inconclusive. The title is vague and does not clearly describe the main changes; 'Pr1 reward' lacks specificity and does not convey meaningful information about what was changed. Resolution: use a more descriptive title that clearly explains the main change, such as 'Add proportional power metrics and improve reward calculation with utilization-awareness'.
- Description check: ❓ Inconclusive. No pull request description was provided by the author, making it impossible to assess whether it relates to the changeset. Resolution: add a description explaining the purpose of the changes, the new metrics and reward functions introduced, and how they improve the system.
✅ Passed checks (1 passed)
- Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, which meets the required threshold of 80.00%.




@coderabbitai bot left a comment

🧹 Nitpick comments (4)
src/reward_calculation.py (2)

42-51: Consider extracting shared core-counting logic.

_used_cores_from_state duplicates the core calculation logic from _power_consumption_from_state_mwh (lines 22-28). Consider extracting the common mask/clip pattern into a shared helper to avoid drift if the calculation changes.

♻️ Optional: Extract shared logic
+def _compute_used_cores(nodes: np.ndarray, cores_available: np.ndarray) -> np.ndarray:
+    """Compute used cores per node with masking and clipping."""
+    on_mask = nodes >= 0
+    used_cores = np.zeros_like(cores_available, dtype=float)
+    if np.any(on_mask):
+        used_cores[on_mask] = CORES_PER_NODE - cores_available[on_mask]
+        used_cores = np.clip(used_cores, 0.0, float(CORES_PER_NODE))
+    return used_cores
+
+
 def _used_cores_from_state(nodes: np.ndarray, cores_available: np.ndarray) -> float:
     """Calculate the total number of used cores across all powered nodes."""
-    on_mask = nodes >= 0
-    if not np.any(on_mask):
-        return 0.0
-
-    used_cores = np.zeros_like(cores_available, dtype=float)
-    used_cores[on_mask] = CORES_PER_NODE - cores_available[on_mask]
-    used_cores = np.clip(used_cores, 0.0, float(CORES_PER_NODE))
-    return float(np.sum(used_cores))
+    return float(np.sum(_compute_used_cores(nodes, cores_available)))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reward_calculation.py` around lines 42 - 51, Both _used_cores_from_state
and _power_consumption_from_state_mwh duplicate the mask/clip core-counting
logic; extract a small helper (e.g., _compute_used_cores_mask or similar) that
accepts nodes, cores_available and CORES_PER_NODE and returns the per-node used
cores (already masked and clipped) so both _used_cores_from_state and
_power_consumption_from_state_mwh call that helper and then sum/convert as
needed; ensure the helper signature and returned dtype match existing uses and
replace the duplicated lines in both functions (referencing nodes,
cores_available, CORES_PER_NODE).

333-374: Consider consolidating with _reward_price.

_reward_price_utilization shares substantial logic with _reward_price (lines 213-258). Both implement the same saturation curve, overdrive handling, and floor logic. A unified helper accepting a "work volume" parameter would reduce duplication.
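A hedged sketch of the suggested consolidation: one helper parameterized by a generic "work volume" in cores, so the node-based and cores-based price rewards share one saturation curve. Every name, default, and the simplified shaping below is an assumption drawn from the review text, not the actual implementation; in particular, the real functions also handle the negative-price overdrive path, which is omitted here.

```python
import math

CORES_PER_NODE = 96  # illustrative; the real value comes from config

def price_reward_with_volume(relative_advantage: float,
                             work_volume_cores: float,
                             tau_pos: float = 4.0,
                             tau_neg: float = 2.0) -> float:
    """Price-advantage component scaled by a load component that
    saturates with the equivalent number of fully used nodes."""
    equivalent_nodes = work_volume_cores / CORES_PER_NODE
    # Different time constants for favorable vs unfavorable prices.
    tau = tau_pos if relative_advantage >= 0 else tau_neg
    load_component = 1.0 - math.exp(-equivalent_nodes / tau)
    # Saturate so the reward stays bounded.
    return math.tanh(relative_advantage * load_component)
```

With this shape, _reward_price_utilization would pass total used cores as the work volume, while the legacy _reward_price would pass its node-count proxy converted to cores, keeping both behaviors behind one curve.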

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reward_calculation.py` around lines 333 - 374, Extract the shared
saturation/overdrive logic from _reward_price_utilization and _reward_price into
a single helper (e.g. _price_reward_with_volume(self, current_price: float,
average_future_price: float, work_volume: float) -> float) that: computes
context_avg via self._price_context_average, uses self.prices.MAX_PRICE /
MIN_PRICE and price_span, computes relative_advantage and advantage_component
with self.PRICE_ADVANTAGE_GAIN, converts work_volume to equivalent_used_nodes
using CORES_PER_NODE, selects tau using self.PRICE_NODE_TAU_POS /
PRICE_NODE_TAU_NEG, computes load_component = 1 - exp(-equivalent_used_nodes /
tau) and raw_reward = advantage_component * load_component, then applies the
negative-price overdrive path and tanh saturation exactly as in both functions
(preserving NEGATIVE_PRICE_TAU, NEGATIVE_PRICE_NODE_TAU,
NEGATIVE_PRICE_OVERDRIVE_ALLOW_ABOVE_ONE, NEGATIVE_PRICE_OVERDRIVE_GAIN,
NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD, NEGATIVE_PRICE_OVERDRIVE_FLOOR) and returns
float. Replace _reward_price_utilization and _reward_price to call this helper:
pass used_cores for work_volume in _reward_price_utilization and the appropriate
work volume (1.0 or equivalent) in _reward_price, keeping all existing behavior
and types.
src/metrics_tracker.py (1)

180-207: Add strict=True to zip() calls for robustness.

The episode lists (episode_on_nodes, episode_used_cores, episode_price_stats, etc.) should always have the same length. Adding strict=True would catch any length mismatch bugs early rather than silently truncating the sum.

♻️ Add strict=True
         agent_prop_power_mwh: float = sum(
             COST_IDLE_MW * on + _compute_delta_mw * (cores / CORES_PER_NODE)
-            for on, cores in zip(self.episode_on_nodes, self.episode_used_cores)
+            for on, cores in zip(self.episode_on_nodes, self.episode_used_cores, strict=True)
         )

Apply similarly to lines 196, 201, and 205.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/metrics_tracker.py` around lines 180 - 207, The zip() calls that
aggregate episode lists (used in computing agent_prop_power_mwh,
baseline_prop_power_mwh, baseline_off_prop_power_mwh, agent_prop_cost, and
baseline_off_prop_cost) should use strict=True to surface length mismatches
instead of silently truncating; update each zip(...) to zip(..., strict=True)
for the comprehensions that iterate over self.episode_on_nodes,
self.episode_used_cores, self.episode_baseline_used_cores,
self.episode_baseline_used_nodes, and self.episode_price_stats so mismatched
lengths raise an error early.
src/callbacks.py (1)

61-87: Code duplication with metrics_tracker.py.

These proportional power formulas (lines 63-74, 79-86) are duplicated verbatim in metrics_tracker.py (lines 184-206). While currently consistent, any future formula change must be applied in both locations. Consider extracting to a shared utility or computing only in metrics_tracker.py and accessing the result here.

Additionally, the static analyzer flags zip() calls without strict=True. Since these lists should always have equal length, adding strict=True would catch length mismatches early.

♻️ Add strict=True to zip calls
             agent_prop_power = sum(
                 COST_IDLE_MW * on + _delta * (cores / CORES_PER_NODE)
-                for on, cores in zip(env.metrics.episode_on_nodes, env.metrics.episode_used_cores)
+                for on, cores in zip(env.metrics.episode_on_nodes, env.metrics.episode_used_cores, strict=True)
             )
             baseline_prop_power = sum(
                 COST_IDLE_MW * MAX_NODES + _delta * (cores / CORES_PER_NODE)
                 for cores in env.metrics.episode_baseline_used_cores
             )
             baseline_off_prop_power = sum(
                 COST_IDLE_MW * used + _delta * (cores / CORES_PER_NODE)
-                for used, cores in zip(env.metrics.episode_baseline_used_nodes, env.metrics.episode_baseline_used_cores)
+                for used, cores in zip(env.metrics.episode_baseline_used_nodes, env.metrics.episode_baseline_used_cores, strict=True)
             )

Apply similar changes to lines 81 and 85.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/callbacks.py` around lines 61 - 87, The proportional power/cost
calculations (agent_prop_power, baseline_prop_power, baseline_off_prop_power,
agent_prop_cost, baseline_off_prop_cost) are duplicated with metrics_tracker.py;
extract them into a single shared utility (e.g., a function named
compute_proportional_power_metrics or a method on the metrics object) and call
that from both locations instead of duplicating logic, and when consuming
parallel lists (episode_on_nodes, episode_used_cores, episode_price_stats,
episode_baseline_used_nodes, episode_baseline_used_cores) change zip(...) calls
to zip(..., strict=True) to enforce equal-length lists and catch mismatches
early. Ensure the shared function returns the same named results and update the
logger calls (metrics/prop_power_mwh, metrics/baseline_prop_power_mwh,
metrics/baseline_off_prop_power_mwh, metrics/savings_prop_power_vs_baseline_off,
metrics/savings_prop_cost_vs_baseline_off) to use those returned values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 45f0f9ca-363e-4608-beec-aa03db755f76

📥 Commits

Reviewing files that changed from the base of the PR and between 937cb2d and e1e8b55.

📒 Files selected for processing (4)
  • src/callbacks.py
  • src/evaluation_summary.py
  • src/metrics_tracker.py
  • src/reward_calculation.py

class RewardCalculator:
    """Calculates rewards with pre-computed normalization bounds."""

    EFFICIENCY_TARGET_RATIO = 0.70
Member
Why .7 here?

@rbx rbx merged commit c31e453 into FairRootGroup:master Mar 24, 2026
4 checks passed
This was referenced Mar 24, 2026