Adds per-core power tracking alongside the existing flat node-level model, without changing the reward signal. The flat model charges COST_USED_MW for any used node regardless of utilization; the proportional model scales the compute portion linearly from COST_IDLE_MW (0 cores) to COST_USED_MW (96 cores), i.e. `COST_IDLE + (COST_USED - COST_IDLE) * (cores_used / CORES_PER_NODE)`.

New metrics (eval log + TensorBoard):
- PropPower (MWh): agent/base/base_off under proportional model
- metrics/prop_power_mwh, baseline_prop_power_mwh, baseline_off_prop_power_mwh
- metrics/savings_prop_power_vs_baseline_off (MWh gap vs baseline_off)
- metrics/savings_prop_cost_vs_baseline_off (€ gap vs baseline_off, directly comparable to metrics/savings_off)

All proportional values are computed post-hoc from already-tracked per-step lists (episode_on_nodes, episode_used_cores, episode_price_stats), so no per-step changes to power_cost() or power_consumption_mwh() were needed.
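The flat-vs-proportional split described above can be sketched as follows (a minimal illustration; the constant values are placeholders, not the project's real configuration):

```python
CORES_PER_NODE = 96
COST_IDLE_MW = 0.5   # placeholder value; the real constant lives in the project config
COST_USED_MW = 2.0   # placeholder value

def flat_power_mw(node_is_used: bool) -> float:
    """Flat model: a used node is charged full power regardless of utilization."""
    return COST_USED_MW if node_is_used else 0.0

def proportional_power_mw(cores_used: int) -> float:
    """Proportional model: linear ramp from idle power (0 cores) to full power (96 cores)."""
    return COST_IDLE_MW + (COST_USED_MW - COST_IDLE_MW) * (cores_used / CORES_PER_NODE)

# A node running a single core is charged far less under the proportional model
# than under the flat model, which is the gap the new metrics make visible.
print(flat_power_mw(True))          # 2.0
print(proportional_power_mw(1))     # 0.515625
```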
- introduce new reward helpers that use per-node core utilization instead of node-count proxies
- compute efficiency from delivered core-hours per MWh with smooth target-based normalization
- compute price timing reward from total used cores so it depends on useful work, not packing shape
- keep legacy reward functions untouched and switch calculate() to the new terms when node state is available
- add regression tests covering packing-sensitive efficiency and packing-invariant price reward
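One way the "smooth target-based normalization" mentioned above could look is sketched below. This is a hypothetical illustration, not the PR's actual implementation: the target ratio, the tanh shaping, and the function name are all assumptions.

```python
import math

def efficiency_reward(core_hours: float, energy_mwh: float,
                      target_core_hours_per_mwh: float = 1000.0) -> float:
    """Map delivered core-hours per MWh onto [0, 1) with a smooth saturation
    around a target ratio, rather than a hard threshold.

    target_core_hours_per_mwh is an assumed tuning parameter.
    """
    if energy_mwh <= 0.0:
        return 0.0
    ratio = core_hours / energy_mwh
    return math.tanh(ratio / target_core_hours_per_mwh)
```

The key property is that the reward depends only on useful work delivered per unit of energy, so two packings with the same total used cores and power draw score identically.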
📝 Walkthrough

The pull request introduces proportional (per-core) power and cost metrics throughout the codebase. New metric calculations are added to the metrics tracker, logged in callbacks during training, and appended to episode summary output. Additionally, the reward calculation system is extended with utilization-aware efficiency rewards and cores-based price reward functions.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 inconclusive
🧹 Nitpick comments (4)
src/reward_calculation.py (2)
42-51: Consider extracting shared core-counting logic.
`_used_cores_from_state` duplicates the core calculation logic from `_power_consumption_from_state_mwh` (lines 22-28). Consider extracting the common mask/clip pattern into a shared helper to avoid drift if the calculation changes.

♻️ Optional: Extract shared logic
```diff
+def _compute_used_cores(nodes: np.ndarray, cores_available: np.ndarray) -> np.ndarray:
+    """Compute used cores per node with masking and clipping."""
+    on_mask = nodes >= 0
+    used_cores = np.zeros_like(cores_available, dtype=float)
+    if np.any(on_mask):
+        used_cores[on_mask] = CORES_PER_NODE - cores_available[on_mask]
+    used_cores = np.clip(used_cores, 0.0, float(CORES_PER_NODE))
+    return used_cores
+
+
 def _used_cores_from_state(nodes: np.ndarray, cores_available: np.ndarray) -> float:
     """Calculate the total number of used cores across all powered nodes."""
-    on_mask = nodes >= 0
-    if not np.any(on_mask):
-        return 0.0
-
-    used_cores = np.zeros_like(cores_available, dtype=float)
-    used_cores[on_mask] = CORES_PER_NODE - cores_available[on_mask]
-    used_cores = np.clip(used_cores, 0.0, float(CORES_PER_NODE))
-    return float(np.sum(used_cores))
+    return float(np.sum(_compute_used_cores(nodes, cores_available)))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/reward_calculation.py` around lines 42 - 51, Both _used_cores_from_state and _power_consumption_from_state_mwh duplicate the mask/clip core-counting logic; extract a small helper (e.g., _compute_used_cores_mask or similar) that accepts nodes, cores_available and CORES_PER_NODE and returns the per-node used cores (already masked and clipped) so both _used_cores_from_state and _power_consumption_from_state_mwh call that helper and then sum/convert as needed; ensure the helper signature and returned dtype match existing uses and replace the duplicated lines in both functions (referencing nodes, cores_available, CORES_PER_NODE).
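The mask/clip pattern the suggestion factors out behaves as in this standalone illustration (made-up node state; a negative entry marks a powered-off node, and `CORES_PER_NODE` stands in for the project's constant):

```python
import numpy as np

CORES_PER_NODE = 96  # assumed constant from the project

def used_cores_per_node(nodes: np.ndarray, cores_available: np.ndarray) -> np.ndarray:
    """Per-node used cores: powered-off nodes (nodes < 0) count as zero,
    and results are clipped into [0, CORES_PER_NODE]."""
    on_mask = nodes >= 0
    used = np.zeros_like(cores_available, dtype=float)
    used[on_mask] = CORES_PER_NODE - cores_available[on_mask]
    return np.clip(used, 0.0, float(CORES_PER_NODE))

nodes = np.array([1, -1, 0])             # middle node is powered off
cores_available = np.array([90, 96, 0])  # free cores per node
print(used_cores_per_node(nodes, cores_available).tolist())  # [6.0, 0.0, 96.0]
```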
333-374: Consider consolidating with `_reward_price`.
`_reward_price_utilization` shares substantial logic with `_reward_price` (lines 213-258). Both implement the same saturation curve, overdrive handling, and floor logic. A unified helper accepting a "work volume" parameter would reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/reward_calculation.py` around lines 333 - 374, Extract the shared saturation/overdrive logic from _reward_price_utilization and _reward_price into a single helper (e.g. _price_reward_with_volume(self, current_price: float, average_future_price: float, work_volume: float) -> float) that: computes context_avg via self._price_context_average, uses self.prices.MAX_PRICE / MIN_PRICE and price_span, computes relative_advantage and advantage_component with self.PRICE_ADVANTAGE_GAIN, converts work_volume to equivalent_used_nodes using CORES_PER_NODE, selects tau using self.PRICE_NODE_TAU_POS / PRICE_NODE_TAU_NEG, computes load_component = 1 - exp(-equivalent_used_nodes / tau) and raw_reward = advantage_component * load_component, then applies the negative-price overdrive path and tanh saturation exactly as in both functions (preserving NEGATIVE_PRICE_TAU, NEGATIVE_PRICE_NODE_TAU, NEGATIVE_PRICE_OVERDRIVE_ALLOW_ABOVE_ONE, NEGATIVE_PRICE_OVERDRIVE_GAIN, NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD, NEGATIVE_PRICE_OVERDRIVE_FLOOR) and returns float. Replace _reward_price_utilization and _reward_price to call this helper: pass used_cores for work_volume in _reward_price_utilization and the appropriate work volume (1.0 or equivalent) in _reward_price, keeping all existing behavior and types.

src/metrics_tracker.py (1)
180-207: Add `strict=True` to `zip()` calls for robustness.

The episode lists (`episode_on_nodes`, `episode_used_cores`, `episode_price_stats`, etc.) should always have the same length. Adding `strict=True` would catch any length-mismatch bugs early rather than silently truncating the sum.

♻️ Add strict=True
```diff
     agent_prop_power_mwh: float = sum(
         COST_IDLE_MW * on + _compute_delta_mw * (cores / CORES_PER_NODE)
-        for on, cores in zip(self.episode_on_nodes, self.episode_used_cores)
+        for on, cores in zip(self.episode_on_nodes, self.episode_used_cores, strict=True)
     )
```

Apply similarly to lines 196, 201, and 205.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/metrics_tracker.py` around lines 180 - 207, The zip() calls that aggregate episode lists (used in computing agent_prop_power_mwh, baseline_prop_power_mwh, baseline_off_prop_power_mwh, agent_prop_cost, and baseline_off_prop_cost) should use strict=True to surface length mismatches instead of silently truncating; update each zip(...) to zip(..., strict=True) for the comprehensions that iterate over self.episode_on_nodes, self.episode_used_cores, self.episode_baseline_used_cores, self.episode_baseline_used_nodes, and self.episode_price_stats so mismatched lengths raise an error early.

src/callbacks.py (1)
61-87: Code duplication with `metrics_tracker.py`.

These proportional power formulas (lines 63-74, 79-86) are duplicated verbatim in `metrics_tracker.py` (lines 184-206). While currently consistent, any future formula change must be applied in both locations. Consider extracting to a shared utility or computing only in `metrics_tracker.py` and accessing the result here.

Additionally, the static analyzer flags `zip()` calls without `strict=True`. Since these lists should always have equal length, adding `strict=True` would catch length mismatches early.

♻️ Add strict=True to zip calls
```diff
     agent_prop_power = sum(
         COST_IDLE_MW * on + _delta * (cores / CORES_PER_NODE)
-        for on, cores in zip(env.metrics.episode_on_nodes, env.metrics.episode_used_cores)
+        for on, cores in zip(env.metrics.episode_on_nodes, env.metrics.episode_used_cores, strict=True)
     )
     baseline_prop_power = sum(
         COST_IDLE_MW * MAX_NODES + _delta * (cores / CORES_PER_NODE)
         for cores in env.metrics.episode_baseline_used_cores
     )
     baseline_off_prop_power = sum(
         COST_IDLE_MW * used + _delta * (cores / CORES_PER_NODE)
-        for used, cores in zip(env.metrics.episode_baseline_used_nodes, env.metrics.episode_baseline_used_cores)
+        for used, cores in zip(env.metrics.episode_baseline_used_nodes, env.metrics.episode_baseline_used_cores, strict=True)
     )
```

Apply similar changes to lines 81 and 85.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/callbacks.py` around lines 61 - 87, The proportional power/cost calculations (agent_prop_power, baseline_prop_power, baseline_off_prop_power, agent_prop_cost, baseline_off_prop_cost) are duplicated with metrics_tracker.py; extract them into a single shared utility (e.g., a function named compute_proportional_power_metrics or a method on the metrics object) and call that from both locations instead of duplicating logic, and when consuming parallel lists (episode_on_nodes, episode_used_cores, episode_price_stats, episode_baseline_used_nodes, episode_baseline_used_cores) change zip(...) calls to zip(..., strict=True) to enforce equal-length lists and catch mismatches early. Ensure the shared function returns the same named results and update the logger calls (metrics/prop_power_mwh, metrics/baseline_prop_power_mwh, metrics/baseline_off_prop_power_mwh, metrics/savings_prop_power_vs_baseline_off, metrics/savings_prop_cost_vs_baseline_off) to use those returned values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 45f0f9ca-363e-4608-beec-aa03db755f76
📒 Files selected for processing (4)
- src/callbacks.py
- src/evaluation_summary.py
- src/metrics_tracker.py
- src/reward_calculation.py
```python
class RewardCalculator:
    """Calculates rewards with pre-computed normalization bounds."""

    EFFICIENCY_TARGET_RATIO = 0.70
```