add weighted evaluation aggregation by janrth · Pull Request #243 · Nixtla/utilsforecast

janrth · 2026-05-14T12:48:51Z

Summary

Adds weighted aggregation support to evaluate while keeping the core loss functions unchanged.

The new agg_fn="weighted_mean" option computes metrics at the existing per-series level first, then aggregates them with user-provided or automatically computed weights. This preserves the current loss semantics and avoids pushing aggregation logic into every loss function.

Changes

Added weights argument to evaluate.
Added agg_fn="weighted_mean".
Supports explicit weights with:
- unique_id, weight
- unique_id, cutoff, weight
Supports weights="auto", which uses validation target volume:
- sum(y) per unique_id
- sum(y) per (cutoff, unique_id) when cross-validation cutoffs are present
Added validation for missing, duplicated, or malformed weights.
Added pandas and polars test coverage.

Notes

Weighted aggregation is applied after each metric has been computed per series. For AutoML tuning in MLForecast, this means each CV fold can compute a weighted score, and MLForecast then averages those fold-level scores.

Solves #241

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7be2207ce5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

nasaul · 2026-05-21T00:44:54Z

Review Summary

Overall the design is clean and the narwhals-based implementation fits the project well. A few items to address before merge:

Correctness

Distributed path silently drops weights. _distributed_evaluate (evaluation.py:336) isn't updated to forward weights. Users will hit the generic "agg_fn is not supported in distributed" error without realizing weights were ignored. Please raise an explicit NotImplementedError for agg_fn=\"weighted_mean\" on the distributed branch.
Undocumented cutoff broadcast. When df has cutoff_col but the weights dataframe doesn't, the same weight is broadcast across all cutoffs (evaluation.py:99-100, 143-144). Either document this in the docstring or require an opt-in.
Auto-weights semantics. The docstring says "sum of the target over each series forecast horizon" — please emphasize this uses the forecast/validation window (not training), since small true values in the eval window produce small weights.

Code quality

Redundant finite check after the join (evaluation.py:149-150) — _get_weights already calls _check_weights_are_finite; only the null check is needed.
Rename weights_df_source (evaluation.py:353, 467) → forecasts_df / original_df.
Replace the hand-rolled group_by(...).agg(len) > 1 duplicate check (evaluation.py:106-112) with unique().shape[0] != shape[0].
Fold is_null().any() and (~is_finite()).any() into a single boolean expression to scan the column once.
PEP 8 nit: id_cols[1:] # exclude id_col → two spaces before #.

Performance

Collapse _weighted_mean_agg into one group_by. The intermediate __utilsforecast_weighted_{i} columns (evaluation.py:152-158) and the post-aggregation divide step (evaluation.py:169-175) can be merged:
```python
df_nw = df_nw.group_by(*group_cols).agg(
nw.col(_WEIGHT_COL).sum().alias(_WEIGHT_SUM_COL),
*[(nw.col(model) * nw.col(_WEIGHT_COL)).sum().alias(model) for model in model_cols],
)
```
Then divide once — saves two with_columns passes over wide model sets.
The repeated .any() scans for null/finite/zero-sum can be reduced; consider folding the zero-sum check into the divide step.

Tests

Coverage is good, but please add:

Weights without cutoff_col while df has cutoff_col (broadcast case).
Zero-sum-weights case (the raise at evaluation.py:166 is uncovered).
Cross-engine smoke test (polars df + pandas weights and vice versa).

Suggested order

Explicit error for weights on distributed path.
Single-pass _weighted_mean_agg.
Docstring updates (broadcast + auto-weights semantics).
Drop redundant finite check; merge null/finite scans.
Add missing tests.

review-notebook-app · 2026-05-21T18:49:46Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

janrth · 2026-05-21T18:59:27Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52edbdeb2d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

janrth · 2026-05-21T19:37:10Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 586db014df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T19:42:19Z

+    df_nw = df_nw.group_by(*group_cols).agg(
+        nw.col(_WEIGHT_COL).sum().alias(_WEIGHT_SUM_COL),
+        *[


Exclude missing metric rows from weighted denominator

When any metric output contains missing values (a common case for metrics like mape on zero targets), this aggregation computes the denominator from sum(weight) across all series but computes each numerator from sum(weight * metric). That means rows with missing metric values still contribute to the denominator, so the weighted score is biased downward (and can become 0 instead of NaN for all-missing groups). This differs from the existing agg_fn='mean' behavior, which excludes missing values per column; the weighted path should similarly use per-model effective weight sums over non-missing metric rows.

Useful? React with 👍 / 👎.

add weighted evaluation aggregation

7be2207

chatgpt-codex-connector Bot reviewed May 14, 2026

View reviewed changes

Comment thread utilsforecast/evaluation.py

janrth and others added 2 commits May 14, 2026 15:12

account for nan in weights

c618e26

Merge branch 'main' into 241/weights_for_eval

ea07bf3

implement reviewer feedback

780bb05

Merge branch 'main' into 241/weights_for_eval

52edbde

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

Comment thread utilsforecast/evaluation.py

move weights param

586db01

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add weighted evaluation aggregation#243

add weighted evaluation aggregation#243
janrth wants to merge 6 commits into
Nixtla:mainfrom
janrth:241/weights_for_eval

janrth commented May 14, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

nasaul commented May 21, 2026

Uh oh!

review-notebook-app Bot commented May 21, 2026

Uh oh!

janrth commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

janrth commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

janrth commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Notes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

nasaul commented May 21, 2026

Review Summary

Correctness

Code quality

Performance

Tests

Suggested order

Uh oh!

review-notebook-app Bot commented May 21, 2026

Uh oh!

janrth commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

janrth commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

janrth commented May 14, 2026 •

edited

Loading