add weighted evaluation aggregation #243
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7be2207ce5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Review Summary
Overall the design is clean and the narwhals-based implementation fits the project well. A few items to address before merge:
Correctness
Code quality
Performance
Tests
Coverage is good, but please add:
Suggested order
💡 Codex Review
Reviewed commit: 52edbdeb2d
💡 Codex Review
Reviewed commit: 586db014df
```python
df_nw = df_nw.group_by(*group_cols).agg(
    nw.col(_WEIGHT_COL).sum().alias(_WEIGHT_SUM_COL),
    *[
        # ... (remainder of the hunk not shown in the review excerpt)
```
Exclude missing metric rows from weighted denominator
When any metric output contains missing values (common for metrics such as mape on zero targets), this aggregation computes the denominator as sum(weight) over all series, but each numerator as sum(weight * metric), which skips the missing products. Rows with missing metric values therefore still contribute to the denominator, so the weighted score is biased toward zero (and can become 0 instead of NaN for groups where every value is missing). This differs from the existing agg_fn='mean' behavior, which excludes missing values per column; the weighted path should likewise use per-model effective weight sums computed over non-missing metric rows only.
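A minimal, library-agnostic sketch of the difference, assuming missing per-series metric values are represented as NaN. It uses plain Python rather than the PR's narwhals code, and both function names are hypothetical.

```python
import math
from typing import Sequence


def biased_weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Reproduces the reported behaviour: NaN rows drop out of the
    numerator, but their weights still count in the denominator."""
    numerator = sum(w * v for v, w in zip(values, weights) if not math.isnan(v))
    denominator = sum(weights)
    return numerator / denominator


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Uses the effective weight sum over non-missing rows only and
    returns NaN when no row has a usable metric value."""
    kept = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    denominator = sum(w for _, w in kept)
    if denominator == 0:
        return math.nan
    return sum(v * w for v, w in kept) / denominator


if __name__ == "__main__":
    mape = [0.1, math.nan, 0.3]  # e.g. the second series has zero targets
    w = [1.0, 1.0, 2.0]
    print(biased_weighted_mean(mape, w))  # ≈ 0.175 (0.7 / 4), pulled toward zero
    print(weighted_mean(mape, w))         # ≈ 0.233 (0.7 / 3)
    print(biased_weighted_mean([math.nan, math.nan], [1.0, 1.0]))  # 0.0
    print(weighted_mean([math.nan, math.nan], [1.0, 1.0]))         # nan
```

In the narwhals aggregation, the same idea means computing a separate weight sum per metric column, restricted to the rows where that column is not missing, instead of one shared `_WEIGHT_SUM_COL`.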
Summary
Adds weighted aggregation support to `evaluate` while keeping the core loss functions unchanged.

The new `agg_fn="weighted_mean"` option computes metrics at the existing per-series level first, then aggregates them with user-provided or automatically computed weights. This preserves the current loss semantics and avoids pushing aggregation logic into every loss function.

Changes
- New `weights` argument to `evaluate`.
- New `agg_fn="weighted_mean"` option.
- Accepted `weights` column layouts:
  - `unique_id, weight`
  - `unique_id, cutoff, weight`
- `weights="auto"`, which uses validation target volume:
  - `sum(y)` per `unique_id`
  - `sum(y)` per `(cutoff, unique_id)` when cross-validation cutoffs are present

Notes
Weighted aggregation is applied after each metric has been computed per series. For AutoML tuning in MLForecast, this means each CV fold can compute a weighted score, and MLForecast then averages those fold-level scores.
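The `weights="auto"` rule and the fold-level averaging can be sketched together in plain Python. This is a hypothetical illustration, not the PR's narwhals implementation: rows are dicts with assumed keys `unique_id`, `cutoff`, `y` (validation targets) and `metric` (per-series metric values), and all function names are made up.

```python
import math
from collections import defaultdict
from statistics import mean


def auto_weights(targets):
    """Hypothetical weights="auto": sum(y) per (cutoff, unique_id).

    Rows without a "cutoff" key share a single None cutoff, which
    reduces to sum(y) per unique_id."""
    weights = defaultdict(float)
    for row in targets:
        weights[(row.get("cutoff"), row["unique_id"])] += row["y"]
    return dict(weights)


def fold_scores(metrics, weights):
    """Weighted mean of per-series metric values within each cutoff.

    Rows with a NaN metric are skipped in numerator and denominator."""
    num = defaultdict(float)
    den = defaultdict(float)
    for row in metrics:
        if math.isnan(row["metric"]):
            continue
        cutoff = row.get("cutoff")
        w = weights[(cutoff, row["unique_id"])]
        num[cutoff] += w * row["metric"]
        den[cutoff] += w
    cutoffs = {row.get("cutoff") for row in metrics}
    return {c: num[c] / den[c] if den[c] > 0 else math.nan for c in cutoffs}


def cv_score(metrics, weights):
    """Plain (unweighted) average of the fold-level weighted scores."""
    return mean(fold_scores(metrics, weights).values())


targets = [
    {"unique_id": "A", "cutoff": 1, "y": 10.0},
    {"unique_id": "A", "cutoff": 1, "y": 30.0},
    {"unique_id": "B", "cutoff": 1, "y": 5.0},
    {"unique_id": "B", "cutoff": 1, "y": 5.0},
    {"unique_id": "A", "cutoff": 2, "y": 20.0},
    {"unique_id": "B", "cutoff": 2, "y": 20.0},
]
metrics = [
    {"unique_id": "A", "cutoff": 1, "metric": 0.1},
    {"unique_id": "B", "cutoff": 1, "metric": 0.5},
    {"unique_id": "A", "cutoff": 2, "metric": 0.2},
    {"unique_id": "B", "cutoff": 2, "metric": 0.4},
]
w = auto_weights(targets)         # {(1, "A"): 40.0, (1, "B"): 10.0, (2, "A"): 20.0, (2, "B"): 20.0}
scores = fold_scores(metrics, w)  # fold 1 ≈ (40*0.1 + 10*0.5) / 50; fold 2 ≈ (20*0.2 + 20*0.4) / 40
print(cv_score(metrics, w))       # ≈ 0.24, the mean of ≈ 0.18 and ≈ 0.3
```

Each fold gets equal weight in the final average rather than pooling every fold into one weighted mean, which mirrors the description of MLForecast averaging fold-level scores.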
Solves #241