StormCast recipe support for downscaling/nowcasting/unconditional models#1474

Merged
pzharrington merged 23 commits into NVIDIA:main from jleinonen:stormcast-domain-parallel on Mar 10, 2026

Conversation

@jleinonen
Collaborator

PhysicsNeMo Pull Request

Based partially on feedback by @mariusaurus, this PR ensures that different types of regional forecasting models can be trained with the StormCast recipe and makes their implementation smoother (e.g. by removing the need to use dummy tensors in __getitem__).

Changes:

  1. Add a test in test_training.py to train with different model configurations (downscaling, nowcasting, StormCast-like hybrid models and unconditional diffusion models).
  2. Fix various small issues identified by the testing that prevented some configurations from training.
  3. Eliminate the need to pass dummy tensors from __getitem__: "background" may be omitted by nowcasting models that do not use low-resolution conditioning, while "state" may be a torch.Tensor rather than a list of two tensors for models that do not perform a state update (downscaling and unconditional models).
  4. Update README.md with more information about configuring the recipe to train different types of models.
  5. Update requirements.txt and README.md to note that torch>=2.10 is needed for domain parallelism.
  6. Various other minor improvements to README.md.
  7. Fix ConcatConditionWrapper, where an "or" in an error check should have been an "and".
  8. Remove StableAdamW support as it doesn't work properly with FSDP even when domain parallelism is not used.
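To illustrate point 3, here is a minimal sketch of what a dataset's __getitem__ might look like after this change. The class name, shapes, and model-type strings are hypothetical, not the actual recipe code; the point is that "background" can be omitted entirely and "state" can be a single tensor.

```python
import torch


class RegionalDataset(torch.utils.data.Dataset):
    """Hypothetical dataset sketch: no dummy tensors needed."""

    def __init__(self, model_type: str = "hybrid"):
        self.model_type = model_type

    def __len__(self) -> int:
        return 4

    def __getitem__(self, idx: int) -> dict:
        hi_res = torch.randn(3, 64, 64)
        sample = {}
        if self.model_type in ("hybrid", "downscaling"):
            # Low-resolution conditioning; omitted entirely (no dummy
            # tensor) for nowcasting and unconditional models.
            sample["background"] = torch.randn(3, 16, 16)
        if self.model_type in ("hybrid", "nowcasting"):
            # Models with a state update return (previous state, target).
            sample["state"] = [torch.randn(3, 64, 64), hi_res]
        else:
            # Downscaling/unconditional: a single target tensor suffices.
            sample["state"] = hi_res
        return sample
```

Under this scheme, a nowcasting dataset simply never sets the "background" key, and a downscaling dataset returns "state" as one tensor.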

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score. This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness. You are not required to respond to every AI comment, but the comments are intended to help both authors and reviewers. Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@jleinonen jleinonen requested a review from pzharrington March 5, 2026 19:03
@jleinonen jleinonen self-assigned this Mar 5, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR extends the StormCast recipe to support four distinct model types (hybrid StormCast-like, nowcasting, downscaling, and unconditional diffusion) by eliminating the need for dummy tensors in __getitem__, fixing a logical error in ConcatConditionWrapper, and removing the broken StableAdamW optimizer. The changes are well-structured and the new test_model_types test provides good coverage.

Key changes:

  • utils/nn.py / utils/trainer.py / utils/sampler.py: background and state[0] may now be None; corresponding None-guards were added throughout the forward and sampling paths.
  • physicsnemo/diffusion/utils/model_wrappers.py: Corrected an `or` that should have been `and` in the ConcatConditionWrapper error guard, so that the error fires only when both condition keys are absent, not when either one is missing.
  • utils/loss.py: Adds sigma.flatten() to the unconditional branch of EDMLoss, making it consistent with the conditional branch.
  • utils/optimizers.py / utils/config.py: Removes StableAdamW support as it doesn't work with FSDP/DTensor.
  • requirements.txt: Replaces the torch-optimi dependency with torch>=2.10 to reflect the domain-parallelism minimum requirement.
  • README.md: Comprehensive documentation update with a new "Model types" table and clearer dataset interface specification.
  • Potential issue in sampler.py: When mean_hr is explicitly non-None and img_lr is None (e.g. an unconditional model with a regression output passed via sampler_args), img_lr.shape[-2:] (line 270) and x_lr.shape[0] (line 275) would both raise AttributeError before the x_lr is not None guard can take effect.
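The or/and fix above can be sketched in isolation. This is an illustrative stand-alone function, not the actual ConcatConditionWrapper code (which lives in physicsnemo/diffusion/utils/model_wrappers.py and whose real signature may differ):

```python
def check_conditions(cond_concat, cond_vec):
    """Illustrative guard: at least one condition must be provided.

    Before the fix, `or` here made the check raise whenever *either*
    condition was missing; with `and`, it raises only when *both*
    are absent, which is the intended behavior.
    """
    if cond_concat is None and cond_vec is None:
        raise ValueError(
            "ConcatConditionWrapper requires at least one of "
            "cond_concat or cond_vec to be provided."
        )
```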

Important Files Changed

  • physicsnemo/diffusion/utils/model_wrappers.py: Bug fix: an `or` that should have been `and` in the error guard of ConcatConditionWrapper; the error should fire only when both cond_concat and cond_vec are None.
  • examples/weather/stormcast/utils/sampler.py: Added `img_lr: torch.Tensor
  • examples/weather/stormcast/utils/nn.py: Correctly handles an empty condition list (returns None), scalar-only TensorDict construction, batch["state"] as a single tensor, and a missing "background" key. diffusion_model_forward now accepts explicit dtype/device to avoid dereferencing a potentially-None condition.
  • examples/weather/stormcast/utils/trainer.py: Adds early validation that scalar conditions are only used with the DiT architecture, passes explicit dtype/device to diffusion_model_forward, and removes StableAdamW references.
  • examples/weather/stormcast/datasets/mock.py: Adds a model_type parameter to _MockDataset so tests can simulate all four model types. A missing else branch in __getitem__ could cause an UnboundLocalError if an unexpected value is passed at runtime.
  • examples/weather/stormcast/test_training.py: Adds test_model_types parametrized over all four model types and two architectures, removes the StableAdamW parametrization, and correctly uses nullcontext/pytest.raises for the scalar-conditions-on-UNet scenario.

Last reviewed commit: 1ff4144
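The potential sampler issue flagged in the summary (dereferencing img_lr before a None guard) can be avoided by resolving shapes defensively. This is a hypothetical sketch, not the actual sampler.py code; the function name, fallback values, and the mean_hr fallback path are all assumptions for illustration:

```python
import torch


def resolve_shapes(img_lr, mean_hr, default_shape=(64, 64), default_batch=1):
    """Illustrative guard: derive spatial shape and batch size only from
    tensors that are actually present, instead of dereferencing img_lr
    unconditionally."""
    if img_lr is not None:
        return img_lr.shape[-2:], img_lr.shape[0]
    if mean_hr is not None:
        # Fall back to the regression output when there is no low-res input,
        # e.g. an unconditional model with mean_hr passed via sampler_args.
        return mean_hr.shape[-2:], mean_hr.shape[0]
    return default_shape, default_batch
```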

@jleinonen
Collaborator Author

/blossom-ci

Collaborator

@pzharrington pzharrington left a comment


Awesome changes!

@jleinonen
Collaborator Author

/blossom-ci

@pzharrington
Collaborator

/blossom-ci

@pzharrington pzharrington enabled auto-merge March 10, 2026 15:12
@pzharrington pzharrington added this pull request to the merge queue Mar 10, 2026
Merged via the queue into NVIDIA:main with commit 7ff39a7 Mar 10, 2026
4 checks passed
