StormCast recipe support for downscaling/nowcasting/unconditional models#1474

Merged
pzharrington merged 23 commits into NVIDIA:main from jleinonen:stormcast-domain-parallel on Mar 10, 2026

Conversation

@jleinonen
Collaborator

PhysicsNeMo Pull Request

Based partially on feedback by @mariusaurus, this PR ensures that different types of regional forecasting models can be trained with the StormCast recipe and makes their implementation smoother (e.g. by removing the need to use dummy tensors in __getitem__).

Changes:

  1. Add a test in test_training.py to train with different model configurations (downscaling, nowcasting, StormCast-like hybrid models and unconditional diffusion models).
  2. Fix various small issues identified by the testing that prevented some configurations from training.
  3. Eliminate the need to pass dummy tensors from __getitem__: "background" may be omitted by nowcasting models that do not use low-resolution conditioning, while "state" may be a torch.Tensor rather than a list of two tensors for models that do not perform a state update (downscaling and unconditional models).
  4. Update README.md with more information about configuring the recipe to train different types of models.
  5. Update requirements.txt and README.md to note that torch>=2.10 is needed for domain parallelism.
  6. Various other minor improvements to README.md.
  7. Fix ConcatConditionWrapper, where an "or" in an error check should have been an "and".
  8. Remove StableAdamW support as it doesn't work properly with FSDP even when domain parallelism is not used.
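To illustrate point 3, here is a minimal sketch of what a dataset's __getitem__ might look like after this change. The class name, shapes, and model-type strings are hypothetical, not the actual recipe code; the point is that "background" can be omitted entirely and "state" can be a single tensor.

```python
import torch


class RegionalDataset(torch.utils.data.Dataset):
    """Hypothetical dataset sketch: no dummy tensors needed."""

    def __init__(self, model_type: str = "hybrid"):
        self.model_type = model_type

    def __len__(self) -> int:
        return 4

    def __getitem__(self, idx: int) -> dict:
        hi_res = torch.randn(3, 64, 64)
        sample = {}
        if self.model_type in ("hybrid", "downscaling"):
            # Low-resolution conditioning; omitted entirely (no dummy
            # tensor) for nowcasting and unconditional models.
            sample["background"] = torch.randn(3, 16, 16)
        if self.model_type in ("hybrid", "nowcasting"):
            # Models with a state update return (previous state, target).
            sample["state"] = [torch.randn(3, 64, 64), hi_res]
        else:
            # Downscaling/unconditional: a single target tensor suffices.
            sample["state"] = hi_res
        return sample
```

Under this scheme, a nowcasting dataset simply never sets the "background" key, and a downscaling dataset returns "state" as one tensor.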

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score. This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness. You are not required to respond to every AI comment, but the comments are intended to help both authors and reviewers. Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@jleinonen jleinonen requested a review from pzharrington March 5, 2026 19:03
@jleinonen jleinonen self-assigned this Mar 5, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR extends the StormCast recipe to support four distinct model types (hybrid StormCast-like, nowcasting, downscaling, and unconditional diffusion) by eliminating the need for dummy tensors in __getitem__, fixing a logical error in ConcatConditionWrapper, and removing the broken StableAdamW optimizer. The changes are well-structured and the new test_model_types test provides good coverage.

Key changes:

  • utils/nn.py / utils/trainer.py / utils/sampler.py: background and state[0] may now be None; corresponding None-guards were added throughout the forward and sampling paths.
  • physicsnemo/diffusion/utils/model_wrappers.py: Corrected an `or` that should have been `and` in the ConcatConditionWrapper error guard, so that the error fires only when both condition keys are absent, not when either one is missing.
  • utils/loss.py: Adds sigma.flatten() to the unconditional branch of EDMLoss, making it consistent with the conditional branch.
  • utils/optimizers.py / utils/config.py: Removes StableAdamW support as it doesn't work with FSDP/DTensor.
  • requirements.txt: Replaces the torch-optimi dependency with torch>=2.10 to reflect the domain-parallelism minimum requirement.
  • README.md: Comprehensive documentation update with a new "Model types" table and clearer dataset interface specification.
  • Potential issue in sampler.py: When mean_hr is explicitly non-None and img_lr is None (e.g. an unconditional model with a regression output passed via sampler_args), img_lr.shape[-2:] (line 270) and x_lr.shape[0] (line 275) would both raise AttributeError before the x_lr is not None guard can take effect.
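The or/and fix above can be sketched in isolation. This is an illustrative stand-alone function, not the actual ConcatConditionWrapper code (which lives in physicsnemo/diffusion/utils/model_wrappers.py and whose real signature may differ):

```python
def check_conditions(cond_concat, cond_vec):
    """Illustrative guard: at least one condition must be provided.

    Before the fix, `or` here made the check raise whenever *either*
    condition was missing; with `and`, it raises only when *both*
    are absent, which is the intended behavior.
    """
    if cond_concat is None and cond_vec is None:
        raise ValueError(
            "ConcatConditionWrapper requires at least one of "
            "cond_concat or cond_vec to be provided."
        )
```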

Important Files Changed

  • physicsnemo/diffusion/utils/model_wrappers.py: Bug fix: an `or` that should have been `and` in the error guard of ConcatConditionWrapper; the error should fire only when both cond_concat and cond_vec are None.
  • examples/weather/stormcast/utils/sampler.py: Added `img_lr: torch.Tensor
  • examples/weather/stormcast/utils/nn.py: Correctly handles an empty condition list (returns None), scalar-only TensorDict construction, batch["state"] as a single tensor, and a missing "background" key. diffusion_model_forward now accepts explicit dtype/device to avoid dereferencing a potentially-None condition.
  • examples/weather/stormcast/utils/trainer.py: Adds early validation that scalar conditions are only used with the DiT architecture, passes explicit dtype/device to diffusion_model_forward, and removes StableAdamW references.
  • examples/weather/stormcast/datasets/mock.py: Adds a model_type parameter to _MockDataset so tests can simulate all four model types. A missing else branch in __getitem__ could cause an UnboundLocalError if an unexpected value is passed at runtime.
  • examples/weather/stormcast/test_training.py: Adds test_model_types parametrized over all four model types and two architectures, removes the StableAdamW parametrization, and correctly uses nullcontext/pytest.raises for the scalar-conditions-on-UNet scenario.

Last reviewed commit: 1ff4144
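The potential sampler issue flagged in the summary (dereferencing img_lr before a None guard) can be avoided by resolving shapes defensively. This is a hypothetical sketch, not the actual sampler.py code; the function name, fallback values, and the mean_hr fallback path are all assumptions for illustration:

```python
import torch


def resolve_shapes(img_lr, mean_hr, default_shape=(64, 64), default_batch=1):
    """Illustrative guard: derive spatial shape and batch size only from
    tensors that are actually present, instead of dereferencing img_lr
    unconditionally."""
    if img_lr is not None:
        return img_lr.shape[-2:], img_lr.shape[0]
    if mean_hr is not None:
        # Fall back to the regression output when there is no low-res input,
        # e.g. an unconditional model with mean_hr passed via sampler_args.
        return mean_hr.shape[-2:], mean_hr.shape[0]
    return default_shape, default_batch
```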

@jleinonen
Collaborator Author

/blossom-ci

Collaborator

@pzharrington pzharrington left a comment


Awesome changes!

@jleinonen
Collaborator Author

/blossom-ci

@pzharrington
Collaborator

/blossom-ci

@pzharrington pzharrington enabled auto-merge March 10, 2026 15:12
@pzharrington pzharrington added this pull request to the merge queue Mar 10, 2026
Merged via the queue into NVIDIA:main with commit 7ff39a7 Mar 10, 2026
4 checks passed
