
Update Pix2Pix for Model Standards #1369

Open

dallasfoster wants to merge 629 commits into NVIDIA:0.5.0-rc from dallasfoster:dallasf/update_pix2pix

@dallasfoster
Collaborator

PhysicsNeMo Pull Request

Description

This PR makes a handful of documentation and argument-typing changes so that Pix2Pix better conforms to the model implementation coding standards. We made the conscious choice not to move the ResNet block or the U-Net skip-connection block: they are mostly model-specific, and upstreaming them would require too many changes.

Checklist

Dependencies

No new dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

tge25 and others added 30 commits July 30, 2025 13:27
* add patching support for deterministic sampler

* code cleanup and unit test update

* use patching wrapper and fix pytest functions

* change utils.generative to utils.diffusion

* set default to torch.float64

* do compilation in deterministic sampler

* update

* Identified and fixed critical bug in stochastic_sampler and deterministic_sampler

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Format CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Implements wrapper selector to fix compile issues in tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: root <root@cw-dfw-h100-004-251-012.cm.cluster>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: root <root@cw-dfw-h100-004-211-033.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-270-026.cm.cluster>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* resolving merge conflicts with main

* fixing bugs

* fixing CI errors

* fixing merge conflicts in config

* modifying Changelog

* Update config.yaml

* cpu processing in area_weighted_sampling

* fixing naming issue in domino_datapipe.py

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update examples/cfd/external_aerodynamics/domino/src/conf/config.yaml

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update examples/cfd/external_aerodynamics/domino/src/train.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* fixing PR comments

* addressing PR comments

* fixing CI issues

* fixing pytest issues in utils

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
* Add generic neighbor finding function that is suitable to use in FigConvNet, DoMINO, and mesh graph data pipes.

* Fix an illegal device access when using multiple GPUs.

* Performance tuning of neighbor query

* Add warp-enabled radius search.

Also add testing.

* Update neighbor search tools to ensure we use 0 as the null index instead of -1

* Switch domino to use the new radius search function instead of ball query.

This is functionally the same, though shows a performance enhancement.

* Remove neighborlist function.  Replaced with radius_search.

* Using typing for annotations for CI

* Update examples/minimal/neighbor_list/warp_neighbor_list.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Address nits and minor comments from PR review.

* Relocate radius search code.

* Remove old folders; goes with previous commit.

* Update test import.

* The CI container does not accept list[int] as an acceptable type
for pytorch.

* Make sure radius search is exported as a function, not a module.

* Fixing formatting, since the linter appears to have changed ....

* Remove cuda opcheck test temporarily

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
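The neighbor-search commits above switch to a `radius_search` function that pads unused neighbor slots with index 0 instead of -1. A minimal pure-Python sketch of that padding convention follows, assuming a fixed number of slots per query (`max_points`) plus a per-query count so the padded 0 can be told apart from a real neighbor at index 0. `radius_search_sketch` is a hypothetical helper, not the PhysicsNeMo API, which is Warp-backed. One plausible motivation for 0 over -1 is that 0 is always a valid index, so gathering with padded indices never goes out of bounds.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def radius_search_sketch(
    points: Sequence[Point],
    queries: Sequence[Point],
    radius: float,
    max_points: int,
) -> Tuple[List[List[int]], List[int]]:
    """Brute-force radius search (hypothetical helper, not the PhysicsNeMo API).

    Returns a fixed-width index table padded with 0 (the null index) and,
    per query, the number of real neighbors found (capped at max_points).
    """
    indices: List[List[int]] = []
    counts: List[int] = []
    for q in queries:
        # Collect every point within `radius` of the query (O(N) per query).
        hits = [i for i, p in enumerate(points) if math.dist(p, q) <= radius]
        hits = hits[:max_points]  # keep at most max_points neighbors
        counts.append(len(hits))
        # Pad with 0 rather than -1: index 0 is always valid for gathering,
        # so downstream code must use `counts` (or a mask) to ignore padding.
        indices.append(hits + [0] * (max_points - len(hits)))
    return indices, counts
```

Because 0 is also a legitimate neighbor index, consumers should only read the first `counts[i]` entries of row `i`.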
* fixing bug in domino model

* fixing bug in domino model
* Update header_check.py

Fix license header check: when files are deleted while other files are modified, it fails.  This should make sure that the license check only runs on files not marked `D` for deleted - those get filtered out of the committed files list now.

* Update header_check.py

Fix ruff QA issues.

* Update header_check.py

add os import

* pre-commit is so picky ...
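The `header_check.py` fix makes the license check skip files whose git status is `D`, since deleted files can no longer be opened. A sketch of that filtering, assuming the file list comes from `git diff --name-status` output (a status letter, a tab, then the path, or two paths for renames); `files_to_check` is a hypothetical helper, not the actual script.

```python
from typing import Iterable, List


def files_to_check(name_status_lines: Iterable[str]) -> List[str]:
    """Return paths from `git diff --name-status` output, skipping deletions.

    Hypothetical helper. Status letters follow git's convention
    (A=added, M=modified, D=deleted, R<score>=renamed, ...).
    """
    paths: List[str] = []
    for line in name_status_lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("D"):
            continue  # deleted files no longer exist on disk; nothing to check
        # Renames/copies look like "R100\told\tnew"; the last field is the current path.
        paths.append(fields[-1])
    return paths
```

In a real hook the lines could come from `subprocess.run(["git", "diff", "--name-status", ...], capture_output=True, text=True).stdout.splitlines()`.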
…erNorm. (NVIDIA#1036)

* adding layer norm utils, skipping precommit since ORD is so unstable

* Add doc strings and type annotations to layer_norm implementation

* merge

* Snapshot layernorm optimizations.

* Enable dynamic selection of layer norm in MeshGraphNet.  Yields a good training
speed up on GPU.

* Remove old code.

* Remove unneeded file.

* Update test to avoid te on CPU

* Update formatting

* Update meshgraphnet.py

Update docstring to use torch layernorm (for CPU tests).

* Update meshgraphkan.py

Disable TE for docstring tests.

* Update meshgraphnet.py

* Fix ruff formatting

* Formatting ....

* Address PR feedback:
- remove warnings about deprecation
- add warning if env variable PHYSICSNEMO_FORCE_TE is set, but to
  an unexpected value.

* Update tests: env modification coming through a fixture now.

* Address graphcast too: use a fixture instead of contexts.

* Fix layer norm tests too.

* Fix a test
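The layer-norm commits add a warning when `PHYSICSNEMO_FORCE_TE` is set to an unexpected value. A sketch of that pattern as a tri-state flag parser, assuming (hypothetically) that boolean-style strings are accepted and that an unset variable means automatic selection; the actual accepted values and selection logic in PhysicsNeMo may differ.

```python
import os
import warnings
from typing import Optional

# Hypothetical accepted spellings; the real variable may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def read_force_te(env_var: str = "PHYSICSNEMO_FORCE_TE") -> Optional[bool]:
    """Parse a tri-state override flag.

    Returns True/False if the variable forces a choice, or None to let the
    caller auto-select (for example, use Transformer Engine only on GPU).
    """
    raw = os.environ.get(env_var)
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Unrecognized value: warn instead of failing, then fall back to auto.
    warnings.warn(
        f"{env_var}={raw!r} is not a recognized value; expected one of "
        f"{sorted(_TRUE | _FALSE)}. Falling back to automatic selection.",
        stacklevel=2,
    )
    return None
```

In tests, pytest's `monkeypatch.setenv` fixture is a natural way to set the variable per test, which matches the commits' move from contexts to fixtures.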
* Add PyG version of XAeroNet. Add pre-processor, dataloader, trainer and unit tests.

* Add halo region support in graph partitioning

* Remove graphs pre-loading.

* Update partition halo test

* Update CHANGELOG

* Linter

* Add torch_geometric to base Docker. Replace tabs with spaces.

* Remove torch_geometric from Docker, already in the base image

* Add torch_scatter to reqs.txt

---------

Co-authored-by: Corey adams <coreyjadams@gmail.com>
…tor (NVIDIA#1028)

* Add instructions to download and process DrivAerML data for DoMINO using PhysicsNeMo-Curator

* Remove downloading and processing scripts

* Update readme about caching
* add files from first successful run

* stable configs and parameters

* make the physics addition configurable

* merge the model changes

* remove unused files

* remove unused code

* restore configs to default paths

* readme additions

* fix conflicts

* update config

* add total residuals instead of L2

* update changelog

* linting

* address review feedback

* address review feedback

* add docs

* update device handling

* some minor updates

* update api

---------
* fixing bug in unet and reflecting changes in domino

* updating changelog

* modifying test

* addressing PR comments
* add hybrid meshgraphnet

* update utils and modules

* update example

* update changelog, bug fixes

* formatting

* unit tests, bug fixes, addressed review comments, formatting

* fix doctest

---------

Co-authored-by: root <root@eos0263.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0528.eos.clusters.nvidia.com>
…#1042)

* Add refactor of transolver model + darcy cfd example refresh.

* Resolve PR comments from P.S.

* Ensure darcy transolver example still runs.

* Update tests for minor api change
Adds transolver info to the changelog.
* Fix numerous doc issues.  Completely refactor the makefile for the docs
to make it parseable and understandable (no regex ...)

* Move images to make sure they're findable in the docs; update conf.py to not require math numbering.
* upload DPOT and an example training file on NS2d

* resolve cfg issues, requirements, and comment issues

* resolve comments issues

* add explanation in config

* bug fixes, cleanup, formatting

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: root <root@eos0528.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
* Adds fixes

* Fixes extra line
* Improve lead time support for diffusion models

* Update changelog

* Add back mistakenly removed docstrings and type hints

* Revert a couple more unintended changes

* Fix type hint of lead time label

* Fix deterministic samples to allow CorrDiff tests to pass

* Rename utils.generative to utils.diffusion

* Add back __init__.py in generative

* Revert unnecessary changes

* Revert unnecessary changes

* Revert unnecessary changes

* Minor docstring improvement in SongUNetPosEmbd

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Add value checks and docstrings

* Update docstrings, add error condition

* Update docstrings

* Fix lead time tests

* Update docstring

* Change super().__init__ to use keyword args

* Minor formatting in deterministic_sampler docstring

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Minor renaming and formatting in loss.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed dtype casting of pos_emb in SongUNetPosEmbd

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed duplicate code in SongUNetPosEmbd.positional_embedding_indexing

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactor positional_embedding_indexing to eliminate dead and duplicate code

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactor positional_embedding_selector to enable batched lead-time labels

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Moved new test from song_unet_pos_embd to song_unet_pos_lt_embd

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added safety check to force users to use SongUNetPosLtEmbd for lead-time models

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Deleted unnecessary test

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed bug in positional_embedding_selector + changed samplers and tests accordingly

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* Add PyG version of Lagrangian MGN example, rename existing DGL example

* Add L-MGN PyG dataset. Add equivalency test.

* Fix L-MGN inference scripts
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Refactor GroupNorm and log unmatched state_dict keys

Signed-off-by: Julius Berner <jberner@nvidia.com>

* Refactor GroupNorm and log unmatched state_dict keys

Signed-off-by: Julius Berner <jberner@nvidia.com>

* Add changes from MR996

* Made load_state_dict method semi-private

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Move the attention migration into a load_state_dict pre-hook

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Deleted duplicate line in CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed warnings in UNetBlock load_state_dict pre-hook

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added test for UNetBlock checkpoint loading from v1.0.1

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Changed tol in test + added new test with fused_conv_bias=True

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Changed a GroupNorm into get_group_norm

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Improved docstring for get_group_norm

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed unused test

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Initial commit of group_norm tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added non-regression test for GroupNorm

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fix BC compatibility of GroupNorm

* Fixed some formatting in group norm + replaced deprecation warning with exception

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* New tests for GroupNorm and get_group_norm

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed some tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Improvements in UNetBlock docstring

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Improvements in layers.py docstrings

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added non-regression test for UNetBlock

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed load_state_dict from UNetBlock

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactored group_norm test to use pytest parameterize instead of loops

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fix bugs in Attention layer

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some ongoing work on unet_block tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added non-regression checkpoints and data + non-regression test for UNetBlock

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added IDs for group_norm tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added new tests for UNet block

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added more param validation in Attention

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added tests for new Attention layer

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Pin C++ backend for attention op

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added reference input data for attention tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some files renaming

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Reverted back attention to previous implementation

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updates on new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updates on new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Deleted tests ref data

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Group norm test working

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Group norm test working

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Tests for attention layer passing locally

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed backend in attention test

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Modified UNetBlock tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Tests for UNetBlock passing locally

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Julius Berner <jberner@nvidia.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Julius Berner <jberner@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
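The GroupNorm/UNetBlock commits move an attention checkpoint migration into a `load_state_dict` pre-hook and log unmatched `state_dict` keys. A minimal sketch of that technique with PyTorch's `Module._register_load_state_dict_pre_hook` (private but long-standing), assuming a hypothetical rename of a legacy `qkv` parameter to `in_proj`; the real key names and migration in PhysicsNeMo's `UNetBlock` are not shown on this page.

```python
import logging

import torch
from torch import nn

logger = logging.getLogger(__name__)


class TinyAttention(nn.Module):
    """Hypothetical block whose parameter was renamed from `qkv` to `in_proj`."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.in_proj = nn.Linear(channels, 3 * channels)
        # Runs before this module's submodules copy their tensors from the state dict.
        self._register_load_state_dict_pre_hook(self._migrate_old_keys)

    @staticmethod
    def _migrate_old_keys(state_dict, prefix, local_metadata, strict,
                          missing_keys, unexpected_keys, error_msgs):
        # Rename legacy keys in place, e.g. "block.qkv.weight" -> "block.in_proj.weight".
        for suffix in ("weight", "bias"):
            old_key = f"{prefix}qkv.{suffix}"
            if old_key in state_dict:
                state_dict[f"{prefix}in_proj.{suffix}"] = state_dict.pop(old_key)


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.block = TinyAttention(4)


def load_with_logging(model: nn.Module, state_dict: dict):
    """Load non-strictly and log any keys that still do not match."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys:
        logger.warning("Missing keys: %s", result.missing_keys)
    if result.unexpected_keys:
        logger.warning("Unexpected keys: %s", result.unexpected_keys)
    return result
```

Keeping the migration in a hook means old checkpoints load through the standard `load_state_dict` path without overriding it in each block.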
… `/examples/` directory (NVIDIA#1063)

* Fixes Ruff pre-commit hooks to behave more similarly to previous setup with regards to the examples/ directory. Previously, this dir was formatted (by `black`) but not linted; this restores that behavior. However, now `ruff format` is used as the formatter.

* Applies formatting changes

* Update explanations
* Migrate Hybrid MGN example and model to PyG

* Linting

* Update inference script

* Add H-MGN equivalency test

* Update reqs.txt, fix formatting

* Update CHANGELOG. Add coalesce return type handling.
* Added DiT to models. Modified mlp_layers.

* moved DiT to experimental

* Updated mlp_layers and removed dit from models

* updated ConditionEmbedder to support vector

renamed classes. 'str' control for attention. Updated docstring. Changed 'class' to 'condition'. Changed 'learnable' to 'positional'. Removed 'learn_sigma'

* Updated DocStrings. Updated CHANGELOG.md

* Added CI test. Reverted changes to mlp_layers.py

* Added modified DiT implementation

* Updated Changelog

* Fixed the import error messages for transformer_engine and apex

* Fixed docstring. Removed unused imports

* Updated docstrings and changelog

Removed defaults for input_size and in_channels

divided into layers.py and dit.py

Added other validations

* Updated docstring

---------

Co-authored-by: root <root@eos0526.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0571.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0475.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0117.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0568.eos.clusters.nvidia.com>
* Add cuml knn utility

* Add knn utilities with optimized backend selection.  Primarily targeting
data pipes and preprocessing, this kNN is not differentiable on
the distances returned.  However, you could use the indexes returned
and select on the `points` tensor, and *that* is differentiable.

The knn op has several backends:
- torch (brute force)
- scipy.spatial.KDTree (cpu only)
- cuml.neighbors.NearestNeighbors (GPU only)

The default backend is "auto" which will dispatch correctly.

* Update changelog

* Fix docstring and warning messages

* Resolve PR feedback.

Adds quiet promotion of bf16 data type to fp32 for cuml and scipy,
since that's not natively supported.

* Fix type annotations

* Darn linter got me

* Reduce cuml version requirement

* Disable cpu compile test

* I disabled the wrong CPU test ...
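The kNN commit notes that the search is not differentiable with respect to the returned distances, but that selecting from `points` with the returned indices is. It also mentions an "auto" backend and quiet bf16-to-fp32 promotion. A minimal PyTorch sketch covering only a brute-force backend; `knn_sketch` and its trivial "auto" rule are hypothetical, and the scipy and cuML branches are omitted.

```python
import torch


def knn_sketch(points: torch.Tensor, queries: torch.Tensor, k: int, backend: str = "auto"):
    """Brute-force kNN (hypothetical helper, not the PhysicsNeMo API).

    points: (N, D), queries: (M, D). Returns (indices (M, k), distances (M, k)).
    """
    if backend == "auto":
        # Hypothetical dispatch: only the torch backend exists in this sketch.
        backend = "torch"
    if backend != "torch":
        raise NotImplementedError(f"backend {backend!r} is not implemented in this sketch")
    with torch.no_grad():  # the search itself builds no autograd graph
        # Promote bf16 to fp32, mirroring what the commit does for scipy/cuML.
        p = points.float() if points.dtype == torch.bfloat16 else points
        q = queries.float() if queries.dtype == torch.bfloat16 else queries
        dists = torch.cdist(q, p)  # (M, N) pairwise Euclidean distances
        # k smallest distances per query, sorted ascending.
        distances, indices = torch.topk(dists, k, dim=1, largest=False)
    return indices, distances
```

Usage: `neighbors = points[indices]` has shape `(M, k, D)` and carries gradients back to `points`, even though `distances` does not.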
* Use torchinfo instead of manual counting

* Add training example for transolver on drivaerml surface data.

* Minor fixes, make fp32 default

* Add inference scripts for transolver example.

* Update changelog  for transolver example
* add capability to install torch scatter from custom wheel

* typos
CharlelieLrt and others added 7 commits February 13, 2026 21:59
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Use original `__init__` signature instead of reconstructing it

Signed-off-by: giprayogo <genki016@gmail.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
* Move xarray and zarr to optional deps

* Update pyproject.toml to move cuda perf options to a group, data libraries out of core deps.

* Update group info for h5py

* Update tests to protect against missing packages, or if a version is too low.  Add tensordict to the datapipe-extras group since it has support for several datapipes

* Re-purge fcn mip plugin

* Skip domino datapipe tests without zarr

* Fix another zarr import error

* Hopefully fix data pipe tests.

* Skip one distributed test that is acting up
…#1418)

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Adds cu12 and cu13 extras.

* docs

* docs
@dallasfoster
Collaborator Author

/blossom-ci

loliverhennigh and others added 2 commits February 17, 2026 19:44
* fixed grid effect

* model cleanup

* model cleanup

---------

Co-authored-by: Oliver Hennigh <ohennigh@login-eos01.eos.clusters.nvidia.com>
* update license headers- second try

* fix inference bug

* formatting

* Update d3plot_reader.py
Collaborator

@coreyjadams coreyjadams left a comment

Considering this is heading for deprecation, I'll approve as long as we're not breaking CI.

younes-abid and others added 13 commits February 17, 2026 20:19
* Fix missing num_steps parameter for stochastic sampler

- Add missing num_steps=cfg.sampler.num_steps parameter to stochastic_sampler partial() call
- This bug caused stochastic sampler to always use default 18 steps instead of configured num_steps
- Fixes inconsistency with deterministic sampler which correctly passes num_steps parameter
- Improves performance when using fewer diffusion steps (e.g., num_steps=2)

* Format code with pre-commit hooks

- Applied ruff formatting to generate.py
- Ensures code meets project style standards

---------

Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
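The sampler fix adds `num_steps=cfg.sampler.num_steps` to a `functools.partial` call; without it, the sampler silently used its default of 18 steps. A minimal illustration using a hypothetical stand-in for `stochastic_sampler` (the real function takes many more arguments) and a `SimpleNamespace` in place of the Hydra config:

```python
from functools import partial
from types import SimpleNamespace


def stochastic_sampler(net, latents, num_steps: int = 18):
    """Hypothetical stand-in: just reports how many steps it would run."""
    return num_steps


cfg = SimpleNamespace(sampler=SimpleNamespace(num_steps=2))

# Before the fix: num_steps is never bound, so the default of 18 is used.
buggy_fn = partial(stochastic_sampler)
# After the fix: the configured value is bound explicitly.
fixed_fn = partial(stochastic_sampler, num_steps=cfg.sampler.num_steps)
```

Calling `buggy_fn(net, latents)` runs 18 steps regardless of the config, while `fixed_fn(net, latents)` honors `num_steps=2`. Nothing errors in the buggy case, which is why the mismatch went unnoticed.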
* This fixes the view and unsqueeze operations for shard tensors.

We now explicitly apply both of those operations, to make sure the
operations work at both the dispatch and functional level.

* Address inconsistent handling of view and reshape.

* Remove fallback path; that was not smart.

The _real_ answer was to actually import and use the wrappers.

* Add more tests and coverage for view ops.

* Undo normalization changes

* Update for review comments and fix tests
* Fix kernel size handling in partial_na2d

* docstring
* Fix the domino config path bug.  Using OmegaConf directly
to resolve if something is a config, rather than duck typing.

* Fixing the automatic detection of cuml for domino, and an import path.

* This should fix the last issues with domino.

* Fix model snapshotting.

---------

Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils; launch.config is just gone. It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Refactor (#1208)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Add FIGConvNet to crash example (#1207)

* Add FIGConvNet to crash example.

* Add FIGConvNet to crash example

* Update model config

* propose fix some typos (#1209)

Signed-off-by: John E <jeis4wpi@outlook.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils; launch.config is just gone. It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

---------

Signed-off-by: John E <jeis4wpi@outlook.com>
Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com>
Co-authored-by: John Eismeier <42679190+jeis4wpi@users.noreply.github.com>

* Unmigrate the insolation utils (#1211)

* unmigrate the insolation utils

* Revert test and compat map

* Update importlinter

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Refactor (#1216)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils; launch.config is just gone. It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Update activations path in dlwp tests (#1217)

* Update activations path in dlwp tests

* Update example paths

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Refactor (#1224)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils; launch.config is just gone. It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* Refactor (#1231)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils; launch.config is just gone. It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com>
Co-authored-by: Deepak Akhare <dakhare@nvidia.com>
Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* update import paths

* Starting to clean up dependency tree.

* Refactor (#1233)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Fixed minor bug in shape validation in SongUNet (#1230)

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Add Zarr reader for Crash (#1228)

* Add Zarr reader for Crash

* Update README

* Update validation logic of point data in Zarr reader

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add a test for 2D feature arrays

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com>
Co-authored-by: Deepak Akhare <dakhare@nvidia.com>
Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>

* Added coding standards for model implementations as a custom context for greptile (#1219)

* Added initial set of coding standards for model implementations

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed typos + review comments + added details

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added more rules for models

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added model rules to PR checklist

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added cursor rules for models

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Linked the wiki page to the PR template

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed typo in PR checklist

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Refactor (#1234)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Fixed minor bug in shape validation in SongUNet (#1230)

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Add Zarr reader for Crash (#1228)

* Add Zarr reader for Crash

* Update README

* Update validation logic of point data in Zarr reader

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add a test for 2D feature arrays

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Add AR RT and OT schemes to Crash FIGConvNet (#1232)

* Add AR and OT schemes for FIGConvNet

* Add tests

* Soothe the linter

* Fix the tests

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com>
Co-authored-by: Deepak Akhare <dakhare@nvidia.com>
Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com>

* Not seeing any errors in testing ...

* Breakdown of rules into smaller rules (#1236)

* Breakdown of rules into smaller rules

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fix mismatches in rule IDs referenced in rule text

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactor (#1240)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* Formatting active learning module docstrings (#1238)

* docs: fixing Protocol class reference formatting

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>

* docs: removing mermaid diagram from protocols

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>

* docs: adding active learning index

* docs: revising docstrings for sphinx formatting

* docs: fix placeholder URL for active learning main docs

---------

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>

---------

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Kelvin Lee <kin.long.kelvin.lee@gmail.com>

* Refactor (#1247)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* A new X-MeshGraphNet example for reservoir simulation. (#1186)

* X-MGN for reservoir simulation

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* installation bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* more well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve path_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix white space in config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix version inconsistency in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add versions for some libs in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* formatting

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor loop

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* grad accum bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* total loss bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* added some safeguards for connection indexing

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

* cleanup

* update configs

* Update README.md

style guide rule changes

* Update README.md

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve docstring fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update license yr

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup well

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve train.py fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve requirement

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder (parallel) + added results to readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* delete some unused files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* address PR comments

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference grdecl header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* support time series

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor update

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace pickle with json

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add license headers

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused png files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused import

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove emojis

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace print with logger

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update docstring

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor updates

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Add knn to autodoc table. (#1244)

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: tonishi-nv <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Refactor (#1249)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* A new X-MeshGraphNet example for reservoir simulation. (#1186)

* X-MGN for reservoir simulation

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* installation bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* more well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve path_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix white space in config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix version inconsistency in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add versions for some libs in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* formatting

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor loop

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* grad accum bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* total loss bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* added some safeguards for connection indexing

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

* cleanup

* update configs

* Update README.md

style guide rule changes

* Update README.md

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve docstring fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update license yr

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup well

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve train.py fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve requirement

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder (parallel) + added results to readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* delete some unused files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* address PR comments

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference grdecl header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* support time series

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor update

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace pickle with json

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add license headers

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused png files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused import

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove emojis

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace print with logger

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update docstring

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor updates

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Add knn to autodoc table. (#1244)

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: tonishi-nv <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Automated model registry (#1252)

* Deleted RegistreableModule

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed 'PhysicsNeMo' suffix in Module.from_torch method

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Implemented automatic registration for Module subclasses

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed unused name

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Metadata name deprecation (#1257)

* Initiated deprecation of field 'name' in ModelMetaData

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed all occurrences of 'name' field in ModelMetaData

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactor (#1258)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* A new X-MeshGraphNet example for reservoir simulation. (#1186)

* X-MGN for reservoir simulation

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* installation bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* more well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve path_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix whitespace in config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix version inconsistency in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add versions for some libs in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* formatting

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor loop

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* grad accum bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* total loss bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* added some safe guard for connection indexing

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

* cleanup

* update configs

* Update README.md

style guide rule changes

* Update README.md

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve docstring fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update license yr

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup well

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve train.py fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve requirement

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder (parallel) + added results to readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* delete some unused files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* address PR comments

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference grdecl header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* support time series

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor update

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace pickle with json

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add license headers

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused png files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused import

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove emojis

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace print with logger

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update docstring

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor updates

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Add knn to autodoc table. (#1244)

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Update version (#1193)

* Fix dependencies to enable hello world (#1195)

* Remove zero-len arrays from test dataset (#1198)

* Merge updates to Gray Scott example (#1239)

* Remove pyevtk

* update dependency

* update dimensions

* ci issues

* Interpolation model example (#1149)

* Temporal interpolation training recipe

* Add README

* Docs changes based on comments

* Update docstrings and README

* Add temporal interpolation animation

* Add animation link

* Add shape check in loss

* Updates of configs + trainer

* Update config comments

* Update README.md

style guide edits

* Added wandb logging

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Reformatted sections in docstring for GeometricL2Loss

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Update README and configs

* README changes + type hint fixes

* Update README.md

* Draft of validation script

* Update validation and README

* Fixed command in README.md for temporal_interpolation example

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed unused import in datapipe/climate_interp.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated license headers in temporal_interpolation example

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Renamed methods to avoid implicit shadowing in Trainer class

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Cosmetic changes in train.py and removed unused import in validate.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added clamp in validate.py to make sure step does not go out of bounds

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added the temporal_interpolation example to the docs + updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Addressing remaining comments

* Merged two data source classes in climate_interp.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>

* update versions

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: tonishi-nv <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Jussi Leinonen <jleinonen@nvidia.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Kaustubh Tangsali <ktangsali@nvidia.com>

* Remove IPDB

* Few more dep fixes.

* Refactor (#1261)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Remove IPDB

* Few more dep fixes.

* Enhance checkpoint configuration for DLWP Healpix and GraphCast (#1253)

* feat(weather): Improve configuration for DLWP Healpix and GraphCast examples

- Added configurable checkpoint directory to DLWP Healpix config and training script.
- Implemented Trainer logic to use specific checkpoint directory.
- Updated utils.py to respect exact checkpoint path.
- Made Weights & Biases entity and project configurable in GraphCast example.

* fix(dlwp_healpix): remove deprecated configs

- Removed the deprecated `verbose` parameter from the `CosineAnnealingLR` configuration in DLWP HEALPix, which was causing a TypeError.
- Removed unused configs from examples/weather/dlwp_healpix/

* Transolver volume (#1242)

* Implement transolver ++ physics attention

* Enable ++ in Transolver.

* Fix temperature correction terms.

* Starting work adapting the domino datapipe techniques to transolver.

* Working towards transolver volume training by merging with domino dataset.

Surface dataloading is prototyped, not finished yet.

* Updating

* Remove printout

* Enable transolver for volumetric data

* Update transolver training script to support either surface or volume data.

Applied some cleanup to make the datapipe similar to domino, which
is a step towards unification.

* Updating datapipe

* Tweak transolver volume configs

* Add transolverX model

* Enable nearly-uniform sampling of very very large arrays

* limit benchmarking to train epoch, enable profiler in config

* Update volume config slightly

* Update training scripts to properly enable data preloading

* Working towards adding a muon optimizer in transolver

* Add Peter's implementation of muon with a combined optimizer.  Switch to a flat LR.

* Add updated inference script that can also calculate drag and lift

* Add better docstrings for typhon

* Move typhon to experimental

* Move forwards docstring

* Adding typhon model and configs.

* Update readme.

* Update

* Remove extra model.  Update recipes.

* Update cae_dataset.py

Implement abstract methods in base classes.

* Update Physics_Attention.py

Ensure plus parameter is passed to base class.

* Update test_mesh_datapipe.py

Update import path for mesh datapipe.

* Fix ruff issues

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Dileep Ranganathan <8152399+dran-dev@users.noreply.github.com>

* Add external import coding standards.

* Update external import standards.

* Ensure vtk functions are protected.

* Protect pyvista import

* Closing more import gaps

* Remove DGL from meshgraphkan

* All models now comply with external import linting.

* Remove DGL datapipes

* cae datapipes in compliance

* Update pyproject.toml

* Add version numbers to deps

* Refactor (#1261)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect externa…
* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Refactor (#1208)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Add FIGConvNet to crash example (#1207)

* Add FIGConvNet to crash example.

* Add FIGConvNet to crash example

* Update model config

* propose fix some typos (#1209)

Signed-off-by: John E <jeis4wpi@outlook.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

---------

Signed-off-by: John E <jeis4wpi@outlook.com>
Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com>
Co-authored-by: John Eismeier <42679190+jeis4wpi@users.noreply.github.com>

* Unmigrate the insolation utils (#1211)

* unmigrate the insolation utils

* Revert test and compat map

* Update importlinter

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Refactor (#1216)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Update activations path in dlwp tests (#1217)

* Update activations path in dlwp tests

* Update example paths

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Refactor (#1224)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* Refactor (#1231)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com>
Co-authored-by: Deepak Akhare <dakhare@nvidia.com>
Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* update import paths

* Starting to clean up dependency tree.

* Refactor (#1233)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Fixed minor bug in shape validation in SongUNet (#1230)

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Add Zarr reader for Crash (#1228)

* Add Zarr reader for Crash

* Update README

* Update validation logic of point data in Zarr reader

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add a test for 2D feature arrays

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com>
Co-authored-by: Deepak Akhare <dakhare@nvidia.com>
Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>

* Added coding standards for model implementations as a custom context for greptile (#1219)

* Added initial set of coding standards for model implementations

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed typos + review comments + added details

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added more rules for models

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added model rules to PR checklist

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added cursor rules for models

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Linked the wiki page to the PR template

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed typo in PR checklist

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Refactor (#1234)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Update crash readme (#1212)

* update license headers- second try

* update readme

* Bump multi-storage-client to v0.33.0 with rust client (#1156)

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Add jaxtyping to requirements.txt for crash sample (#1218)

* update license headers- second try

* Update requirements.txt

* Updating to address some test issues

* Replace 'License' link with 'Dev blog' link (#1215)

Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Validation fu added to examples/structural_mechanics/crash/train.py (#1204)

* validation added: works for multi-node job.

* rename and rearrange validation function

* validate_every_n_epochs, save_ckpt_every_n_epochs added in config

* corrected bug (args of model) in inference

* args in validation code updated

* val path added and args name changed

* validation split added -> write_vtp=False

* fixed inference bug

* bug fix: write_vtp

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Add saikrishnanc-nv to github actors (#1225)

* Integrate Curator instructions to the Crash example (#1213)

* Integrate Curator instructions

* Update docs

* Formatting changes

* Adding code of conduct (#1214)

* Adding code of conduct

Adopting the code of conduct from https://www.contributor-covenant.org/

* Update CODE_OF_CONDUCT.MD

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Create .markdownlintignore

* Revise README for PhysicsNeMo resources and guidance

Updated the 'Getting Started' section and added new resources for learning AI Physics.

* Update README.md

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com>

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Fixed minor bug in shape validation in SongUNet (#1230)

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Add Zarr reader for Crash (#1228)

* Add Zarr reader for Crash

* Update README

* Update validation logic of point data in Zarr reader

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add a test for 2D feature arrays

* Update examples/structural_mechanics/crash/zarr_reader.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Add AR RT and OT schemes to Crash FIGConvNet (#1232)

* Add AR and OT schemes for FIGConvNet

* Add tests

* Soothe the linter

* Fix the tests

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com>
Co-authored-by: Deepak Akhare <dakhare@nvidia.com>
Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com>

* Not seeing any errors in testing ...

* Breakdown of rules into smaller rules (#1236)

* Breakdown of rules into smaller rules

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fix mismatches in rule IDs referenced in rule text

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactor (#1240)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* Formatting active learning module docstrings (#1238)

* docs: fixing Protocol class reference formatting

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>

* docs: removing mermaid diagram from protocols

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>

* docs: adding active learning index

* docs: revising docstrings for sphinx formatting

* docs: fix placeholder URL for active learning main docs

---------

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>

---------

Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Kelvin Lee <kin.long.kelvin.lee@gmail.com>

* Refactor (#1247)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* A new X-MeshGraphNet example for reservoir simulation. (#1186)

* X-MGN for reservoir simulation

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* installation bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* more well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve path_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix white space in config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix version inconsistency in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add versions for some libs in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* formatting

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor loop

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* grad accum bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* total loss bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* added some safeguards for connection indexing

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

* cleanup

* update configs

* Update README.md

style guide rule changes

* Update README.md

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve docstring fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update license yr

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup well

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve train.py fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve requirement

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder (parallel) + added results to readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* delete some unused files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* address PR comments

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference grdecl header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* support time series

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor update

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace pickle with json

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add license headers

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused png files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused import

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove emojis

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace print with logger

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update docstring

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor updates

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Add knn to autodoc table. (#1244)

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: tonishi-nv <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Refactor (#1249)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* A new X-MeshGraphNet example for reservoir simulation. (#1186)

* X-MGN for reservoir simulation

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* installation bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* more well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve path_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix white space in config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix version inconsistency in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add versions for some libs in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* formatting

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor loop

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* grad accum bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* total loss bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* added some safeguards for connection indexing

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

* cleanup

* update configs

* Update README.md

style guide rule changes

* Update README.md

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve docstring fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update license yr

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup well

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve train.py fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve requirement

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder (parallel) + added results to readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* delete some unused files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* address PR comments

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference grdecl header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* support time series

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor update

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace pickle with json

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add license headers

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused png files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused import

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove emojis

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace print with logger

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update docstring

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor updates

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Add knn to autodoc table. (#1244)

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: tonishi-nv <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Automated model registry (#1252)

* Deleted RegistreableModule

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed 'PhysicsNeMo' suffix in Module.from_torch method

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Implemented automatic registration for Module subclasses

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed unused name

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Metadata name deprecation (#1257)

* Initiated deprecation of field 'name' in ModelMetaData

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed all occurrences of 'name' field in ModelMetaData

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactor (#1258)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* A new X-MeshGraphNet example for reservoir simulation. (#1186)

* X-MGN for reservoir simulation

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* installation bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* more well object docstring fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve path_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix whitespace in config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fix version inconsistency in  requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add versions for some libs in requirement.txt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime in mlflow_utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve exception handling in inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* formatting

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve preprocessor loop

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* grad accum bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* total loss bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* added some safe guard for connection indexing

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* bug fix

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup utils

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

* cleanup

* update configs

* Update README.md

style guide rule changes

* Update README.md

* fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve docstring fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update license yr

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup well

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup preproc fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve datetime

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve train.py fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme fmt

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve requirement

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve ecl reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* cleanup

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* license header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder (parallel) + added results to readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* delete some unused files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* address PR comments

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve inference grdecl header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* support time series

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update config

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor update

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* improve graph builder

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update ecl_reader logging

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace pickle with json

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* add license headers

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused png files

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove unused import

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* remove emojis

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* replace print with logger

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update docstring

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* minor updates

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update readme

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

* update header

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>

* Add knn to autodoc table. (#1244)

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Update version (#1193)

* Fix dependencies to enable hello world (#1195)

* Remove zero-len arrays from test dataset (#1198)

* Merge updates to Gray Scott example (#1239)

* Remove pyevtk

* update dependency

* update dimensions

* ci issues

* Interpolation model example (#1149)

* Temporal interpolation training recipe

* Add README

* Docs changes based on comments

* Update docstrings and README

* Add temporal interpolation animation

* Add animation link

* Add shape check in loss

* Updates of configs + trainer

* Update config comments

* Update README.md

style guide edits

* Added wandb logging

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Reformatted sections in docstring for GeometricL2Loss

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Update README and configs

* README changes + type hint fixes

* Update README.md

* Draft of validation script

* Update validation and README

* Fixed command in README.md for temporal_interpolation example

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed unused import in datapipe/climate_interp.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated license headers in temporal_interpolation example

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Renamed methods to avoid implicit shadowing in Trainer class

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Cosmetic changes in train.py and removed unused import in validate.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added clamp in validate.py to make sure step does not go out of bounds

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added the temporal_interpolation example to the docs + updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Addressing remaining comments

* Merged two data source classes in climate_interp.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>

* update versions

---------

Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: tonishi-nv <tonishi@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Jussi Leinonen <jleinonen@nvidia.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Kaustubh Tangsali <ktangsali@nvidia.com>

* Remove IPDB

* Few more dep fixes.

* Refactor (#1261)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Remove IPDB

* Few more dep fixes.

* Enhance checkpoint configuration for DLWP Healpix and GraphCast (#1253)

* feat(weather): Improve configuration for DLWP Healpix and GraphCast examples

- Added configurable checkpoint directory to DLWP Healpix config and training script.
- Implemented Trainer logic to use specific checkpoint directory.
- Updated utils.py to respect exact checkpoint path.
- Made Weights & Biases entity and project configurable in GraphCast example.

* fix(dlwp_healpix): remove deprecated configs

- Removed the deprecated `verbose` parameter from the `CosineAnnealingLR` configuration in DLWP HEALPix, which was causing a TypeError.
- Removed unused configs from examples/weather/dlwp_healpix/

* Transolver volume (#1242)

* Implement transolver ++ physics attention

* Enable ++ in Transolver.

* Fix temperature correction terms.

* Starting work adapting the domino datapipe techniques to transolver.

* Working towards transolver volume training by merging with domino dataset.

Surface dataloading is prototyped, not finished yet.

* Updating

* Remove printout

* Enable transolver for volumetric data

* Update transolver training script to support either surface or volume data.

Applied some cleanup to make the datapipe similar to domino, which
is a step towards unification.

* Updating datapipe

* Tweak transolver volume configs

* Add transolverX model

* Enable nearly-uniform sampling of very very large arrays

* limit benchmarking to train epoch, enable profiler in config

* Update volume config slightly

* Update training scripts to properly enable data preloading

* Working towards adding a muon optimizer in transolver

* Add peter's implementation of muon with a combined optimizer.  switch to a flat LR.

* Add updated inference script that can also calculate drag and lift

* Add better docstrings for typhon

* Move typhon to experimental

* Move forwards docstring

* Adding typhon model and configs.

* Update readme.

* Update

* Remove extra model.  Update recipes.

* Update cae_dataset.py

Implement abstract methods in base classes.

* Update Physics_Attention.py

Ensure plus parameter is passed to base class.

* Update test_mesh_datapipe.py

Update import path for mesh datapipe.

* Fix ruff issues

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Dileep Ranganathan <8152399+dran-dev@users.noreply.github.com>

* Add external import coding standards.

* Update external import standards.

* Ensure vtk functions are protected.

* Protect pyvista import

* Closing more import gaps

* Remove DGL from meshgraphkan

* All models now comply with external import linting.

* Remove DGL datapipes

* cae datapipes in compliance

* Update pyproject.toml

* Add version numbers to deps

* Refactor (#1261)

* Move filesystems and version_check to core

* Fix version check tests

* Reorganize distributed, domain_parallel, and begin nn / utils cleanup.

* Move modules and meta to core.  Move registry to core.

No tests fixed yet.

* Add missing init files

* Update build system and specify some deps.

* Reorganize tests.

* Update init files

* Clean up neighbor tools.

* Update testing

* Fix compat tests

* Move core model tests to tests/core/

* Add import lint config

* Relocate layers

* Move graphcast utils into model directory

* Relocating util functionalities.

* Further clean up and organize tests.

* utils tests are passing now

* Cleaning up distributed tests

* Patching tests working again in nn

* Fix sdf test

* Fix zenith angle tests

* Some organization of tests.  Checkpoints is moved into utils.

* Remove launch.utils and launch.config.  Checkpointing is moved to
physicsnemo.utils, launch.config is just gone.  It was empty.

* Most nn tests are passing

* Further cleanup.  Getting there!

* Remove constants file

* Add import linting to pre-commit.

* Move gnn layers and start to fix several model tests.

* AFNO is now passing.

* Rnn models passing.

* Fix import

* Healpix tests are working

* Domino and unet working

* Updating to address some test issues

* MGN tests passing again

* Most graphcast tests passing again

* Move nd conv layers.

* update fengwu and pangu

* Update sfno and pix2pix test

* update tests for figconvnet, swinrnn, superresnet

* updating more models to pass

* Update distributed tests, now passing.

* Domain parallel tests now passing.

* Fix active learning imports so tests pass in refactor

* Fix some metric imports

* Remove deploy package

* Remove unused test file

* unmigrate these files ... again?

* Update import linter.

* Cleaning up diffusion models. Not quite done yet.

* Restore deleted files

* Updating more tests.

* Further updates to tests.  Datapipes almost working.

* update import paths

* Starting to clean up dependency tree.

* Fixing and adjusting a broad suite of tests.

* Update test/domain_parallel/conftest.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Minor fix

* Not seeing any errors in testing ...

* Enable import linting on internal imports.

* Remove ensure_available function, it's confusing

* Add logging imports to utils, and fix imports in examples.

* Update imports in minimal examples

* Update structural mechanics examples

* Update import paths: reservoir_sim

* Update import paths: additive manufacturing

* Update import paths: topodiff

* Update import paths: weather part 1

* Update import paths: weather part 2

* Update import paths: molecular dynamics

* Update import paths: geophysics

* Update import paths: cfd + external_aero 1

* Update import paths: cfd + external_aero 2

* Remove more DGL examples

* Remove more DGL examples

* cfd examples 3

* Last batch of example import fixes!

* Enforce and protect external deps in utils.

* Remove DGL.  :party:

* Don't force models yet

* Remove IPDB

* Few more dep fixes.

* Enhance checkpoint configuration for DLWP Healpix and GraphCast (#1253)

* feat(weather): Improve configuration for DLWP Healpix and GraphCast examples

- Added configurable checkpoint directory to DLWP Healpix config and training script.
- Implemented Trainer logic to us…
* Refactor nn modules and functionals

* Sync conv layer typing

* Restore CODEOWNERS formatting

* Remove top-level nn module duplicates

* Reorder HEALPix padding helpers

* Fix attention layers and docs

* Fix sdf import path in datapipes

* fixed issue

* merging stuff

* current

* asv

* bit more

* make inputs

* warp interpolation

* merging

* merging

* almost done

* fixed imports

* removed warp check as its depen now

* fixed unit test

* updated license:

* updated license:

* fixed unit test

* blaa
* Fix sharded Group Norm.

Now works with uneven shardings, and has better numerical accuracy.
Also tested with mixed precision.

* Update for review comments

* Update physicsnemo/domain_parallel/shard_utils/normalization_patches.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* fix precommit

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
…ially (NVIDIA#1426)

consolidates the argument parsing for all the operations.
* Switches from ["_cache"] subkey to a proper _cache TensorDict, which properly isolates public and private methods.

* formatting

* removes unused import

* simplifies mesh repr; no need for exclude_cache

* reverts modernization, as this causes errors on older pyvista
* modernize docker builds

* lock cupy-cuda13x version

* remove uv cache to save on space for deployment image

* address review feedback

* Update Dockerfile

* Update torch-harmonics version and install options

* Fix NATTEN_CUDA_ARCH environment variable syntax

* some fixes to natten builds, torch harmonics and misc
@dallasfoster
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator

FYI we should target RC branch

@dallasfoster
Collaborator Author

/blossom-ci

@dallasfoster dallasfoster changed the base branch from main to 0.5.0-rc February 24, 2026 19:06

Labels

3 - Ready for Review Ready for review by team
