Update Pix2Pix for Model Standards #1369
Open
dallasfoster wants to merge 629 commits into NVIDIA:0.5.0-rc from
* add patching support for deterministic sampler * code cleanup and unit test update * use patching wrapper and fix pytest functions * change utils.generative to utils.diffusion * set default to torch.float64 * do compilation in deterministic sampler * update * Identified and fixed critical bug in stochastic_sampler and deterministic_sampler Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Format CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Implements wrapper selector to fix compile issues in tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: root <root@cw-dfw-h100-004-251-012.cm.cluster> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: root <root@cw-dfw-h100-004-211-033.cm.cluster> Co-authored-by: root <root@cw-dfw-h100-004-270-026.cm.cluster> Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* resolving merge conflicts with main * fixing bugs * fixing CI errors * fixing merge conflicts in config * modifying Changelog * Update config.yaml * cpu processing in area_weighted_sampling * fixing naming issue in domino_datapipe.py * Update physicsnemo/models/domino/model.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update physicsnemo/models/domino/model.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update physicsnemo/models/domino/model.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update physicsnemo/models/domino/model.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update physicsnemo/models/domino/model.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update examples/cfd/external_aerodynamics/domino/src/conf/config.yaml Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update physicsnemo/models/domino/model.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Update examples/cfd/external_aerodynamics/domino/src/train.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * fixing PR comments * addressing PR comments * fixing CI issues * fixing pytest issues in utils --------- Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
* Add generic neighbor finding function that is suitable to use in FigConvNet, DoMINO, and mesh graph data pipes. * Fix an illegal device access when using multiple GPUs. * Performance tuning of neighbor query * Add warp-enabled radius search. Also add testing. * Update neighbor search tools to ensure we use 0 as the null index instead of -1 * Switch domino to use the new radius search function instead of ball query. This is functionally the same, though shows a performance enhancement. * Remove neighborlist function. Replaced with radius_search. * Using typing for annotations for CI * Update examples/minimal/neighbor_list/warp_neighbor_list.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com> * Address nits and minor comments from PR review. * Relocate radius search code. * Remove old folders; goes with previous commit. * Update test import. * The CI container does not accept list[int] as an acceptable type for pytorch. * Make sure radius search is exported as a function, not a module. * Fixing formatting, since the linter appears to have changed .... * Remove cuda opcheck test temporarily --------- Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
* fixing bug in domino model * fixing bug in domino model
* Update header_check.py Fix license header check: when files are deleted while other files are modified, it fails. This should make sure that the license check only runs on files not marked `D` for deleted - those get filtered out of the committed files list now. * Update header_check.py Fix ruff qa is. * Update header_check.py add os import * pre-commit is so picky ...
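The deleted-file filtering described in the header check fix can be sketched as follows. This is a minimal illustration, not the actual contents of `header_check.py`: the function name `files_to_check` is hypothetical, and it assumes the committed file list comes from tab-separated `git diff --name-status` output.

```python
from typing import List


def files_to_check(name_status_output: str) -> List[str]:
    """Return paths from `git diff --name-status` output, skipping deletions.

    Hypothetical helper: each input line is "<status>\t<path>" and
    renames/copies are "<status>\t<old path>\t<new path>".
    """
    paths = []
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        status, *rest = line.split("\t")
        # Files marked `D` no longer exist, so there is no header to check.
        if status.startswith("D") or not rest:
            continue
        # For renames and copies the last field is the current path.
        paths.append(rest[-1])
    return paths
```

With input such as `"M\ta.py\nD\tb.py\n"`, only `a.py` is kept, so a commit that deletes one file while modifying another no longer trips the check on the missing file.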
…erNorm. (NVIDIA#1036) * adding layer norm utils, skipping precommit since ORD is so unstable * Add doc strings and type annotations to layer_norm implementation * merge * Snapshot layernorm optimizations. * Enable dynamic selection of layer norm in MeshGraphNet. Yields a good training speed up on GPU. * Remove old code. * Remove unneeded file. * Update test to avoid te on CPU * Update formatting * Update meshgraphnet.py Update docstring to use torch layernorm (for CPU tests). * Update meshgraphkan.py Disable TE for docstring tests. * Update meshgraphnet.py * Fix ruff formatting * Formatting .... * Address PR feedback: - remove warnings about deprecation - add warning if env variable PHYSICSNEMO_FORCE_TE is set, but to an unexpected value. * Update tests: env modification coming through a fixture now. * Address graphcast too: use a fixture instead of contexts. * Fix layer norm tests too. * Fix a test
* Add PyG version of XAeroNet. Add pre-processor, dataloader, trainer and unit tests. * Add halo region support in graph partitioning * Remove graphs pre-loading. * Update partition halo test * Update CHANGELOG * Linter * Add torch_geometric to base Docker. Replace tabs with spaces. * Remove torch_geometric from Docker, already in the base image * Add torch_scatter to reqs.txt --------- Co-authored-by: Corey adams <coreyjadams@gmail.com>
…tor (NVIDIA#1028) * Add instructions to download and process DrivAerML data for DoMINO using PhysicsNeMo-Curator * Remove downloading and processing scripts * Update readme about caching
* add files from first successful run * stable configs and parameters * make the physics addition configurable * merge the model changes * remove unused files * remove unused code * restore configs to default paths * readme additions * fix conflicts * update config * add total residuals instead of L2 * update changelog * linting * address review feedback * address review feedback * add docs * update device handling * some minor updates * update api ---------
* fixing bug in unet and reflecting changes in domino * updating changelog * modifying test * addressing PR comments
* add hybrid meshgraphnet * update utils and modules * update example * update changelog, bug fixes * formatting * unit tests, bug fixes, addressed review comments, formatting * fix doctest --------- Co-authored-by: root <root@eos0263.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0528.eos.clusters.nvidia.com>
…#1042) * Add refactor of transolver model + darcy cfd example refresh. * Resolve PR comments from P.S. * Ensure darcy transolver example still runs. * Update tests for minor api change
Adds transolver info to the changelog.
* Fix numerous doc issues. Completely refactor the makefile for the docs to make it parseable and understandable (no regex ...) * Move images to make sure they're findable in the docs; update conf.py to not require math numbering.
* upload DPOT and an example training file on NS2d * resolve cfg issues, requirements, and comment issues * resolve comments issues * add explanation in config * bug fixes, cleanup, formatting --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: root <root@eos0528.eos.clusters.nvidia.com> Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
* Adds fixes * Fixes extra line
* Improve lead time support for diffusion models * Update changelog * Add back mistakenly removed docstrings and type hints * Revert a couple more unintended changes * Fix type hint of lead time label * Fix deterministic samples to allow CorrDiff tests to pass * Rename utils.generative to utils.diffusion * Add back __init__.py in generative * Revert unnecessary changes * Revert unnecessary changes * Revert unnecessary changes * Minor docstring improvement in SongUNetPosEmdb Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Add value checks and docstrings * Update docstrings, add error condition * Update docstrings * Fix lead time tests * Update docstring * Change super().__init__ to use keyword args * Minor formatting in deterministic_sampler docstring Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Minor renaming and formatting in loss.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed dtype casting of pos_emb in SongUNetPosEmbd Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed duplicate code in SongUNetPosEmbd.positional_embedding_indexing Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactor positional_embedding_indexing to eliminate dead and duplicate code Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactor positional_embedding_selector to enable batched lead-time labels Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Moved new test from song_unet_pos_embd to song_unet_pos_lt_embd Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updated CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added safety check to force users to use SongUNetPosLtEmdb for lead-time models Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Deleted unecessary test Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed bug in positional_embedding_selector + changed samplers and tests accordingly Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- 
Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* Add PyG version of Lagrangian MGN example, rename existing DGL example * Add L-MGN PyG dataset. Add equivalency test. * Fix L-MGN inference scripts
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Refactor GroupNorm and log unmatched state_dict keys Signed-off-by: Julius Berner <jberner@nvidia.com> * Refactor GroupNorm and log unmatched state_dict keys Signed-off-by: Julius Berner <jberner@nvidia.com> * Add changes from MR996 * Made load_state_dict method semi-private Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Move the attention migration into a load_state_dict pre-hook Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Deleted duplicate line in CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed warnings in UNetBlock load_state_dict pre-hook Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added test for UNetBlock checkpoint loading from v1.0.1 Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Changed tol in test + added new test with fused_conv_bias=True Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updated CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updated CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Changed a GroupNorm into get_group_norm Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Improved docstring for get_group_norm Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed unused test Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Initial commit of group_norm tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added non-regression test for GroupNorm Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fix BC compatibility of GroupNorm * Fixed some formatting in group norm + replaced deprecation warning with exception Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * New tests for GRoupNorm and get_group_norm Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed some tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Improvements in UNetBlock docstring Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Improvements in layers.py docstrings 
Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added non-regression test for UNetBlock Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed load_state_dict from UNetBlock Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactored group_norm test to use pytest parameterize instead of loops Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fix bugs in Attention layer Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Some ongoing work on unet_block tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added non-regression checkpoints and data + non-regression test for UNetBlock Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added IDs for group_norm tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added new tests for UNet block Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added more param validation in Attention Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added tests for new Attention layer Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Pin C++ backend for attention op Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added reference input data for attention tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Some files renaming Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Reverted back attention to previous implementation Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updates on new tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updates on new tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Deleted tests ref data Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Group norm test working Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Group norm test working Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Tests for attention layer passing locally Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed backend in attention test Signed-off-by: Charlelie 
Laurent <claurent@nvidia.com> * Modified UNetBlock tests Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Tests for UNetBlock passing locally Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Julius Berner <jberner@nvidia.com> Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Julius Berner <jberner@nvidia.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
… `/examples/` directory (NVIDIA#1063) * Fixes Ruff pre-commit hooks to behave more similarly to previous setup with regards to the examples/ directory. Previously, this dir was formatted (by `black`) but not linted; this restores that behavior. However, now `ruff format` is used as the formatter. * Applies formatting changes * Update explanations
* Migrate Hybrid MGN example and model to PyG * Linting * Update inference script * Add H-MGN equivalency test * Update reqs.txt, fix formatting * Update CHANGELOG. Add coalesce return type handling.
* Added DiT to models. Modified mlp_layers. * moved DiT to experimental * Updated mlp_layers and removed dit from models * updated ConditionEmbedder to support vector renamed classes. 'str' control for attention. Updated docstring. Changed 'class' to 'condition'. Changed 'learnable' to 'positional'. Removed 'learn_sigma' * Updated DocStrings. Updated CHANGELOG.md * Added CI test. Reverted changes to mlp_layers.py * Added modified DiT implementation * Updated Changelog * Fixed the import error messages for transformer_engine and apex * Fixed docstring. Removed unused imports * Updated docstrings and changelog Removed defaults for input_size and in_channels divided into layers.py and dit.py Added other validations * Updated docstring --------- Co-authored-by: root <root@eos0526.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0571.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0475.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0117.eos.clusters.nvidia.com> Co-authored-by: root <root@eos0568.eos.clusters.nvidia.com>
* Add cuml knn utility * Add knn utilities with optimized backend selection. Primarily targeting data pipes and preprocessing, this kNN is not differentiable on the distances returned. However, you could use the indexes returned and select on the `points` tensor, and *that* is differentiable. The knn op has several backends: - torch (brute force) - scipy.spatial.KDTree (CPU only) - cuml.neighbors (GPU only) The default backend is "auto" which will dispatch correctly. * Update changelog * Fix docstring and warning messages * Resolve PR feedback. Adds quiet promotion of bf16 data type to fp32 for cuml and scipy, since that's not natively supported. * Fix type annotations * Darn linter got me * Reduce cuml version requirement * Disable cpu compile test * I disabled the wrong CPU test ...
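The backend dispatch described in the kNN commit could look roughly like the sketch below. The commit only states that "auto" dispatches correctly among torch (brute force), scipy (CPU only), and cuml (GPU only), so the concrete rules here are an assumption, and the function name `select_knn_backend` and its availability flags are hypothetical.

```python
def select_knn_backend(
    device_type: str,
    backend: str = "auto",
    cuml_available: bool = False,
    scipy_available: bool = False,
) -> str:
    """Pick a kNN backend name for a tensor on `device_type` ("cpu" or "cuda").

    Assumed policy (not confirmed by the source): prefer the specialised
    library that matches the device, otherwise fall back to brute-force torch.
    """
    valid = {"auto", "torch", "scipy", "cuml"}
    if backend not in valid:
        raise ValueError(f"Unknown backend {backend!r}; expected one of {sorted(valid)}")

    on_gpu = device_type == "cuda"
    if backend == "auto":
        if on_gpu and cuml_available:
            return "cuml"
        if not on_gpu and scipy_available:
            return "scipy"
        return "torch"  # brute force works on either device

    # Explicit requests must respect the device restrictions named in the commit.
    if backend == "cuml" and not on_gpu:
        raise ValueError("cuml backend is GPU only")
    if backend == "scipy" and on_gpu:
        raise ValueError("scipy backend is CPU only")
    return backend
```

Passing the availability flags as arguments, rather than probing imports inside the function, keeps the selection logic easy to test without either library installed.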
* Use torchinfo instead of manual counting * Add training example for transolver on drivaerml surface data. * Minor fixes, make fp32 default * Add inference scripts for transolver example. * Update changelog for transolver example
* add capability to install torch scatter from custom wheel * typos
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Use original `__init__` signature instead of reconstructing it Signed-off-by: giprayogo <genki016@gmail.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
* Move xarray and zarr to optional deps * Update pyproject.toml to move cuda perf options to a group, data libraries out of core deps. * Update group info for h5py * Update tests to protect against missing packages, or if a version is too low. Add tensordict to the datapipe-extras group since it has support for several datapipes * Re-purge fcn mip plugin * Skip domino datapipe tests without zarr * Fix another zarr import error * Hopefully fix data pipe tests. * Skip one distributed test that is acting up
…#1418) Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Adds cu12 and cu13 extras. * docs * docs
Collaborator
Author
/blossom-ci
* fixed grid effect * model cleanup * model cleanup --------- Co-authored-by: Oliver Hennigh <ohennigh@login-eos01.eos.clusters.nvidia.com>
* update license headers- second try * fix inference bug * formatting * Update d3plot_reader.py
coreyjadams (Collaborator) approved these changes on Feb 17, 2026 and left a comment:
Considering this is heading for deprecation, I'll approve as long as we're not breaking CI.
* Fix missing num_steps parameter for stochastic sampler - Add missing num_steps=cfg.sampler.num_steps parameter to stochastic_sampler partial() call - This bug caused stochastic sampler to always use default 18 steps instead of configured num_steps - Fixes inconsistency with deterministic sampler which correctly passes num_steps parameter - Improves performance when using fewer diffusion steps (e.g., num_steps=2) * Format code with pre-commit hooks - Applied ruff formatting to generate.py - Ensures code meets project style standards --------- Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
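The class of bug fixed in this commit can be reproduced with a small stand-in. In the sketch below, `stochastic_sampler` is a toy function rather than the real sampler, and `SamplerCfg` is a hypothetical substitute for `cfg.sampler`; only the default of 18 steps and the `num_steps=cfg.sampler.num_steps` argument come from the commit message.

```python
from functools import partial


def stochastic_sampler(latents, num_steps=18):
    """Toy stand-in: reports how many steps the sampler would run."""
    return {"latents": latents, "num_steps": num_steps}


class SamplerCfg:
    """Hypothetical replacement for cfg.sampler in the real config."""

    num_steps = 2


# Before the fix: num_steps is never bound, so the default of 18 is used
# no matter what the configuration says.
buggy_sampler = partial(stochastic_sampler)

# After the fix: the configured value is bound into the partial.
fixed_sampler = partial(stochastic_sampler, num_steps=SamplerCfg.num_steps)

buggy_steps = buggy_sampler("x")["num_steps"]
fixed_steps = fixed_sampler("x")["num_steps"]
```

Because `partial` silently accepts a call that omits keyword arguments, the missing binding raises no error; the configured value is simply ignored.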
* This fixes the view and unsqueeze operations for shard tensors. We now explicitly apply both of those operations, to make sure the operations work at both the dispatch and functional level. * Address inconsistent handling of view and reshape. * Remove fall back path, that was not smart. The _real_ answer was to actually import and use the wrappers. * Add more tests and coverage for view ops. * Undo normalization changes * Update for review comments and fix tests
* Fix kernel size handling in partial_na2d * docstring
* Fix the domino config path bug. Using OmegaConf directly to resolve if something is a config, rather than duck typing. * Fixing the automatic detection of cuml for domino, and an import path. * This should fix the last issues with domino. * Fix model snapshotting. --------- Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
* Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Refactor (#1208) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Add FIGConvNet to crash example (#1207) * Add FIGConvNet to crash example. * Add FIGConvNet to crash example * Update model config * propose fix some typos (#1209) Signed-off-by: John E <jeis4wpi@outlook.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. 
Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. --------- Signed-off-by: John E <jeis4wpi@outlook.com> Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com> Co-authored-by: John Eismeier <42679190+jeis4wpi@users.noreply.github.com> * Unmigrate the insolation utils (#1211) * unmigrate the insolation utils * Revert test and compat map * Update importlinter * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix improt * Healpix tests are working * Domino and unet working * Refactor (#1216) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. 
* Fix improt * Healpix tests are working * Domino and unet working * Update activations path in dlwp tests (#1217) * Update activations path in dlwp tests * Update example paths * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Refactor (#1224) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. 
* AFNO is now passing. * Rnn models passing. * Fix improt * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * Refactor (#1231) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. 
* Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix improt * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * Replace 'License' link with 'Dev blog' link (#1215) Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Validation fu added to examples/structural_mechanics/crash/train.py (#1204) * validation added: works for multi-node job. * rename and rearrange validation function * validate_every_n_epochs, save_ckpt_every_n_epochs added in config * corrected bug (args of model) in inference * args in validation code updated * val path added and args name changed * validation split added -> write_vtp=False * fixed inference bug * bug fix: write_vtp * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. 
* Add saikrishnanc-nv to github actors (#1225) * Integrate Curator instructions to the Crash example (#1213) * Integrate Curator instructions * Update docs * Formatting changes * Adding code of conduct (#1214) * Adding code of conduct Adopting the code of conduct from the https://www.contributor-covenant.org/ * Update CODE_OF_CONDUCT.MD Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Create .markdownlintignore * Revise README for PhysicsNeMo resources and guidance Updated the 'Getting Started' section and added new resources for learning AI Physics. * Update README.md --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com> Co-authored-by: Deepak Akhare <dakhare@nvidia.com> Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * update import paths * Starting to clean up dependency tree. * Refactor (#1233) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. 
* Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix improt * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * Replace 'License' link with 'Dev blog' link (#1215) Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Validation fu added to examples/structural_mechanics/crash/train.py (#1204) * validation added: works for multi-node job. 
* rename and rearrange validation function * validate_every_n_epochs, save_ckpt_every_n_epochs added in config * corrected bug (args of model) in inference * args in validation code updated * val path added and args name changed * validation split added -> write_vtp=False * fixed inference bug * bug fix: write_vtp * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Add saikrishnanc-nv to github actors (#1225) * Integrate Curator instructions to the Crash example (#1213) * Integrate Curator instructions * Update docs * Formatting changes * Adding code of conduct (#1214) * Adding code of conduct Adopting the code of conduct from the https://www.contributor-covenant.org/ * Update CODE_OF_CONDUCT.MD Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Create .markdownlintignore * Revise README for PhysicsNeMo resources and guidance Updated the 'Getting Started' section and added new resources for learning AI Physics. * Update README.md --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. 
* Fixed minor bug in shape validation in SongUNet (#1230) Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Add Zarr reader for Crash (#1228) * Add Zarr reader for Crash * Update README * Update validation logic of point data in Zarr reader Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add a test for 2D feature arrays * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com> Co-authored-by: Deepak Akhare <dakhare@nvidia.com> Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> * Added coding standards for model implementations as a custom context for greptile (#1219) * Added initial set of coding standards for model implementations Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed typos + review comments + added details Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added more rules for models Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added model rules to PR checklist Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added cusror rules for models 
Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Linked the wiki page to the PR template Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed typo in PR checklist Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Refactor (#1234) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. 
* Fix improt * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * Replace 'License' link with 'Dev blog' link (#1215) Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Validation fu added to examples/structural_mechanics/crash/train.py (#1204) * validation added: works for multi-node job. * rename and rearrange validation function * validate_every_n_epochs, save_ckpt_every_n_epochs added in config * corrected bug (args of model) in inference * args in validation code updated * val path added and args name changed * validation split added -> write_vtp=False * fixed inference bug * bug fix: write_vtp * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Add saikrishnanc-nv to github actors (#1225) * Integrate Curator instructions to the Crash example (#1213) * Integrate Curator instructions * Update docs * Formatting changes * Adding code of conduct (#1214) * Adding code of conduct Adopting the code of conduct from the https://www.contributor-covenant.org/ * Update CODE_OF_CONDUCT.MD Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Create .markdownlintignore * Revise README for PhysicsNeMo resources and guidance Updated the 'Getting Started' section and added new resources for learning AI Physics. 
* Update README.md --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Fixed minor bug in shape validation in SongUNet (#1230) Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Add Zarr reader for Crash (#1228) * Add Zarr reader for Crash * Update README * Update validation logic of point data in Zarr reader Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add a test for 2D feature arrays * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Add AR RT and OT schemes to Crash FIGConvNet (#1232) * Add AR and OT schemes for FIGConvNet * Add tests * Soothe the linter * Fix the tests * Fixing and adjusting a broad suite of tests. 
* Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com> Co-authored-by: Deepak Akhare <dakhare@nvidia.com> Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com> * Not seeing any errors in testing ... * Breakdown of rules into smaller rules (#1236) * Breakdown of rules into smaller rules Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fix mismatches in rule IDs referenced in rule text Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactor (#1240) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. 
Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... 
* Formatting active learning module docstrings (#1238) * docs: fixing Protocol class reference formatting Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> * docs: removing mermaid diagram from protocols Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> * docs: adding active learning index * docs: revising docstrings for sphinx formatting * docs: fix placeholder URL for active learning main docs --------- Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> --------- Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Kelvin Lee <kin.long.kelvin.lee@gmail.com> * Refactor (#1247) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. 
* Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * A new X-MeshGraphNet example for reservoir simulation. 
(#1186) * X-MGN for reservoir simulation Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * installation bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * more well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve path_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix while space in config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix version inconsistency in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add versions for some libs in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetiem in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * formatting Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor loop Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * grad accum bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * total loss bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * added some safe guard for connection indexing Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: 
Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup * cleanup * update configs * Update README.md style guide rule changes * Update README.md * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve docstring fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update license yr Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup well Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cimprove infrence fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve train.py fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve requirement Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * ilcense header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder (parallel) + added results to readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * delete some unsed files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * address PR comments Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference grdecl header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * support time series Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor update Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace pickle with json Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add license headers Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused png files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unsed import Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove emojis Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace print with logger Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update docstring Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor updates Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Add knn to autodoc table. (#1244) --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: tonishi-nv <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Refactor (#1249) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. 
* update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * A new X-MeshGraphNet example for reservoir simulation. (#1186) * X-MGN for reservoir simulation Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * installation bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * more well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve path_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix while space in config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix version inconsistency in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add versions for some libs in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetiem in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * formatting Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor loop Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * grad accum bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * total loss bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * added some safe guard for connection indexing Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup * cleanup * update configs * Update README.md style guide rule changes * Update README.md * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve docstring fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update license yr Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup well Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cimprove infrence fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve train.py fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve requirement Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * ilcense header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder (parallel) + added results to readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * delete some unsed files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * address PR comments Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference grdecl header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * support time series Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor update Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace pickle with json Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add license headers Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused png files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unsed import Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove emojis Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace print with logger Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update docstring Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor updates Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Add knn to autodoc table. (#1244) * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. * Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. 
:party: * Don't force models yet --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: tonishi-nv <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Automated model registry (#1252) * Deleted RegistreableModule Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed 'PhysicsNeMo' suffix in Module.from_torch method Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Implemented automatic registration for Module subclasses Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed unused name Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Metadata name deprecation (#1257) * Initiated deprecation of field 'name' in ModelMetaData Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed all occurences of 'name' field in ModelMetaData Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactor (#1258) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. 
Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * A new X-MeshGraphNet example for reservoir simulation. 
(#1186) * X-MGN for reservoir simulation Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * installation bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * more well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve path_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix white space in config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix version inconsistency in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add versions for some libs in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * formatting Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor loop Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * grad accum bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * total loss bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * added some safe guard for connection indexing Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: 
Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup * cleanup * update configs * Update README.md style guide rule changes * Update README.md * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve docstring fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update license yr Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup well Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve train.py fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve requirement Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder (parallel) + added results to readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * delete some unused files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * address PR comments Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference grdecl header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * support time series Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor update Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace pickle with json Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add license headers Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused png files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused import Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove emojis Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace print with logger Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update docstring Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor updates Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Add knn to autodoc table. (#1244) * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Update version (#1193) * Fix dependencies to enable hello world (#1195) * Remove zero-len arrays from test dataset (#1198) * Merge updates to Gray Scott example (#1239) * Remove pyevtk * update dependency * update dimensions * ci issues * Interpolation model example (#1149) * Temporal interpolation training recipe * Add README * Docs changes based on comments * Update docstrings and README * Add temporal interpolation animation * Add animation link * Add shape check in loss * Updates of configs + trainer * Update config comments * Update README.md style guide edits * Added wandb logging Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Reformatted sections in docstring for GeometricL2Loss Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Update README and configs * README changes + type hint fixes * Update README.md * Draft of validation script * Update validation and README * Fixed command in README.md for temporal_interpolation example Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed unused import in datapipe/climate_interp.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updated license headers in temporal_interpolation example Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Renamed methods to avoid implicit shadowing in Trainer class Signed-off-by: Charlelie Laurent <claurent@nvidia.com> 
* Cosmetic changes in train.py and removed unused import in validate.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added clamp in validate.py to make sure step does not go out of bounds Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added the temporal_interpolation example to the docs + updated CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Addressing remaining comments * Merged two data source classes in climate_interp.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> * update versions --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: tonishi-nv <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com> Co-authored-by: Jussi Leinonen <jleinonen@nvidia.com> Co-authored-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: Kaustubh Tangsali <ktangsali@nvidia.com> * Remove IPDB * Few more dep fixes. * Refactor (#1261) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. 
* Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Remove IPDB * Few more dep fixes. * Enhance checkpoint configuration for DLWP Healpix and GraphCast (#1253) * feat(weather): Improve configuration for DLWP Healpix and GraphCast examples - Added configurable checkpoint directory to DLWP Healpix config and training script. - Implemented Trainer logic to use specific checkpoint directory. - Updated utils.py to respect exact checkpoint path. - Made Weights & Biases entity and project configurable in GraphCast example. * fix(dlwp_healpix): remove deprecated configs - Removed the deprecated `verbose` parameter from the `CosineAnnealingLR` configuration in DLWP HEALPix, which was causing a TypeError. - Removed unused configs from examples/weather/dlwp_healpix/ * Transolver volume (#1242) * Implement transolver ++ physics attention * Enable ++ in Transolver. * Fix temperature correction terms. * Starting work adapting the domino datapipe techniques to transolver. * Working towards transolver volume training by mergeing with domino dataset. Surface dataloading is prototyped, not finished yet. * Updating * Remove printout * Enable transolver for volumetric data * Update transolver training script to support either surface or volume data. Applied some cleanup to make the datapipe similar to domino, which is a step towards unification. 
* Updating datapipe * Tweak transolver volume configs * Add transolverX model * Enable nearly-uniform sampling of very very large arrays * limit benchmarking to train epoch, enable profiler in config * Update volume config slightly * Update training scripts to properly enable data preloading * Working towards adding a muon optimizer in transolver * Add Peter's implementation of muon with a combined optimizer. switch to a flat LR. * Add updated inference script that can also calculate drag and lift * Add better docstrings for typhon * Move typhon to experimental * Move forwards docstring * Adding typhon model and configs. * Update readme. * Update * Remove extra model. Update recipes. * Update cae_dataset.py Implement abstract methods in base classes. * Update Physics_Attention.py Ensure plus parameter is passed to base class. * Update test_mesh_datapipe.py Update import path for mesh datapipe. * Fix ruff issues --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Dileep Ranganathan <8152399+dran-dev@users.noreply.github.com> * Add external import coding standards. * Update external import standards. * Ensure vtk functions are protected. * Protect pyvista import * Closing more import gaps * Remove DGL from meshgraphkan * All models now comply with external import linting. * Remove DGL datapipes * cae datapipes in compliance * Update pyproject.toml * Add version numbers to deps * Refactor (#1261) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. 
* Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect externa…
* utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Refactor (#1208) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Add FIGConvNet to crash example (#1207) * Add FIGConvNet to crash example. * Add FIGConvNet to crash example * Update model config * propose fix some typos (#1209) Signed-off-by: John E <jeis4wpi@outlook.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. 
--------- Signed-off-by: John E <jeis4wpi@outlook.com> Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com> Co-authored-by: John Eismeier <42679190+jeis4wpi@users.noreply.github.com> * Unmigrate the insolation utils (#1211) * unmigrate the insolation utils * Revert test and compat map * Update importlinter * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Refactor (#1216) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Update activations path in dlwp tests (#1217) * Update activations path in dlwp tests * Update example paths * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. 
* update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Refactor (#1224) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. 
* Fix import * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * Refactor (#1231) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. 
Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * Replace 'License' link with 'Dev blog' link (#1215) Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Validation fu added to examples/structural_mechanics/crash/train.py (#1204) * validation added: works for multi-node job. * rename and rearrange validation function * validate_every_n_epochs, save_ckpt_every_n_epochs added in config * corrected bug (args of model) in inference * args in validation code updated * val path added and args name changed * validation split added -> write_vtp=False * fixed inference bug * bug fix: write_vtp * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. 
* Add saikrishnanc-nv to github actors (#1225) * Integrate Curator instructions to the Crash example (#1213) * Integrate Curator instructions * Update docs * Formatting changes * Adding code of conduct (#1214) * Adding code of conduct Adopting the code of conduct from the https://www.contributor-covenant.org/ * Update CODE_OF_CONDUCT.MD Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Create .markdownlintignore * Revise README for PhysicsNeMo resources and guidance Updated the 'Getting Started' section and added new resources for learning AI Physics. * Update README.md --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com> Co-authored-by: Deepak Akhare <dakhare@nvidia.com> Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * update import paths * Starting to clean up dependency tree. * Refactor (#1233) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. 
* Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * Replace 'License' link with 'Dev blog' link (#1215) Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Validation fu added to examples/structural_mechanics/crash/train.py (#1204) * validation added: works for multi-node job. 
* rename and rearrange validation function * validate_every_n_epochs, save_ckpt_every_n_epochs added in config * corrected bug (args of model) in inference * args in validation code updated * val path added and args name changed * validation split added -> write_vtp=False * fixed inference bug * bug fix: write_vtp * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Add saikrishnanc-nv to github actors (#1225) * Integrate Curator instructions to the Crash example (#1213) * Integrate Curator instructions * Update docs * Formatting changes * Adding code of conduct (#1214) * Adding code of conduct Adopting the code of conduct from the https://www.contributor-covenant.org/ * Update CODE_OF_CONDUCT.MD Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Create .markdownlintignore * Revise README for PhysicsNeMo resources and guidance Updated the 'Getting Started' section and added new resources for learning AI Physics. * Update README.md --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. 
* Fixed minor bug in shape validation in SongUNet (#1230) Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Add Zarr reader for Crash (#1228) * Add Zarr reader for Crash * Update README * Update validation logic of point data in Zarr reader Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add a test for 2D feature arrays * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com> Co-authored-by: Deepak Akhare <dakhare@nvidia.com> Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> * Added coding standards for model implementations as a custom context for greptile (#1219) * Added initial set of coding standards for model implementations Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed typos + review comments + added details Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added more rules for models Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added model rules to PR checklist Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added cursor rules for models 
Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Linked the wiki page to the PR template Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed typo in PR checklist Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Refactor (#1234) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Update crash readme (#1212) * update license headers- second try * update readme * Bump multi-storage-client to v0.33.0 with rust client (#1156) * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. 
* Fix improt * Healpix tests are working * Domino and unet working * Add jaxtyping to requirements.txt for crash sample (#1218) * update license headers- second try * Update requirements.txt * Updating to address some test issues * Replace 'License' link with 'Dev blog' link (#1215) Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Validation fu added to examples/structural_mechanics/crash/train.py (#1204) * validation added: works for multi-node job. * rename and rearrange validation function * validate_every_n_epochs, save_ckpt_every_n_epochs added in config * corrected bug (args of model) in inference * args in validation code updated * val path added and args name changed * validation split added -> write_vtp=False * fixed inference bug * bug fix: write_vtp * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Add saikrishnanc-nv to github actors (#1225) * Integrate Curator instructions to the Crash example (#1213) * Integrate Curator instructions * Update docs * Formatting changes * Adding code of conduct (#1214) * Adding code of conduct Adopting the code of conduct from the https://www.contributor-covenant.org/ * Update CODE_OF_CONDUCT.MD Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Create .markdownlintignore * Revise README for PhysicsNeMo resources and guidance Updated the 'Getting Started' section and added new resources for learning AI Physics. 
* Update README.md --------- Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Corey adams <6619961+coreyjadams@users.noreply.github.com> * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Fixed minor bug in shape validation in SongUNet (#1230) Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Add Zarr reader for Crash (#1228) * Add Zarr reader for Crash * Update README * Update validation logic of point data in Zarr reader Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add a test for 2D feature arrays * Update examples/structural_mechanics/crash/zarr_reader.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Add AR RT and OT schemes to Crash FIGConvNet (#1232) * Add AR and OT schemes for FIGConvNet * Add tests * Soothe the linter * Fix the tests * Fixing and adjusting a broad suite of tests. 
* Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com> Co-authored-by: Yongming Ding <yongmingd@nvidia.com> Co-authored-by: ram-cherukuri <104155145+ram-cherukuri@users.noreply.github.com> Co-authored-by: Deepak Akhare <dakhare@nvidia.com> Co-authored-by: Sai Krishnan Chandrasekar <157182662+saikrishnanc-nv@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com> * Not seeing any errors in testing ... * Breakdown of rules into smaller rules (#1236) * Breakdown of rules into smaller rules Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fix mismatches in rule IDs referenced in rule text Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactor (#1240) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. 
Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ...
* Formatting active learning module docstrings (#1238) * docs: fixing Protocol class reference formatting Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> * docs: removing mermaid diagram from protocols Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> * docs: adding active learning index * docs: revising docstrings for sphinx formatting * docs: fix placeholder URL for active learning main docs --------- Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> --------- Signed-off-by: Kelvin Lee <kinlongkelvi@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Kelvin Lee <kin.long.kelvin.lee@gmail.com> * Refactor (#1247) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. 
* Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * A new X-MeshGraphNet example for reservoir simulation.
(#1186) * X-MGN for reservoir simulation Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * installation bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * more well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve path_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix while space in config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix version inconsistency in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add versions for some libs in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetiem in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * formatting Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor loop Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * grad accum bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * total loss bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * added some safe guard for connection indexing Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: 
Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup * cleanup * update configs * Update README.md style guide rule changes * Update README.md * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve docstring fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update license yr Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup well Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cimprove infrence fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve train.py fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve requirement Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * ilcense header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder (parallel) + added results to readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * delete some unsed files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * address PR comments Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference grdecl header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * support time series Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor update Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace pickle with json Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add license headers Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused png files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unsed import Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove emojis Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace print with logger Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update docstring Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor updates Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Add knn to autodoc table. (#1244) --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: tonishi-nv <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Refactor (#1249) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers.
* update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * A new X-MeshGraphNet example for reservoir simulation. (#1186) * X-MGN for reservoir simulation Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * installation bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * more well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve path_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix while space in config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix version inconsistency in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add versions for some libs in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetiem in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * formatting Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor loop Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * grad accum bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * total loss bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * added some safe guard for connection indexing Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup * cleanup * update configs * Update README.md style guide rule changes * Update README.md * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve docstring fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update license yr Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup well Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cimprove infrence fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve train.py fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve requirement Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * ilcense header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder (parallel) + added results to readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * delete some unsed files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * address PR comments Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference grdecl header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * support time series Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor update Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace pickle with json Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add license headers Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused png files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unsed import Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove emojis Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace print with logger Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update docstring Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor updates Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Add knn to autodoc table. (#1244) * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. * Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. 
:party: * Don't force models yet --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: tonishi-nv <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Automated model registry (#1252) * Deleted RegistreableModule Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed 'PhysicsNeMo' suffix in Module.from_torch method Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Implemented automatic registration for Module subclasses Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Fixed unused name Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Metadata name deprecation (#1257) * Initiated deprecation of field 'name' in ModelMetaData Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed all occurences of 'name' field in ModelMetaData Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Refactor (#1258) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. * Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. 
Checkpointing is moved to physicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix import * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * A new X-MeshGraphNet example for reservoir simulation.
(#1186) * X-MGN for reservoir simulation Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * installation bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * more well object docstring fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve path_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix while space in config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fix version inconsistency in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add versions for some libs in requirement.txt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve mldlow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetiem in mlflow_utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve exception handling in inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * formatting Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve preprocessor loop Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * grad accum bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * total loss bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * added some safe guard for connection indexing Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * bug fix Signed-off-by: 
Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup utils Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup * cleanup * update configs * Update README.md style guide rule changes * Update README.md * fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve docstring fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update license yr Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup well Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup preproc fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cimprove infrence fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve datetime Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve train.py fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme fmt Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve requirement Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * ilcense header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve ecl reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * cleanup Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * license header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder (parallel) + added results to readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * delete some unsed files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * address PR comments Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve inference grdecl header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi 
<tonishi@nvidia.com> * improve readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * support time series Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update config Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor update Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * improve graph builder Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update ecl_reader logging Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace pickle with json Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * add license headers Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unused png files Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove unsed import Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * remove emojis Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * replace print with logger Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update docstring Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * minor updates Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update readme Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> * update header Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> * Add knn to autodoc table. (#1244) * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Update version (#1193) * Fix depenedncies to enable hello world (#1195) * Remove zero-len arrays from test dataset (#1198) * Merge updates to Gray Scott example (#1239) * Remove pyevtk * update dependency * update dimensions * ci issues * Interpolation model example (#1149) * Temporal interpolation training recipe * Add README * Docs changes based on comments * Update docstrings and README * Add temporal interpolation animation * Add animation link * Add shape check in loss * Updates of configs + trainer * Update config comments * Update README.md style guide edits * Added wandb logging Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Reformated sections in docstring for GeometricL2Loss Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Update README and configs * README changes + type hint fixes * Update README.md * Draft of validation script * Update validation and README * Fixed command in README.md for temporal_interpolation example Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Removed unused import in datapipe/climate_interp.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Updated license headers in temporal_interpolation example Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Renamed methods to avoid implicit shadowing in Trainer class Signed-off-by: Charlelie Laurent <claurent@nvidia.com> 
* Cosmetic changes in train.py and removed unused import in validate.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added clamp in validate.py to make sure step does not go out of bounds Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Added the temporal_interpolation example to the docs + updated CHANGELOG.md Signed-off-by: Charlelie Laurent <claurent@nvidia.com> * Addressing remaining comments * Merged two data source classes in climate_interp.py Signed-off-by: Charlelie Laurent <claurent@nvidia.com> --------- Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> * update versions --------- Signed-off-by: Tsubasa Onishi <tonishi@nvidia.com> Signed-off-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: tonishi-nv <tonishi@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com> Co-authored-by: Jussi Leinonen <jleinonen@nvidia.com> Co-authored-by: Charlelie Laurent <claurent@nvidia.com> Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com> Co-authored-by: Kaustubh Tangsali <ktangsali@nvidia.com> * Remove IPDB * Few more dep fixes. * Refactor (#1261) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. 
* Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix improt * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Remove IPDB * Few more dep fixes. * Enhance checkpoint configuration for DLWP Healpix and GraphCast (#1253) * feat(weather): Improve configuration for DLWP Healpix and GraphCast examples - Added configurable checkpoint directory to DLWP Healpix config and training script. - Implemented Trainer logic to use specific checkpoint directory. - Updated utils.py to respect exact checkpoint path. - Made Weights & Biases entity and project configurable in GraphCast example. * fix(dlwp_healpix): remove deprecated configs - Removed the deprecated `verbose` parameter from the `CosineAnnealingLR` configuration in DLWP HEALPix, which was causing a TypeError. - Removed unused configs from examples/weather/dlwp_healpix/ * Transolver volume (#1242) * Implement transolver ++ physics attention * Enable ++ in Transolver. * Fix temperature correction terms. * Starting work adapting the domino datapipe techniques to transolver. * Working towards transolver volume training by mergeing with domino dataset. Surface dataloading is prototyped, not finished yet. * Updating * Remove printout * Enable transolver for volumetric data * Update transolver training script to support either surface or volume data. Applied some cleanup to make the datapipe similar to domino, which is a step towards unification. 
* Updating datapipe * Tweak transolver volume configs * Add transolverX model * Enable nearly-uniform sampling of very very large arrays * limit benchmarking to train epoch, enable profiler in config * Update volume config slightly * Update training scripts to properly enable data preloading * Working towards adding a muon optimzier in transolver * Add peter's implementation of muon with a combined optimizer. switch to a flat LR. * Add updated inference script that can also calculate drag and lift * Add better docstrings for typhon * Move typhon to experimental * Move forwards docstring * Adding typhon model and configs. * Update readme. * Update * Remove extra model. Update recipes. * Update cae_dataset.py Implement abstract methods in base classes. * Update Physics_Attention.py Ensure plus parameter is passed to base class. * Update test_mesh_datapipe.py Update import path for mesh datapipe. * Fix ruff issues --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Dileep Ranganathan <8152399+dran-dev@users.noreply.github.com> * Add external import coding standards. * Update external import standards. * Ensure vtk functions are protected. * Protect pyvista import * Closing more import gaps * Remove DGL from meshgraphkan * All models now comply with external import linting. * Remove DGL datapipes * cae datapipes in compliance * Update pyproject.toml * Add version numbers to deps * Refactor (#1261) * Move filesystems and version_check to core * Fix version check tests * Reorganize distributed, domain_parallel, and begin nn / utils cleanup. * Move modules and meta to core. Move registry to core. No tests fixed yet. * Add missing init files * Update build system and specify some deps. * Reorganize tests. * Update init files * Clean up neighbor tools. 
* Update testing * Fix compat tests * Move core model tests to tests/core/ * Add import lint config * Relocate layers * Move graphcast utils into model directory * Relocating util functionalities. * Further clean up and organize tests. * utils tests are passing now * Cleaning up distributed tests * Patching tests working again in nn * Fix sdf test * Fix zenith angle tests * Some organization of tests. Checkpoints is moved into utils. * Remove launch.utils and launch.config. Checkpointing is moved to phsyicsnemo.utils, launch.config is just gone. It was empty. * Most nn tests are passing * Further cleanup. Getting there! * Remove constants file * Add import linting to pre-commit. * Move gnn layers and start to fix several model tests. * AFNO is now passing. * Rnn models passing. * Fix improt * Healpix tests are working * Domino and unet working * Updating to address some test issues * MGN tests passing again * Most graphcast tests passing again * Move nd conv layers. * update fengwu and pangu * Update sfno and pix2pix test * update tests for figconvnet, swinrnn, superresnet * updating more models to pass * Update distributed tests, now passing. * Domain parallel tests now passing. * Fix active learning imports so tests pass in refactor * Fix some metric imports * Remove deploy package * Remove unused test file * unmigrate these files ... again? * Update import linter. * Cleaning up diffusion models. Not quite done yet. * Restore deleted files * Updating more tests. * Further updates to tests. Datapipes almost working. * update import paths * Starting to clean up dependency tree. * Fixing and adjusting a broad suite of tests. * Update test/domain_parallel/conftest.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Minor fix * Not seeing any errors in testing ... * Enable import linting on internal imports. * Remove ensure_available function, it's confusing * Add logging imports to utils, and fix imports in examples. 
* Update imports in minimal examples * Update structural mechanics examples * Update import paths: reservoir_sim * Update import paths: additive manufacturing * Update import paths: topodiff * Update import paths: weather part 1 * Update import paths: weather part 2 * Update import paths: molecular dynamics * Update import paths: geophysics * Update import paths: cfd + external_aero 1 * Update import paths: cfd + external_aero 2 * Remove more DGL examples * Remove more DGL examples * cfd examples 3 * Last batch of example import fixes! * Enforce and protect external deps in utils. * Remove DGL. :party: * Don't force models yet * Remove IPDB * Few more dep fixes. * Enhance checkpoint configuration for DLWP Healpix and GraphCast (#1253) * feat(weather): Improve configuration for DLWP Healpix and GraphCast examples - Added configurable checkpoint directory to DLWP Healpix config and training script. - Implemented Trainer logic to us…
* Refactor nn modules and functionals
* Sync conv layer typing
* Restore CODEOWNERS formatting
* Remove top-level nn module duplicates
* Reorder HEALPix padding helpers
* Fix attention layers and docs
* Fix sdf import path in datapipes
* fixed issue
* merging stuff
* current
* asv
* bit more
* make inputs
* warp interpolation
* merging
* merging
* almost done
* fixed imports
* removed warp check as its depen now
* fixed uit test
* updated license:
* updated license:
* fixed unit test
* blaa
* Fix sharded Group Norm. Now works with uneven shardings, and has better numerical accuracy. Also tested with mixed precision.
* Update for review comments
* Update physicsnemo/domain_parallel/shard_utils/normalization_patches.py Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
* fix precommit
---------
Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
…ially (NVIDIA#1426) consolidates the argument parsing for all the operations.
* Switches from ["_cache"] subkey to a proper _cache TensorDict, which properly isolates public and private methods.
* formatting
* removes unused import
* simplifies mesh repr; no need for exclude_cache
* reverts modernization, as this causes errors on older pyvista
* modernize docker builds
* lock cupy-cuda13x version
* remove uv cache to save on space for deployment image
* address review feedback
* Update Dockerfile
* Update torch-harmonics version and install options
* Fix NATTEN_CUDA_ARCH environment variable syntax
* some fixes to natten builds, torch harmonics and misc
Author (Collaborator): /blossom-ci
Collaborator: FYI we should target RC branch
Author (Collaborator): /blossom-ci
PhysicsNeMo Pull Request
Description
This PR makes a handful of documentation and argument-typing changes so that Pix2Pix better fits the model implementation coding standards. We make the conscious choice not to move the resnet block or the unet skip connection block, because they are mostly model-specific and upstreaming them would require too many changes.
Checklist
Dependencies
No new dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.