When running metatomic models inside simulation engines with domain decomposition (currently mainly LAMMPS, more to come), it would be useful to give the model access to an explicit communication API. This would enable at least two things:
- using the communication for message passing in message-passing models (MPNN), which would both reduce the amount of work duplicated from one domain to another and allow the model to run with a smaller interaction cutoff.
- using the communication to enable domain decomposition for long-range models.
For the MPNN use-case, MACE is able to use something similar in the ML-IAP interface of LAMMPS, using LAMMPS primitives to receive/transmit node features from other domains: https://github.com/ACEsuit/mace/blob/42cd495b9c8e155441c405fef842a8ba5107c2f8/mace/tools/utils.py#L152-L167.
We should do something similar, but not limited to LAMMPS, providing models the same set of tools for communication regardless of the simulation engine.
How it could work
metatomic would provide a set of custom TorchScript operations (name to be decided, communicate here) that the model can call. The engine would then give metatomic a set of corresponding function pointers that implement these operations using the engine's communication primitives.
model: communicate(...)  ==>  metatomic shim  ==>  engine: communicate_impl(...)
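The dispatch path above can be sketched in plain Python. This is only an illustration of the shim idea, not the proposed metatomic API: `register_communicate_impl`, `fake_engine_impl` and the list-based signature are hypothetical names and types chosen for this sketch, and plain lists stand in for tensors.

```python
from typing import Callable, List, Optional

# Hypothetical signature of an engine implementation:
# (local features, pre-allocated output) -> None, filling the output in place.
CommunicateImpl = Callable[[List[float], List[float]], None]

# Slot filled by the engine; None means "no domain decomposition".
_communicate_impl: Optional[CommunicateImpl] = None


def register_communicate_impl(impl: Optional[CommunicateImpl]) -> None:
    """Hypothetical entry point the engine calls to install its implementation."""
    global _communicate_impl
    _communicate_impl = impl


def communicate(features: List[float], pre_allocated: List[float]) -> None:
    """Shim called by the model; forwards to the engine if one is registered."""
    if _communicate_impl is None:
        return  # nothing to exchange, leave `pre_allocated` untouched
    _communicate_impl(features, pre_allocated)


def fake_engine_impl(features: List[float], pre_allocated: List[float]) -> None:
    """Stand-in for an engine: pretends the last entry is a ghost of entry 0."""
    pre_allocated[-1] = features[0]


features = [1.5, 2.5, 0.0]

without_engine = [0.0, 0.0, 0.0]
communicate(features, without_engine)  # no implementation registered: no-op

register_communicate_impl(fake_engine_impl)
with_engine = [0.0, 0.0, 0.0]
communicate(features, with_engine)  # forwarded to the fake engine
```

The model only ever sees `communicate`, so the same model code would run unchanged whichever engine (if any) installed an implementation.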
Taking inspiration from ML-IAP, the API could look something like:
# features is `n_atoms x n_features`
pre_allocated = torch.zeros_like(features)
communicate(features, pre_allocated)
# we get the information on our ghosts (i.e. atoms from other domains) inside the `pre_allocated` array
# other domains get information about their ghosts from `features`
If the engine does not support domain decomposition, the communicate function would do nothing.
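To make the intended semantics concrete, here is a toy single-process simulation of the exchange, assuming each domain stores its owned atoms first and its ghosts last. `Domain`, `communicate_all` and the ghost bookkeeping are hypothetical and exist only for this sketch; nested lists stand in for `n_atoms x n_features` tensors, and a real implementation would go through the engine's communication primitives instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = List[float]


@dataclass
class Domain:
    # n_atoms x n_features, owned atoms first, then ghosts
    features: List[Row]
    # ghost row index -> (owner domain index, row index inside the owner)
    ghosts: Dict[int, Tuple[int, int]] = field(default_factory=dict)


def communicate_all(domains: List[Domain]) -> List[List[Row]]:
    """Toy stand-in for every domain calling `communicate` at the same time."""
    outputs = []
    for domain in domains:
        # equivalent of `torch.zeros_like(features)`
        pre_allocated = [[0.0] * len(row) for row in domain.features]
        for ghost_row, (owner, owner_row) in domain.ghosts.items():
            # our ghost receives the features of the atom owned elsewhere
            pre_allocated[ghost_row] = list(domains[owner].features[owner_row])
        outputs.append(pre_allocated)
    return outputs


# Two domains with two owned atoms and one ghost each.
domain_0 = Domain([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], ghosts={2: (1, 0)})
domain_1 = Domain([[5.0, 6.0], [7.0, 8.0], [0.0, 0.0]], ghosts={2: (0, 1)})
out_0, out_1 = communicate_all([domain_0, domain_1])

# A single domain without ghosts: nothing is exchanged.
(out_single,) = communicate_all([Domain([[1.0, 2.0]])])
```

In this sketch only the ghost rows of the output are filled; the rows of owned atoms stay at zero, and the no-decomposition case leaves the whole buffer untouched.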
We should be able to make everything work during backward propagation, for the calculation of forces and of gradients in general. In LAMMPS, communicate would map to forward_comm in the forward pass and to reverse_comm during backward propagation (https://docs.lammps.org/Developer_comm_ops.html)
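The reason a forward/reverse pair is enough for gradients is that the reverse operation is the adjoint (transpose) of the forward one: the forward pass copies owner values onto ghosts, so the backward pass has to sum ghost gradients back onto their owners. The sketch below checks this on a toy example. The two functions only borrow the names used above and are not the LAMMPS API; the flat-list layout and the `ghost_to_owner` map are assumptions of this sketch, which also assumes that no ghost is itself an owner.

```python
from typing import Dict, List


def forward_comm(values: List[float], ghost_to_owner: Dict[int, int]) -> List[float]:
    """Copy each owner's value onto its ghost copies."""
    out = list(values)
    for ghost, owner in ghost_to_owner.items():
        out[ghost] = values[owner]
    return out


def reverse_comm(grads: List[float], ghost_to_owner: Dict[int, int]) -> List[float]:
    """Accumulate ghost gradients onto their owners and clear the ghost entries."""
    out = list(grads)
    for ghost, owner in ghost_to_owner.items():
        out[owner] += grads[ghost]
        out[ghost] = 0.0
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Entry 2 is a ghost copy of entry 0.
ghost_to_owner = {2: 0}
values = [1.0, 2.0, 0.0]
grads = [0.5, 0.25, 2.0]  # gradient of some loss w.r.t. the forward output

forwarded = forward_comm(values, ghost_to_owner)
reversed_grads = reverse_comm(grads, ghost_to_owner)

# Adjoint identity: <forward(x), g> == <x, reverse(g)>
lhs = dot(forwarded, grads)
rhs = dot(values, reversed_grads)
```

In an autograd setting, this is the relationship a custom operation would rely on: its backward would call the reverse communication on the incoming gradient.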
Unresolved questions