Problem compiling deepmd 3.1.2 from cuda 12.9 #5195

qchempku2017 · 2026-02-06T02:56:38Z

qchempku2017
Feb 6, 2026

Dear community,
Recently I'm trying to install deepmd-kit 3.1.2 from source using cuda 12.9. The compilation process seems to pass, but when running a test finetuning process to DPA-3 forcefield, I encountered the following error:

[2026-02-06 10:36:22,531] DEEPMD INFO val/mp-2412-GGA 24 6 2 1.504e-03 T
[2026-02-06 10:36:22,531] DEEPMD INFO val/mp-555915-GGA 72 2 3 2.256e-03 T
[2026-02-06 10:36:22,531] DEEPMD INFO val/mp-674158-GGA 1 128 1 7.519e-04 T
[2026-02-06 10:36:22,531] DEEPMD INFO val/mp-83-GGA 56 3 3 2.256e-03 T
[2026-02-06 10:36:22,531] DEEPMD INFO --------------------------------------------------------------------------------------
[2026-02-06 10:36:23,191] DEEPMD INFO Start to train 100 steps.
Traceback (most recent call last):
File "/app/softwares/deepmd/3.1.2-cu129/venv/bin/dp", line 6, in
sys.exit(main())
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/main.py", line 1020, in main
deepmd_main(args)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 362, in wrapper
return f(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 536, in main
train(
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 367, in train
trainer.run()
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/train/training.py", line 1132, in run
step(step_id)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/train/training.py", line 772, in step
model_pred, loss, more_loss = self.wrapper(
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/train/wrapper.py", line 193, in forward
model_pred, loss, more_loss = self.loss[task_key](
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/loss/ener.py", line 186, in forward
model_pred = model(**input_dict)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/model/model/ener_model.py", line 103, in forward
model_ret = self.forward_common(
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/model/model/make_model.py", line 175, in forward_common
) = extend_input_and_build_neighbor_list(
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/utils/nlist.py", line 29, in extend_input_and_build_neighbor_list
coord_normalized = normalize_coord(
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/utils/region.py", line 103, in normalize_coord
icoord = phys2inter(coord, cell)
File "/app/softwares/deepmd/3.1.2-cu129/venv/lib/python3.10/site-packages/deepmd/pt/utils/region.py", line 25, in phys2inter
return torch.matmul(coord, rec_cell)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasDgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)

The training setup I was using:
{ "_comment": "that's all", "model": { "type_map": [ "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd", "In", "Sn", "Sb", "Te", "I", "Xe", "Cs", "Ba", "La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu", "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg", "Tl", "Pb", "Bi", "Po", "At", "Rn", "Fr", "Ra", "Ac", "Th", "Pa", "U", "Np", "Pu", "Am", "Cm", "Bk", "Cf", "Es", "Fm", "Md", "No", "Lr", "Rf", "Db", "Sg", "Bh", "Hs", "Mt", "Ds", "Rg", "Cn", "Nh", "Fl", "Mc", "Lv", "Ts", "Og" ], "descriptor": { "type": "dpa3", "repflow": { "n_dim": 128, "e_dim": 64, "a_dim": 32, "nlayers": 16, "e_rcut": 6.0, "e_rcut_smth": 5.3, "e_sel": 1200, "a_rcut": 4.0, "a_rcut_smth": 3.5, "a_sel": 300, "axis_neuron": 4, "fix_stat_std": 0.3, "a_compress_rate": 1, "a_compress_e_rate": 2, "a_compress_use_split": true, "update_angle": true, "smooth_edge_update": true, "use_dynamic_sel": true, "sel_reduce_factor": 10.0, "use_exp_switch": true, "update_style": "res_residual", "update_residual": 0.1, "update_residual_init": "const" }, "activation_function": "silut:3.0", "use_tebd_bias": false, "precision": "float32", "concat_output_tebd": false }, "fitting_net": { "neuron": [ 240, 240, 240 ], "dim_case_embd": 31, "resnet_dt": true, "precision": "float32", "activation_function": "silut:3.0", "seed": 1, "_comment": " that's all" }, "_comment": " that's all" }, "learning_rate": { "type": "exp", "decay_steps": 20, "start_lr": 0.001, "stop_lr": 3e-05, "_comment": "that's all" }, "loss": { "type": "ener", "start_pref_e": 0.2, "limit_pref_e": 20, "start_pref_f": 100, "limit_pref_f": 60, "start_pref_v": 0.02, "limit_pref_v": 1, "_comment": " that's all" }, "training": { "stat_file": "./dpa3.hdf5", "training_data": { "systems": [ ... ], "batch_size": "auto:128", "_comment": "that's all" }, "validation_data": { "systems": [ ... ], "batch_size": "auto:128", "_comment": "that's all" }, "numb_steps": 100, "warmup_steps": 0, "gradient_max_norm": 5.0, "seed": 10, "disp_file": "lcurve.out", "disp_freq": 100, "save_freq": 2000, "_comment": "that's all" }

My GPU: nvidia V100 32G PCIE
My nvidia driver: system-wide nvidia-driver-580-server
My nvcc: from lmod module cuda/12.9, installed from nvidia's official cuda_12.9.0_575.51.03_linux.run
My gcc: system-wide gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
My torch: installed using pip, as mentioned in your official documentation: https://docs.deepmodeling.com/projects/deepmd/en/stable/install/install-from-source.html

My deepmd-kit source code was not downloaded via git clone, but by using wget to download the tarball from your 3.1.2 github release link instead.

All of my installation commands:
`bash

DP_VARIANT=cuda CUDAToolkit_ROOT=/app/softwares/cuda/12.9/ DP_ENABLE_TENSORFLOW=1 DP_ENABLE_PYTORCH=1 pip install .
`

Could you please make a deep investigation and teach me how may I fix this issue?

Besides, if this issue is unlikely to be fixed under my current setup, I may have to try installing from the shell scripts. However, even if I have to do the shell script install or conda install, would it still be possible to make the whole deepmd-kit environment an lmod managed module (i.e., can be loaded via module load deepmd/3.1.2 and cleanly unloaded via module unload).

Much appreciated!

dosubot[bot] · 2026-02-06T03:01:45Z

dosubot[bot]
bot Feb 6, 2026

Your error is most likely caused by a mismatch or incompatibility between your CUDA/cuBLAS, PyTorch, and deepmd-kit builds, or by exceeding GPU parameter limits.

First, confirm your PyTorch version is exactly 2.10.0. DeepMD-kit 3.1.2 recently updated its requirements to strictly require PyTorch 2.10.0 for CUDA 12.9 compatibility—other versions may not work reliably and can cause runtime errors like the one you’re seeing (see PR).

If your PyTorch version is correct, try reducing your batch size and the selection parameters (e_sel, a_sel) in your model config. CUBLAS_STATUS_INVALID_VALUE errors are often triggered by excessive memory usage or by exceeding DeePMD-kit's GPU parameter limits (like total sel < 4096, number of atom types < 128, etc.) (see docs).

If the error persists, rebuild deepmd-kit after ensuring ABI compatibility with your installed PyTorch. If you built deepmd-kit with a different CXX11_ABI flag than your PyTorch runtime, you can get subtle or cryptic CUDA errors at runtime. Always build deepmd-kit in the same environment as your PyTorch install.

If you want a more robust setup, consider installing deepmd-kit via conda or the official shell script. These methods manage CUDA/cuBLAS/PyTorch dependencies more reliably than pip-from-source. Both conda environments and shell script installs can be managed as lmod modules: just create a modulefile that sets the appropriate environment variables (like PATH, LD_LIBRARY_PATH, and CONDA_PREFIX or the shell script’s install path). Loading/unloading the module will then activate/deactivate the environment cleanly.

Let me know your PyTorch version and if these steps help! If not, I can suggest further debugging steps.

To reply, just mention @dosu.

^{How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other}

6 replies

dosubot[bot] bot Feb 6, 2026

Your PyTorch 2.10.0+cu128 should generally work with CUDA 12.9 on your system. The "cu128" in the PyTorch version means it was built against CUDA 12.8, but PyTorch wheels are designed to be forward-compatible with newer CUDA 12.x toolkits, so running cu128 on CUDA 12.9 is expected to work unless you hit a rare edge case. There are currently no official PyTorch wheels for 2.10.0 with cu129—cu128 is the latest available for that release.

The deepmd-kit shell installer labeled "cuda129" means the binary was built and tested with CUDA 12.9, but it doesn't guarantee that every bundled Python package (like PyTorch) is also built specifically for CUDA 12.9. In practice, the shell installer likely includes PyTorch 2.10.0+cu128, since that's the latest upstream wheel, and relies on CUDA's minor version compatibility to work on CUDA 12.9 systems. This is a standard practice for Python/CUDA software stacks, and is supported by NVIDIA's compatibility policy.

So, your setup (PyTorch 2.10.0+cu128, CUDA 12.9, deepmd-kit 3.1.2) is expected to be compatible, and the version labeling difference is just a result of how PyTorch and CUDA package releases are managed. If you still see runtime errors, they're more likely due to other causes (like ABI mismatches, environment issues, or parameter limits), not the cu128 vs. cuda129 difference. For more details, see the official install docs and NVIDIA's CUDA compatibility notes.

To reply, just mention @dosu.

^{How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other}

qchempku2017 Feb 6, 2026
Author

@dosu Please give me a minimal effort installation plan that achieves my goal:
1, Need to install full deepmd-kit 3.1.2, CUDA GPU compatible version along with patched lammps.
2, Need to manage this deepmd-kit installation via lmod module management. It should be easily loaded via module load deepmd/3.1.2, and also completely unloaded via module unload, without contaminating the system variables.

Please describe the operations I should perform step by step in detail. You may refer to deepmd-kit's official documentations and github codes.

dosubot[bot] bot Feb 6, 2026

Here’s a minimal-effort, robust installation plan for deepmd-kit 3.1.2 (CUDA GPU, patched LAMMPS) with lmod module management:

1. Prerequisites

Make sure your system has a compatible NVIDIA driver and CUDA toolkit (12.x is fine for deepmd-kit 3.1.2).
Python 3.9+ is required.

2. Install via Conda (recommended for lmod/module management)

# If you don't have conda/miniforge, install it first (see https://conda-forge.org/miniforge/)
conda create -p /path/to/deepmd-kit-3.1.2-env python=3.10 deepmd-kit=3.1.2 lammps horovod -c conda-forge

Replace /path/to/deepmd-kit-3.1.2-env with your preferred install location (e.g., /app/softwares/deepmd/3.1.2).

3. (Alternative) Install via Official Shell Installer

Download both .sh.0 and .sh.1 files from the release page.

Merge and run:

cat deepmd-kit-3.1.2-cuda129-Linux-x86_64.sh.0 deepmd-kit-3.1.2-cuda129-Linux-x86_64.sh.1 > deepmd-kit-3.1.2-cuda129-Linux-x86_64.sh
bash deepmd-kit-3.1.2-cuda129-Linux-x86_64.sh -b -p /path/to/deepmd-kit-3.1.2-env

This creates a conda-like environment at your chosen path.

4. Verify Installation

conda activate /path/to/deepmd-kit-3.1.2-env
dp -h
lmp -h

Both commands should work. In LAMMPS, check that deepmd appears in the list of pair styles.

5. lmod Modulefile Example

Create a file (e.g., /path/to/lmod/modulefiles/deepmd/3.1.2) with the following content:

#%Module1.0
proc ModulesHelp { } {
    puts stderr "Loads DeePMD-kit 3.1.2 with CUDA and LAMMPS"
}
module-whatis "DeePMD-kit 3.1.2 (CUDA, LAMMPS, conda-managed)"

set root /path/to/deepmd-kit-3.1.2-env

# Activate conda environment on load
prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
setenv CONDA_PREFIX $root
setenv LAMMPS_PLUGIN_PATH $root/lib/deepmd_lmp

# Optional: activate conda environment automatically (requires conda in PATH)
if {[module-info mode load]} {
    exec bash -c "source $(dirname [exec which conda])/../etc/profile.d/conda.sh && conda activate $root"
}
if {[module-info mode unload]} {
    exec bash -c "conda deactivate"
}

Adjust $root to your actual install path.
Place this file in your lmod modulefiles tree.

6. Usage

module load deepmd/3.1.2
# Now dp, lmp, and all dependencies are available
module unload deepmd/3.1.2
# Environment is cleanly restored

7. Troubleshooting

If LAMMPS can’t find the deepmd pair style, ensure the environment is activated and LAMMPS_PLUGIN_PATH is set.
Always use the lmp_mpi executable for MPI support.
If you see shared library errors, check that LD_LIBRARY_PATH includes $root/lib and $root/lib/deepmd_lmp.

This approach is robust, minimal-effort, and fully compatible with lmod/module management. For more details, see the official docs and LAMMPS integration.

To reply, just mention @dosu.

^{How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other}

qchempku2017 Feb 6, 2026
Author

@dosu This seems to be a janky work-around. Might work, but definitely requires installing a system-level conda. Might not be good if other student accounts on the computing cluster have their own conda.

dosubot[bot] bot Feb 6, 2026

You don't need a system-level conda to install or manage deepmd-kit. The shell installer and conda environments can be created entirely in your own user or project directory using Miniconda or Miniforge—no admin rights or global conda needed, so you won't interfere with other users' setups. Just install Miniconda/Miniforge in your home, create the environment at a custom path, and point your lmod modulefile to it. This is a standard approach on shared clusters and is supported by the official docs (see here).

If you want to avoid conda entirely, you can use a Python virtual environment (venv) and install deepmd-kit with pip, e.g.:

python3 -m venv /path/to/deepmd-venv
source /path/to/deepmd-venv/bin/activate
pip install deepmd-kit[gpu,cu12,lmp,ipi]

Then, write your lmod modulefile to prepend PATH and LD_LIBRARY_PATH from your venv, and unset them on unload. This method is fully isolated and doesn't touch conda at all, but note that pip-installed LAMMPS only supports the TensorFlow backend (details).

Both methods are compatible with lmod: just set PATH, LD_LIBRARY_PATH, and (if needed) LAMMPS_PLUGIN_PATH in your modulefile, and unset them on unload. No system-level conda, no cross-user contamination.

To reply, just mention @dosu.

^{How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem compiling deepmd 3.1.2 from cuda 12.9 #5195

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 6 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Problem compiling deepmd 3.1.2 from cuda 12.9 #5195

Uh oh!

qchempku2017 Feb 6, 2026

Replies: 1 comment · 6 replies

Uh oh!

dosubot[bot] bot Feb 6, 2026

Uh oh!

dosubot[bot] bot Feb 6, 2026

Uh oh!

qchempku2017 Feb 6, 2026 Author

Uh oh!

dosubot[bot] bot Feb 6, 2026

Uh oh!

qchempku2017 Feb 6, 2026 Author

Uh oh!

dosubot[bot] bot Feb 6, 2026

qchempku2017
Feb 6, 2026

Replies: 1 comment 6 replies

dosubot[bot]
bot Feb 6, 2026

qchempku2017 Feb 6, 2026
Author

qchempku2017 Feb 6, 2026
Author