Modernize for Ansible 10.x, Ubuntu 24.04, kubespray v2.30 #1336
Merged
michael-balint merged 15 commits into NVIDIA:master on Feb 19, 2026
The distutils module, including distutils.version.LooseVersion, was removed from the Python standard library in 3.12, breaking setup.sh on Ubuntu 24.04+ and on any other Python 3.12+ installation. Switch to packaging.version.Version (available via pip) and use the venv python3 instead of PYTHON_BIN so the import resolves correctly. Also bump jmespath 0.10.0 to 1.0.1 to match kubespray requirements, and add packaging to the explicit pip install list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Douglas Holt <dholt@nvidia.com>
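Why a real version parser is needed is easy to see with the versions this PR moves between: compared as plain strings, "9.13.0" sorts after "10.7.0". The sketch below is a simplified, stdlib-only stand-in (the helpers `release_tuple` and `compare_versions` are hypothetical, not code from setup.sh) that only handles dotted numeric releases with an optional leading "v". The PR itself uses packaging.version.Version, which implements full PEP 440 ordering, including pre-releases that this sketch ignores.

```python
import re
from typing import Tuple

# Leading dotted-numeric release, with an optional "v" prefix (as in kubespray tags).
_RELEASE = re.compile(r"v?(\d+(?:\.\d+)*)")


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse "2.17.4" -> (2, 17, 4). Suffixes such as "+88" or "rc1" are ignored."""
    match = _RELEASE.match(version.strip())
    if match is None:
        raise ValueError(f"not a dotted numeric version: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1, treating missing trailing components as zero (2.16 == 2.16.0)."""
    ta, tb = release_tuple(a), release_tuple(b)
    width = max(len(ta), len(tb))
    ta += (0,) * (width - len(ta))
    tb += (0,) * (width - len(tb))
    return (ta > tb) - (ta < tb)


if __name__ == "__main__":
    # Lexicographic string comparison gets the Ansible bump backwards...
    print("9.13.0" > "10.7.0")                        # True
    # ...while numeric comparison orders it correctly.
    print(compare_versions("9.13.0", "10.7.0"))       # -1
    print(compare_versions("v2.30.0", "v2.27.0+88"))  # 1
```

Padding with zeros mirrors how packaging treats "2.16" and "2.16.0" as equal; a naive tuple comparison would not.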
- Update runners from ubuntu-20.04 (removed) to ubuntu-22.04
- Bump actions to current versions (checkout@v4, setup-python@v4, codeql-action@v3, stale@v9)
- Update Python 3.9 to 3.12 and Ansible 4.8.0 to 9.13.0 in CI
- Add setup.yml workflow to test setup.sh on Ubuntu 22.04 and 24.04
- Use explicit venv python path in setup.sh version checks
- Keep ansible==4.8.0 for the lint job (ansible-lint 5.4.0 is incompatible with ansible-core 2.16); use Python 3.10 for compatibility
- Use molecule-plugins[docker] instead of molecule[docker] (the Docker driver moved to a separate package in newer molecule versions)
The packaging module is used for version comparisons but was not installed until after those comparisons ran. This caused an ImportError when ansible was already installed in the venv. Install packaging immediately after the pip upgrade, before the version check block.
- Remove deprecated apt_key tasks from nvidia_cuda and nvidia_dcgm (the cuda-keyring .deb package supersedes the old GPG key management)
- Replace the action: keyword with proper module syntax in easy-build
- Replace inline key=value module args with YAML dict syntax in easy-build and kerberos_client
- Widen kerberos_client version checks for RHEL 8+ and Ubuntu 20+
Remove dead code paths for EOL platforms (CentOS 7 EOL Jun 2024, Ubuntu 18.04 EOL May 2023). Changes:
- setup.sh: Remove DEPS_EL7, simplify RHEL package install
- slurm: Remove CentOS 7 yum tasks, widen RHEL 8 dnf conditions
- lmod: Remove CentOS 7 yum task and Ubuntu 18.04 posix_c bugfix
- nfs: Remove RHEL 7 libsemanage-python task
- kerberos_client: Consolidate into a single RHEL and a single Ubuntu task/vars set
- openshift: Remove python2-openshift CentOS 7 task
- ood-wrapper: Update singularity image from 18.04 to 22.04
- molecule configs: Remove 1804/centos-7, add ubuntu-2204 platforms
- config.example: Update NGC container tags to current versions
- Ansible: 9.13.0 -> 10.7.0 (ansible-core 2.16 -> 2.17)
- ansible-lint: 5.4.0 -> 26.1.1 (now compatible with Ansible 10.x)
- kubespray: v2.27.0+88 -> v2.30.0 (latest stable)
- jmespath: 1.0.1 -> 1.1.0
- ansible.posix: 1.5.4 -> 2.1.0
- community.general: 7.2.0 -> 12.3.0
- community.docker: 3.10.2 -> 5.0.6
- nvidia.nvidia_driver: v2.3.0 -> v2.3.1
- dev-sec.ssh-hardening: 9.7.0 -> 10.5.0
- geerlingguy.ntp: 2.3.2 -> 4.0.0
- gantsign.golang: 3.1.6 -> 3.5.0
Also fixes:
- docker.yml: Update kubespray defaults path (main.yml -> main/main.yml)
- docker.yml, k8s-cluster.yml: Remove CentOS 7 docker repo overrides
- CI: Remove ansible-lint/ansible 4.8.0 version workaround
- ansible.cfg: Replace the removed community.general.yaml callback with ansible.builtin.default plus result_format=yaml
- requirements.yml: Migrate the dev-sec.ssh-hardening role to the devsec.hardening collection (the standalone role repo stopped at 9.7.0; 10.x+ is collection-only)
- playbooks: Update include_role references from dev-sec.ssh-hardening to devsec.hardening.ssh_hardening (FQCN)
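For existing checkouts, the ansible.cfg change can be applied mechanically. A minimal sketch using Python's configparser, assuming the callback is set via stdout_callback in the [defaults] section and that callback_result_format is the ini key for the default callback's result_format option (verify with `ansible-doc -t callback ansible.builtin.default` for your ansible-core version). The helper `migrate_stdout_callback` is hypothetical and not part of DeepOps.

```python
import configparser
import io

# Callback names that resolved to the removed community.general.yaml plugin.
REMOVED_CALLBACKS = {"yaml", "community.general.yaml"}


def migrate_stdout_callback(cfg: configparser.ConfigParser) -> bool:
    """Swap the removed YAML stdout callback for ansible.builtin.default.

    Returns True if the config was modified.
    """
    if not cfg.has_section("defaults"):
        return False
    current = cfg.get("defaults", "stdout_callback", fallback="").strip()
    if current not in REMOVED_CALLBACKS:
        return False
    cfg.set("defaults", "stdout_callback", "ansible.builtin.default")
    # Assumed ini key for the default callback's result_format option.
    cfg.set("defaults", "callback_result_format", "yaml")
    return True


if __name__ == "__main__":
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string("[defaults]\nstdout_callback = community.general.yaml\n")
    migrate_stdout_callback(cfg)
    out = io.StringIO()
    cfg.write(out)  # note: configparser does not preserve comments when writing
    print(out.getvalue())
```

The function is idempotent: a second call finds ansible.builtin.default already in place and returns False.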
kubespray v2.30.0 renamed kubespray-defaults to kubespray_defaults (underscore) and removed the defaults/ dir from the old location. Update vars_files path and role reference in docker.yml accordingly.
Modern Ubuntu (23.04+, including 24.04 LTS) enforces PEP 668 'externally-managed-environment', which blocks system-wide pip installs. Replace pip: name=docker with package: name=python3-docker across all roles that need the Docker Python SDK. Also removes dead Python 2 code paths.
Affected roles: standalone-container-registry, docker-login, prometheus, alertmanager, nginx-docker-registry-cache
The passlib module is required by Ansible's password_hash filter used in the users playbook. Without it, password hashing fails with 'No module named passlib' on modern systems.
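Because Jinja2 filters such as password_hash are evaluated on the Ansible controller, passlib has to be importable by the interpreter that runs ansible-playbook (the DeepOps venv), not on the managed hosts. A hypothetical stdlib-only preflight check could look like this; the module list simply mirrors the controller-side dependencies named in this PR.

```python
import importlib.util
import sys
from typing import Iterable, List

# Controller-side Python modules referenced in this PR.
REQUIRED = ["passlib", "packaging", "jmespath"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        sys.exit(f"{sys.executable} is missing: {', '.join(missing)}")
    print("all controller-side modules found")
```

Running it with the venv's python3 rather than the system one matters for the same reason setup.sh switched away from PYTHON_BIN.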
kubespray v2.30 requires underscored group names:
- kube-master -> kube_control_plane
- kube-node -> kube_node
- k8s-cluster -> k8s_cluster
Updated inventory templates, the group_vars filename, group_vars content, and all playbook references. Directory paths (playbooks/k8s-cluster/) are unchanged.
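Existing inventories need the same renames. A minimal sketch, assuming an INI-format inventory where groups appear as [name], [name:vars] or [name:children] headers and child groups are listed one per line under :children. The helper `migrate_inventory` is hypothetical; hostnames in ordinary groups and YAML inventories are deliberately left untouched.

```python
import re
import sys

GROUP_RENAMES = {
    "kube-master": "kube_control_plane",
    "kube-node": "kube_node",
    "k8s-cluster": "k8s_cluster",
}

# "[group]" or "[group:suffix]", e.g. "[k8s-cluster:children]".
_HEADER = re.compile(r"^\[([^\]:]+)(:[a-z]+)?\]$")


def migrate_inventory(text: str) -> str:
    """Rename legacy kubespray group names in an INI inventory."""
    out = []
    in_children = False
    for line in text.splitlines():
        stripped = line.strip()
        header = _HEADER.match(stripped)
        if header:
            name, suffix = header.group(1), header.group(2) or ""
            # Only entries under a ":children" section are group names.
            in_children = suffix == ":children"
            out.append(f"[{GROUP_RENAMES.get(name, name)}{suffix}]")
        elif in_children and stripped in GROUP_RENAMES:
            out.append(GROUP_RENAMES[stripped])
        else:
            out.append(line)  # hosts, vars, comments and blank lines pass through
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


if __name__ == "__main__":
    sys.stdout.write(migrate_inventory(sys.stdin.read()))
```

Tracking whether the current section is a :children section keeps a host that happens to be named like a legacy group from being renamed.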
The 'native' snapshotter was a workaround for old cri-tools issues (NVIDIA#436, NVIDIA#710) that have long since been resolved. It causes 'no unpack platforms defined' errors with containerd v2.x. Switch to 'overlayfs', which is kubespray's default and works correctly on ext4/xfs.
- Add project-level .ansible-lint with profile: min and a skip_list for pre-existing issues (fqcn, name casing, truthy, octal, etc.)
- Rewrite the lint script to run from the project root using the project config
- Remove per-role .ansible-lint files (they conflicted with v26 syntax)
- Molecule: drop Ubuntu 20.04 platforms (EOL), keep 22.04 only
- Molecule: use cgroupns_mode: host; remove command: /sbin/init and tmpfs, which caused systemd temp dir failures on cgroup v2 hosts
- Molecule: add privileged: true where missing, remove the max-parallel limit, set fail-fast: false, upgrade the runner to ubuntu-24.04
- Add ANSIBLE_ROLES_PATH and passlib to the molecule workflow
- spack: Replace gcc-7/gfortran-7 with unversioned gcc/gfortran
- Remove abims_sbr.singularity from requirements.yml (dead project)
- Molecule CI: Remove 5 roles that can't run in Docker containers: nis_client, rsyslog_client, rsyslog_server, slurm (need systemd services), and singularity_wrapper (broken upstream Galaxy dependency). All of these are verified end-to-end on real MAAS VMs.
- The remaining 11 molecule roles all pass in CI.
michael-balint approved these changes on Feb 19, 2026
dholt added a commit to dholt/deepops that referenced this pull request on Feb 20, 2026:
Follow-up to PR NVIDIA#1336: rename remaining kube-master references to kube_control_plane and k8s-cluster to k8s_cluster in config.example group_vars, the example playbook, and helper scripts (debug.sh, deploy_rook.sh). Also update the ssh-hardening collection reference (dev-sec.ssh-hardening -> devsec.hardening) in config.example/group_vars/all.yml.
Summary
Full modernization of the DeepOps build system, dependencies, and OS support:
Test results
Tested playbooks:
- k8s-cluster.yml
- slurm-cluster.yml
- ngc-ready-server.yml
- nvidia-cuda.yml
All playbooks were tested with real deployments on ephemeral MAAS VMs. The only non-zero exit across all runs comes from kubespray's "copy kubectl to ansible host" task: the VMs are behind a bastion and are not directly reachable for rsync, so this is not a code bug. Untested playbooks require specific hardware (DGX, InfiniBand/MOFED, GPUs) that is not available in the test environment.
🤖 Generated with Claude Code