Comprehensive CI/CD infrastructure overhaul and testing documentation framework for PROTEUS ecosystem #600

Open
timlichtenberg wants to merge 303 commits into main from tl/test_ecosystem_v5

Conversation

@timlichtenberg
Member

@timlichtenberg timlichtenberg commented Feb 1, 2026

Description

This PR introduces a comprehensive CI/CD infrastructure overhaul and testing documentation framework for PROTEUS. The changes establish a robust, Docker-based continuous integration pipeline with automated coverage ratcheting, dual-threshold validation, and extensive documentation for AI-assisted development workflows.

Key additions:

  • Docker-based CI with pre-compiled physics modules (~60 min build time savings per PR)
  • Two-tier coverage system: fast gate (unit + smoke tests) for PRs, full gate (unit+smoke+integration+slow tests) for nightly
  • Automatic coverage ratcheting - thresholds only increase, never decrease
  • Four-tier test categorization: unit, smoke, integration, slow
  • AI development context files: AGENTS.md and MEMORY.md
  • Coverage evolution: test coverage will initially decrease somewhat with this update, but the PR establishes a mechanism to automatically improve ("ratchet") the coverage with each new PR.
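
The ratcheting rule from the list above can be sketched in a few lines. This is only an illustration of "thresholds only increase, never decrease", not the logic of tools/update_coverage_threshold.py; the function name `ratchet_threshold` and the rounding to two decimals are assumptions.

```python
def ratchet_threshold(current: float, measured: float) -> float:
    """Return the new coverage threshold under a ratchet rule.

    The threshold rises to the measured coverage when coverage improved
    and stays where it is otherwise, so it never decreases.
    """
    # Hypothetical helper: rounding to two decimals is an assumption.
    return round(max(current, measured), 2)


# Coverage improved: the threshold follows it upward.
print(ratchet_threshold(31.45, 33.10))
# Coverage dropped: the threshold is left unchanged.
print(ratchet_threshold(31.45, 30.00))
```

Under such a rule, a PR that lowers coverage fails the existing gate instead of silently lowering it.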

Closes #481

Notes:
This update is extremely comprehensive, with many new additions and structural changes to the CI. I have verified, re-reviewed, and tested the changes extensively. However, because many of the changes are tightly coupled to the CI workflows, which behave differently on main than on branches, it may well take a few iterations until everything works properly. After some reviews we will have to merge this and simply try out whether everything works. I will aim to provide timely updates to fix problems. I also ruff-formatted the whole codebase, which led to many small formatting changes across the code. My suggestion is to review in detail only the main files; see below.

Needs double merge:
Because of the way ci-pr-checks.yml interacts with docker-build.yml, the changes cannot be completed with a single merge. Instead, I will have to merge this PR first (it uses the Docker image from the branch tl/test_ecosystem_v5). Once this is merged, I will have to update ci-pr-checks.yml so that from then on it refers to the Docker image built on the main branch. That image can only be generated once docker-build.yml can be triggered on main. Once that is done, it should work continuously and automatically overnight. So: one big PR (this one), followed by a smaller one that just updates the Docker tag in ci-pr-checks.yml.

Files for Human Review

This PR contains 204 changed files. Below is a prioritized guide for reviewers.

🔴 Critical Infrastructure (Must Review)

| File | Purpose |
| --- | --- |
| .github/workflows/ci-pr-checks.yml | New fast PR validation workflow |
| .github/workflows/ci-nightly.yml | New nightly science validation workflow |
| .github/workflows/docker-build.yml | Docker image build & push |
| Dockerfile | Pre-built CI environment with all physics modules |
| pyproject.toml | Coverage thresholds, test markers, dependencies |
| .pre-commit-config.yaml | Pre-commit hooks including file size limits |

🟠 New Documentation (Recommended Review)

| File | Purpose |
| --- | --- |
| docs/test_infrastructure.md | CI/CD architecture and coverage strategy |
| docs/test_categorization.md | Test markers (unit, smoke, integration, slow) |
| docs/test_building.md | Guide for writing new tests |
| docs/docker_ci_architecture.md | Docker-based CI deep dive |
| docs/ai_usage.md | AI assistant guidelines for this project |
| AGENTS.md | AI agent instructions (replaces inline prompts) |
| MEMORY.md | Living project context document |

🟡 Key Source Changes (Spot Check)

| Directory | Files Changed | Summary |
| --- | --- | --- |
| src/proteus/config/ | 14 files | Validator improvements, new options |
| src/proteus/atmos_clim/ | 5 files | AGNI/JANUS robustness |
| src/proteus/interior/ | 7 files | ARAGOG/SPIDER error handling |
| src/proteus/escape/ | 3 files | BOREAS module, pxuv reservoir |

🟢 Test Suite (Trust CI, Spot Check)

| Directory | Files Changed | Summary |
| --- | --- | --- |
| tests/ | 63 files | New test structure mirroring src/ |
| tests/integration/ | 14 files | Smoke tests for all module combinations |
| tests/conftest.py | 1 file | Shared fixtures and markers |

⚪ Low Priority (Auto-Generated / Cosmetic)

  • examples/ - Updated example outputs
  • src/proteus/plot/ - Minor fixes
  • tools/ - Helper scripts for coverage analysis
  • Badge updates in README.md, docs/index.md

Deleted Files

| File | Reason |
| --- | --- |
| .github/workflows/tests.yaml | Replaced by ci-pr-checks.yml + ci-nightly.yml |

Reviewer Focus: Start with 🔴 Critical Infrastructure, then skim 🟠 Documentation. Trust CI for test correctness.

New Documentation

| Document | Purpose |
| --- | --- |
| docs/test_infrastructure.md | CI/CD overview, coverage analysis, troubleshooting |
| docs/test_building.md | How to write tests, master prompts for AI assistance |
| docs/test_categorization.md | Test markers (unit, smoke, integration, slow) |
| docs/ai_usage.md | AI-assisted development workflows and safety |
| docs/docker_ci_architecture.md | Docker image strategy, workflow triggers, artifacts |

CI Workflow Architecture

ci-pr-checks.yml vs ci-nightly.yml

| Aspect | PR Checks | Nightly |
| --- | --- | --- |
| Trigger | PRs to main/dev, pushes | Daily 03:00 UTC |
| Tests | unit + smoke only | All: unit + smoke + integration + slow |
| Duration | ~5-10 minutes | up to 4 hours (240 min timeout), currently ~30-60 min |
| Coverage gate | Fast threshold (31.45%) | Full threshold (59%) |
| Validation | Diff-cover 80% on changed lines | Full coverage ratcheting |
| Baseline | Downloads nightly artifact | Establishes baseline for PRs |
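
The "Diff-cover 80% on changed lines" entry can be read as the check sketched below. The sketch does not call diff-cover; it only illustrates the arithmetic on hypothetical inputs (sets of line numbers), and the function names are assumptions. The real tool also restricts itself to executable lines, which is ignored here.

```python
def diff_coverage_percent(changed: set[int], covered: set[int]) -> float:
    """Percentage of changed lines that are also covered by tests."""
    if not changed:
        # Assumption: a change with no measurable lines passes trivially.
        return 100.0
    return 100.0 * len(changed & covered) / len(changed)


def passes_diff_gate(changed: set[int], covered: set[int], minimum: float = 80.0) -> bool:
    """True when the changed lines meet the minimum diff coverage."""
    return diff_coverage_percent(changed, covered) >= minimum


changed_lines = {10, 11, 12, 13, 14}
covered_lines = {10, 11, 12, 13, 40, 41}
# 4 of the 5 changed lines are covered.
print(diff_coverage_percent(changed_lines, covered_lines))
print(passes_diff_gate(changed_lines, covered_lines))
```

The point of gating on changed lines rather than the whole codebase is that a PR is judged on the code it touches, independently of the overall threshold.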

proteus_test_quality_gate.yml (Reusable Workflow)

A standardized quality gate for PROTEUS ecosystem modules (CALLIOPE, JANUS, MORS, etc.). Provides:

  • Configurable Python version (default: 3.12)
  • Configurable coverage threshold with grace period
  • Automatic Codecov integration
  • HTML coverage report artifacts

Ecosystem modules can call this workflow to adopt PROTEUS testing standards without duplicating CI configuration.

AI Context Files

| File | Purpose | Size Limit |
| --- | --- | --- |
| AGENTS.md | Instructions for AI coding assistants (IDE integration) | 500 lines |
| MEMORY.md | Living project context, decisions, current sprint focus | 1000 lines |

These files provide AI tools (Windsurf Cascade, GitHub Copilot, etc.) with project-specific context for more accurate code generation and test writing.

Validation of changes

Automated Tests Performed

  1. Ruff formatting: ruff check src/ tests/ and ruff format --check src/ tests/ pass
  2. Test structure validation: bash tools/validate_test_structure.sh confirms all test directories mirror source structure
  3. File size enforcement: AGENTS.md (469 lines) and MEMORY.md (587 lines) within limits

CI Workflow Validation

  • All branch references updated from feature branch to main
  • Docker image references use :latest tag
  • GitHub Actions versions fixed (actions/checkout@v4, actions/setup-python@v5)
  • Workflow dispatch targets corrected (ci-nightly.yml)

Test Configuration

  • OS: macOS / Ubuntu (CI)
  • Python: 3.12
  • Coverage tool: coverage + pytest-cov

Checklist

  • I have followed the contributing guidelines
  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • My changes generate no new warnings or errors
  • I have checked that the tests still pass on my computer
  • I have updated the docs, as appropriate
  • I have added tests for these changes, as appropriate
  • I have checked that all dependencies have been updated, as required

Relevant people

@FormingWorlds/proteus-maintainer @stuitje @FormingWorlds/proteus-developer
Anyone from the dev team is welcome to review this PR; it will change substantially how we interact with code development.


Summary of Changes by Category

CI/CD Infrastructure

  • .github/workflows/ci-pr-checks.yml - Fast PR feedback with unit+smoke tests
  • .github/workflows/ci-nightly.yml - Comprehensive nightly validation
  • .github/workflows/docker-build.yml - Nightly Docker image builds (02:00 UTC)
  • .github/workflows/proteus_test_quality_gate.yml - Reusable workflow for ecosystem
  • .github/workflows/code-style.yaml - Pre-commit hooks on PRs
  • .github/workflows/publish.yaml - PyPI publishing

Documentation

  • docs/test_infrastructure.md - Testing system overview
  • docs/test_building.md - Test writing guide
  • docs/test_categorization.md - Test markers and CI mapping
  • docs/ai_usage.md - AI assistant integration
  • docs/docker_ci_architecture.md - Docker CI deep dive

AI Context

  • AGENTS.md - AI assistant instructions (IDE)
  • MEMORY.md - Project context and decisions

Tools

  • tools/validate_test_structure.sh - Test directory validation
  • tools/update_coverage_threshold.py - Coverage ratcheting script
  • tools/coverage_analysis.sh - Coverage gap analysis
  • tools/check_file_sizes.sh - File size enforcement

Configuration

  • pyproject.toml - Coverage thresholds, pytest markers, ruff rules
  • .pre-commit-config.yaml - Pre-commit hooks including file size limits

Note

High Risk
High risk because it replaces core CI/CD workflows (Docker image build, PR checks, nightly runs) and introduces automated commits/coverage gating; failures here can block merges and affect release automation.

Overview
CI/CD is re-architected around a pre-built Docker image and split workflows. Adds ci-pr-checks.yml (fast unit+smoke + diff-cover) and ci-nightly.yml (scheduled full-suite validation) plus docker-build.yml to build/push ghcr.io images and trigger nightlies; removes the legacy tests.yaml pipeline.

Introduces coverage governance and enforcement. Adds dual coverage thresholds ([tool.proteus.coverage_fast] and [tool.coverage.report]), CI logic to prevent threshold decreases vs main, nightly artifacts used to estimate total coverage on PRs, and automatic threshold “ratcheting” commits via tools/update_coverage_threshold.py.

Adds contributor/agent guidance and new documentation. Introduces AGENTS.md, MEMORY.md, multiple new testing/CI docs in docs/, a coverage-improvement issue template, and a pre-commit hook to enforce size limits on the new context files.

Misc config/runtime adjustments. Adds a new Dockerfile, updates action versions in existing workflows, tweaks input/all_options.toml defaults, expands .gitignore, and applies small ruff/formatting fixes in src/proteus/atmos_chem/*.

Written by Cursor Bugbot for commit 05703b4. This will update automatically on new commits.

…ML code block

- Convert multiline YAML block scalars to single-line format for cache keys (4 occurrences)
- Fix unclosed TOML code block in test_infrastructure.md
- Addresses review #579 (review)
Major Changes:
- Add Dockerfile with pre-compiled physics modules (SOCRATES, PETSc, SPIDER, AGNI)
- Create docker-build.yml workflow (nightly builds at 02:00 UTC)
- Create ci-pr-checks.yml workflow (fast PR validation ~10-15 min)
- Create ci-nightly-science.yml workflow (deep science validation)
- Add 'smoke' pytest marker for quick binary validation
- Add comprehensive documentation and example tests

Architecture Benefits:
- 50+ minute time savings per PR (Python changes)
- Smart rebuild: only recompile changed files
- Pre-built Docker image reused across all CI workflows
- Test stratification: unit → smoke → integration → slow
- Nightly comprehensive validation ensures scientific correctness

Test Markers:
- @pytest.mark.unit: Fast tests with mocked physics (PR checks)
- @pytest.mark.smoke: Quick binary validation (PR checks)
- @pytest.mark.integration: Multi-module tests (nightly)
- @pytest.mark.slow: Full scientific validation (nightly)
- Install Julia 1.11 specifically (required by AGNI Project.toml)
- Configure git to use HTTPS instead of SSH (avoid SSH dependency)
- Remove PETSc and SPIDER compilation (not needed for tests)
- Add test_docker_image.sh for local validation
- Image builds successfully: 3.05GB, all modules working
This allows manual testing of Docker CI/CD workflows before merging:
- docker-build.yml: Build and push image from feature branch
- ci-pr-checks.yml: Test PR checks with the built image

Will be reverted before merge to main.
- Delete tests/examples/test_marker_usage.py (13 example tests with 0% coverage)
- Mark 9 placeholder tests with @pytest.mark.skip
- Add @pytest.mark.unit to 23 unit tests across 6 test files
- Add @pytest.mark.integration to 23 integration tests across 4 test files
- Update ci-pr-checks.yml to run only unit tests (~5-10 min)
- Update ci-nightly-science.yml to run integration tests (~4-6 hours)
- Create docs/test_categorization.md with CI/CD workflow guide
- Update docs/test_infrastructure.md with current state and next steps
- Add cross-references between test documentation files
- Add test_categorization.md to mkdocs navigation

Test breakdown: 23 unit tests, 23 integration tests, 9 placeholder tests
CI/CD impact: Fast PR checks (unit only), comprehensive nightly validation
…tegration

- Run ruff format on 8 placeholder test files
- Change grid tests from @pytest.mark.unit to @pytest.mark.integration
  (they run real simulations, not mocked tests)
Unit tests alone (10 tests) achieve ~18-20% coverage, which is expected
since they focus on fast feedback with mocked physics. Full coverage
(69%) is validated by nightly integration tests.
…ontainer

- Generate diff file from git diff in workspace before running diff-cover
- Pass --diff-file to diff-cover instead of --compare-branch
- Avoids credential/network issues when running diff-cover in container
- Uses git fetch with shallow depth for base ref before generating diff
- Should resolve persistent diff-cover failures on protected branches
- Test PROTEUS initialization with dummy.toml (all dummy physics modules)
- Validates config loading, object instantiation, directory setup
- Fast execution (~0.3s locally) suitable for CI smoke test job
- Marked with @pytest.mark.smoke for integration test suite
@timlichtenberg timlichtenberg requested review from a team and nichollsh February 1, 2026 12:55

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6293a7cc62

- Fix undefined config variable bug in ci-nightly.yml fallback path
  (initialize config=None, add guard before download_melting_curves)
- Add clarifying comments about coverage-integration-only.json naming
- Add TODO in MEMORY.md for potential coverage math issue with line refs
Matches pyproject.toml and ci-nightly.yml fallback value.
Fixes potential issue where valid PRs could fail if toml parsing fails.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

- ci-pr-checks.yml: grep -A2 → -A5 for [tool.proteus.coverage_fast]
- ci-nightly.yml: grep -A2 → -A6 for [tool.coverage.report]

fail_under is 4-5 lines after section headers due to comments.
@timlichtenberg
Member Author

@nichollsh and @stuitje the PR is ready for human review.

Contributor

@stuitje stuitje left a comment

I commented on ci-nightly, ci-pr-checks and docker-build so far.
Overall I think it looks very good and is a great addition to the ecosystem.
On Friday I am travelling back and can add more comments, hope that works for you.

Member

@nichollsh nichollsh left a comment

Summary

Thanks for all your efforts on this issue, @timlichtenberg. I agree that it's important to get these improvements incorporated sooner rather than later.

Reviewing this PR was tricky because the changes are so widespread and substantial. I believe that I generally understand the purpose of each test and the new infrastructure, now. The documentation you added was particularly helpful for this.

Questions from code review

Please see my comments/suggestions below.

Real world validation

Deprecated pytest plugin

I also followed the updated install steps on an Ubuntu machine. When running the pip install step, I got the following warning. However, we might want to resolve this in a separate issue, since it's non-breaking at the moment.

DEPRECATION: Building 'pytest-dependency' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. 

A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'pytest-dependency'. 

Discussion can be found at https://github.com/pypa/pip/issues/6334

Running minimal.toml

This seems to work fine!

Running unit tests locally

I encountered some errors due to the pytest markers. See specific comments about resolving these below.

Running smoke tests locally

These seemed to work well! Some of them are skipped because PROTEUS_CI_NIGHTLY is not set on my machine. Might be worthwhile adding information in the docs about how to run the full tests (i.e. tell the user to run the commands with PROTEUS_CI_NIGHTLY=1 prefixed to them).
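
For the documentation note suggested above, the environment-variable gate could be illustrated roughly as follows. This is a sketch under assumptions: the helper name `nightly_enabled` is hypothetical, and treating exactly "1" as enabled is a guess at how the test suite reads PROTEUS_CI_NIGHTLY.

```python
import os
from collections.abc import Mapping


def nightly_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True when PROTEUS_CI_NIGHTLY is set to "1".

    Assumption: exactly "1" means enabled; the real check in the
    PROTEUS tests may differ.
    """
    return environ.get("PROTEUS_CI_NIGHTLY", "") == "1"


# A test module could use the helper like this (requires pytest):
#   @pytest.mark.skipif(not nightly_enabled(), reason="set PROTEUS_CI_NIGHTLY=1")
# and a user would opt in from the shell with:
#   PROTEUS_CI_NIGHTLY=1 pytest -m smoke
print(nightly_enabled({"PROTEUS_CI_NIGHTLY": "1"}))
print(nightly_enabled({}))
```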

@timlichtenberg
Member Author

Thanks @stuitje and @nichollsh for the suggestions, I will address these asap!

…orkflow comments, fix vulcan CSV test format

- Upgrade actions/upload-artifact@v4 → @v6 across 5 workflow files (8 instances)
- Add concurrency block to ci-nightly.yml to prevent overlapping runs
- Add lowercase image_name step in docker-build.yml
- Add explanatory comments: root-user (ci-pr-checks), heredoc syntax (ci-pr-checks),
  editable installs (Dockerfile), dual-trigger timing (ci-nightly)
- Fix vulcan CSV test data to match real VULCAN output format (tab-delimited,
  species as columns, atmospheric levels as rows)
- Add smoke test types clarification in docs/test_categorization.md
- Add stale nightly baseline explanation in docs/test_infrastructure.md
Reverts the temporary push triggers for tl/test_ecosystem_v5 branch
in docker-build.yml and ci-nightly.yml, and restores :latest image tag.
All workflows verified on CI: Docker build, nightly, and PR checks pass.
- Add get_oarr_from_parr test alongside backwards-compatible wrapper test
- Register janus and timeout markers in conftest.py
- Add pytest-timeout to develop dependencies
- Widen physical bounds: pressure 100kbar→1Mbar, temperature [100,5000]→[50,10000]K, flux ±10kW→±1MW
- Rename no_runaway → no_unbounded_growth to avoid runaway greenhouse confusion
- Remove non-negative flux assertion (negative F_atm/F_int physically valid)
- Add esc_rate assertion against configured dummy rate
- Fix observe test CSV format to match real Platon output
- Fix T_magma 4000→1600K and F_atm comment for modern Earth test
- Add clarifying comments: fO2 units/keys, mock behavior, Hot Jupiter scenario
@timlichtenberg
Member Author

timlichtenberg commented Feb 6, 2026

Thanks for the thorough review @stuitje and @nichollsh! I have addressed all comments that seemed actionable from my side, with the following changes:

  • Physical bounds widened: pressure 100 kbar → 1 Mbar, temperature 100–5000 K → 50–10000 K, flux ±10 kW → ±1 MW
  • Terminology: no_runaway → no_unbounded_growth across test suite
  • Removed non-negative flux assertion (negative F_atm/F_int physically valid)
  • Fixed observe test CSV to match real Platon output (tab-delimited)
  • Fixed T_magma 4000 → 1600 K and F_atm comment for modern Earth
  • Added get_oarr_from_parr test + esc_rate assertion against config
  • Clarifying comments on fO2 units/keys, mock behavior, Hot Jupiter scenario
  • Registered janus and timeout markers in conftest.py
  • Added pytest-timeout to develop dependencies
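
Registering markers in conftest.py, as mentioned in the list above, typically goes through pytest's `pytest_configure` hook and `config.addinivalue_line`. The following is a minimal sketch and not the contents of the project's conftest.py; the marker descriptions and the `MARKERS` dictionary are assumptions.

```python
# Marker names come from this PR; the descriptions are assumptions.
MARKERS = {
    "janus": "tests that exercise the JANUS module",
    "timeout": "per-test time limit (used with pytest-timeout)",
}


def pytest_configure(config):
    """pytest hook: register custom markers so pytest recognises them."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With markers registered this way, running pytest with `--strict-markers` turns a misspelled or unregistered marker into an error instead of a warning.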

Suggest to defer to follow-up work:

If possible, please double-check these roughly so that I can merge. A number of issues need to be addressed as follow-up, but I can only start working on them once these changes are merged (twice, because of the fast-PR/Docker logic).

Labels

  • Docs - Update documentation webpage
  • Enhancement - A new feature or request
  • Priority 2: high - Priority level 2: high time criticality or importance
  • Tests - Automated tests across the PROTEUS ecosystem

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Create test summary for PRs

3 participants