Comprehensive CI/CD infrastructure overhaul and testing documentation framework for PROTEUS ecosystem#600
Comprehensive CI/CD infrastructure overhaul and testing documentation framework for PROTEUS ecosystem#600timlichtenberg wants to merge 303 commits intomainfrom
Conversation
…ML code block - Convert multiline YAML block scalars to single-line format for cache keys (4 occurrences) - Fix unclosed TOML code block in test_infrastructure.md - Addresses review #579 (review)
Major Changes: - Add Dockerfile with pre-compiled physics modules (SOCRATES, PETSc, SPIDER, AGNI) - Create docker-build.yml workflow (nightly builds at 02:00 UTC) - Create ci-pr-checks.yml workflow (fast PR validation ~10-15 min) - Create ci-nightly-science.yml workflow (deep science validation) - Add 'smoke' pytest marker for quick binary validation - Add comprehensive documentation and example tests Architecture Benefits: - 50+ minute time savings per PR (Python changes) - Smart rebuild: only recompile changed files - Pre-built Docker image reused across all CI workflows - Test stratification: unit → smoke → integration → slow - Nightly comprehensive validation ensures scientific correctness Test Markers: - @pytest.mark.unit: Fast tests with mocked physics (PR checks) - @pytest.mark.smoke: Quick binary validation (PR checks) - @pytest.mark.integration: Multi-module tests (nightly) - @pytest.mark.slow: Full scientific validation (nightly)
- Install Julia 1.11 specifically (required by AGNI Project.toml) - Configure git to use HTTPS instead of SSH (avoid SSH dependency) - Remove PETSc and SPIDER compilation (not needed for tests) - Add test_docker_image.sh for local validation - Image builds successfully: 3.05GB, all modules working
This allows manual testing of Docker CI/CD workflows before merging: - docker-build.yml: Build and push image from feature branch - ci-pr-checks.yml: Test PR checks with the built image Will be reverted before merge to main.
…will restore to 69% before merge)
- Delete tests/examples/test_marker_usage.py (13 example tests with 0% coverage) - Mark 9 placeholder tests with @pytest.mark.skip - Add @pytest.mark.unit to 23 unit tests across 6 test files - Add @pytest.mark.integration to 23 integration tests across 4 test files - Update ci-pr-checks.yml to run only unit tests (~5-10 min) - Update ci-nightly-science.yml to run integration tests (~4-6 hours) - Create docs/test_categorization.md with CI/CD workflow guide - Update docs/test_infrastructure.md with current state and next steps - Add cross-references between test documentation files - Add test_categorization.md to mkdocs navigation Test breakdown: 23 unit tests, 23 integration tests, 9 placeholder tests CI/CD impact: Fast PR checks (unit only), comprehensive nightly validation
…ization and infrastructure
…tegration - Run ruff format on 8 placeholder test files - Change grid tests from @pytest.mark.unit to @pytest.mark.integration (they run real simulations, not mocked tests)
Unit tests alone (10 tests) achieve ~18-20% coverage, which is expected since they focus on fast feedback with mocked physics. Full coverage (69%) is validated by nightly integration tests.
…ontainer - Generate diff file from git diff in workspace before running diff-cover - Pass --diff-file to diff-cover instead of --compare-branch - Avoids credential/network issues when running diff-cover in container - Uses git fetch with shallow depth for base ref before generating diff - Should resolve persistent diff-cover failures on protected branches
- Test PROTEUS initialization with dummy.toml (all dummy physics modules) - Validates config loading, object instantiation, directory setup - Fast execution (~0.3s locally) suitable for CI smoke test job - Marked with @pytest.mark.smoke for integration test suite
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6293a7cc62
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Fix undefined config variable bug in ci-nightly.yml fallback path (initialize config=None, add guard before download_melting_curves) - Add clarifying comments about coverage-integration-only.json naming - Add TODO in MEMORY.md for potential coverage math issue with line refs
Matches pyproject.toml and ci-nightly.yml fallback value. Fixes potential issue where valid PRs could fail if toml parsing fails.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
- ci-pr-checks.yml: grep -A2 → -A5 for [tool.proteus.coverage_fast] - ci-nightly.yml: grep -A2 → -A6 for [tool.coverage.report] fail_under is 4-5 lines after section headers due to comments.
|
@nichollsh and @stuitje the PR is ready for human review. |
stuitje
left a comment
There was a problem hiding this comment.
I commented on ci-nightly, ci-pr-checks and docker-build so far.
Overall I think it looks very good and is a great addition to the ecosystem.
On Friday I am travelling back and can add more comments, hope that works for you.
There was a problem hiding this comment.
Summary
Thanks for all your efforts on this issue, @timlichtenberg. I agree that it's important to get these improvements incorporated sooner rather than later.
Reviewing this PR was tricky because the changes are so widespread and substantial. I believe that I generally understand the purpose of each test and the new infrastructure, now. The documentation you added was particularly helpful for this.
Questions from code review
Please see my comments/suggestions below.
Real world validation
Deprecated pytest plugin
I also followed the updated install steps on an Ubuntu machine. When running the pip install step, I got the following warning. However, we might want to resolve this in a separate issue, since it's non-breaking at the moment.
DEPRECATION: Building 'pytest-dependency' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change.
A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'pytest-dependency'.
Discussion can be found at https://github.com/pypa/pip/issues/6334
Running minimal.toml
This seems to work fine!
Running unit tests locally
I encountered some errors due to the pytest markers. See specific comments about resolving these below.
Running smoke tests locally
These seemed to work well! Some of them are skipped because PROTEUS_CI_NIGHTLY is not set on my machine. Might be worthwhile adding information in the docs about how to run the full tests (i.e. tell the user to run the commands with PROTEUS_CI_NIGHTLY=1 prefixed to them).
|
Thanks @stuitje and @nichollsh for the suggestions, I will address these asap! |
…orkflow comments, fix vulcan CSV test format - Upgrade actions/upload-artifact@v4 → @v6 across 5 workflow files (8 instances) - Add concurrency block to ci-nightly.yml to prevent overlapping runs - Add lowercase image_name step in docker-build.yml - Add explanatory comments: root-user (ci-pr-checks), heredoc syntax (ci-pr-checks), editable installs (Dockerfile), dual-trigger timing (ci-nightly) - Fix vulcan CSV test data to match real VULCAN output format (tab-delimited, species as columns, atmospheric levels as rows) - Add smoke test types clarification in docs/test_categorization.md - Add stale nightly baseline explanation in docs/test_infrastructure.md
…ly CI Will revert before merge.
…until merged) Will revert before merge.
Reverts the temporary push triggers for tl/test_ecosystem_v5 branch in docker-build.yml and ci-nightly.yml, and restores :latest image tag. All workflows verified on CI: Docker build, nightly, and PR checks pass.
- Add get_oarr_from_parr test alongside backwards-compatible wrapper test - Register janus and timeout markers in conftest.py - Add pytest-timeout to develop dependencies - Widen physical bounds: pressure 100kbar→1Mbar, temperature [100,5000]→[50,10000]K, flux ±10kW→±1MW - Rename no_runaway → no_unbounded_growth to avoid runaway greenhouse confusion - Remove non-negative flux assertion (negative F_atm/F_int physically valid) - Add esc_rate assertion against configured dummy rate - Fix observe test CSV format to match real Platon output - Fix T_magma 4000→1600K and F_atm comment for modern Earth test - Add clarifying comments: fO2 units/keys, mock behavior, Hot Jupiter scenario
|
Thanks for the thorough review @stuitje and @nichollsh! I have addressed all from my side seemingly actionable comments with the following changes:
Suggest to defer to follow-up work:
If possible double check these roughly so that I can merge. There are a number of issues that need to be addressed as follow-up, but I can only start working on these once these are merged (twice, because of the fast PR - docker logic). |
Description
This PR introduces a comprehensive CI/CD infrastructure overhaul and testing documentation framework for PROTEUS. The changes establish a robust, Docker-based continuous integration pipeline with automated coverage ratcheting, dual-threshold validation, and extensive documentation for AI-assisted development workflows.
Key additions:
unit,smoke,integration,slowAGENTS.mdandMEMORY.mdCloses #481
Notes:
This update is extremely comprehensive; many new additions and structural changes to the CI. I have tried a lot to verify the changes and re-reviewed and tested. However, because many of the changes work intimately with the CI workflows and work different on
maincompared to branches, it could very well be that it will take a few iterations until everything works properly. After some reviews we will have to merge this and simply try out if everything works in the upcoming. I will aim to timely provide new updates to fix problems. I also ruff formatted the whole codebase, which led to many incremental or formatting changes across the code. My suggestion would be to only review in detail the main files, see below.Needs double merge:
Because the way
ci-pr-checks.ymlinteracts withdocker-build.ymlthe changes cannot be completed with a single merge. Instead, I will have to merge this PR first (this uses the Docker image from the branchtl/test_ecosystem_v5. Then once this is merged, I will have to updateci-pr-checks.ymlto from then on refer to the docker image built on themainbranch. This can only be generated oncedocker-build.ymlcan be triggered onmain. Once that is done it should work continuously and automatically over night. So, one big PR (this one), followed by a smaller to just update the docker tag inci-pr-checks.yml.Files for Human Review
This PR contains 204 changed files. Below is a prioritized guide for reviewers.
🔴 Critical Infrastructure (Must Review)
.pre-commit-config.yaml🟠 New Documentation (Recommended Review)
🟡 Key Source Changes (Spot Check)
src/proteus/config/src/proteus/atmos_clim/src/proteus/interior/src/proteus/escape/🟢 Test Suite (Trust CI, Spot Check)
tests/src/tests/integration/tests/conftest.py⚪ Low Priority (Auto-Generated / Cosmetic)
examples/- Updated example outputssrc/proteus/plot/- Minor fixestools/- Helper scripts for coverage analysisDeleted Files
.github/workflows/tests.yamlReviewer Focus: Start with 🔴 Critical Infrastructure, then skim 🟠 Documentation. Trust CI for test correctness.
New Documentation
unit,smoke,integration,slow)CI Workflow Architecture
ci-pr-checks.ymlvsci-nightly.ymlmain/dev, pushesunit+smokeonlyunit→smoke→integration→slowproteus_test_quality_gate.yml(Reusable Workflow)A standardized quality gate for PROTEUS ecosystem modules (CALLIOPE, JANUS, MORS, etc.). Provides:
Ecosystem modules can call this workflow to adopt PROTEUS testing standards without duplicating CI configuration.
AI Context Files
AGENTS.mdMEMORY.mdThese files provide AI tools (Windsurf Cascade, GitHub Copilot, etc.) with project-specific context for more accurate code generation and test writing.
Validation of changes
Automated Tests Performed
ruff check src/ tests/andruff format --check src/ tests/passbash tools/validate_test_structure.shconfirms all test directories mirror source structureCI Workflow Validation
main:latesttagactions/checkout@v4,actions/setup-python@v5)ci-nightly.yml)Test Configuration
coverage+pytest-covChecklist
Relevant people
@FormingWorlds/proteus-maintainer @stuitje @FormingWorlds/proteus-developer
Anyone from the dev team is welcome to review this PR; it will change substantially how we interact with code development.
Summary of Changes by Category
CI/CD Infrastructure
.github/workflows/ci-pr-checks.yml- Fast PR feedback with unit+smoke tests.github/workflows/ci-nightly.yml- Comprehensive nightly validation.github/workflows/docker-build.yml- Nightly Docker image builds (02:00 UTC).github/workflows/proteus_test_quality_gate.yml- Reusable workflow for ecosystem.github/workflows/code-style.yaml- Pre-commit hooks on PRs.github/workflows/publish.yaml- PyPI publishingDocumentation
docs/test_infrastructure.md- Testing system overviewdocs/test_building.md- Test writing guidedocs/test_categorization.md- Test markers and CI mappingdocs/ai_usage.md- AI assistant integrationdocs/docker_ci_architecture.md- Docker CI deep diveAI Context
AGENTS.md- AI assistant instructions (IDE)MEMORY.md- Project context and decisionsTools
tools/validate_test_structure.sh- Test directory validationtools/update_coverage_threshold.py- Coverage ratcheting scripttools/coverage_analysis.sh- Coverage gap analysistools/check_file_sizes.sh- File size enforcementConfiguration
pyproject.toml- Coverage thresholds, pytest markers, ruff rules.pre-commit-config.yaml- Pre-commit hooks including file size limitsNote
High Risk
High risk because it replaces core CI/CD workflows (Docker image build, PR checks, nightly runs) and introduces automated commits/coverage gating; failures here can block merges and affect release automation.
Overview
CI/CD is re-architected around a pre-built Docker image and split workflows. Adds
ci-pr-checks.yml(fast unit+smoke + diff-cover) andci-nightly.yml(scheduled full-suite validation) plusdocker-build.ymlto build/pushghcr.ioimages and trigger nightlies; removes the legacytests.yamlpipeline.Introduces coverage governance and enforcement. Adds dual coverage thresholds (
[tool.proteus.coverage_fast]and[tool.coverage.report]), CI logic to prevent threshold decreases vsmain, nightly artifacts used to estimate total coverage on PRs, and automatic threshold “ratcheting” commits viatools/update_coverage_threshold.py.Adds contributor/agent guidance and new documentation. Introduces
AGENTS.md,MEMORY.md, multiple new testing/CI docs indocs/, a coverage-improvement issue template, and a pre-commit hook to enforce size limits on the new context files.Misc config/runtime adjustments. Adds a new
Dockerfile, updates action versions in existing workflows, tweaksinput/all_options.tomldefaults, expands.gitignore, and applies small ruff/formatting fixes insrc/proteus/atmos_chem/*.Written by Cursor Bugbot for commit 05703b4. This will update automatically on new commits. Configure here.