test(orchestrator): expand workflow IR conformance harness#1135
Conversation
Merge readiness rationale for PR #1135I re-reviewed this PR against the #961 AgentOS SSOT direction and the locked #956 Workflow IR v1 boundary after the Wave 1 N2 reviewer notes. What this PR doesPR #1135 adds an offline deterministic conformance harness under
This is test/conformance coverage only. It does not change production schema, dispatch behavior, plugin permissions, evidence policy, HITL authority, projection schema, or default runtime behavior. Alignment with #961 / AgentOS directionThis matches the #961 “thin skill, fat harness” direction because it makes the harness boundary more explicit and measurable without moving authority into a skill, plugin, projection read model, or third-party workflow framework. It also matches
I do not see over-engineering here: the added abstraction is limited to test fixture builders and a small Review-note follow-up appliedI pushed
The remaining earlier notes were low-severity / non-blocking (one redundant parametrized-code observation and one trivial helper observation) and do not affect correctness or merge safety. VerificationLocal verification in the PR worktree:
GitHub checks on the pushed head
VerdictAPPROVE / merge-ready. The PR is a bounded, deterministic test/conformance slice that strengthens the #956 Workflow IR boundary while respecting #961 sequencing and all AgentOS anti-actions. It should be safe to merge after normal maintainer review. |
PR review: Workflow IR conformance harness (#1135)VerdictAPPROVE — merge-ready. I performed a focused review of the PR scope, boundary alignment, fixture semantics, and verification status. I found no blocking, high, or medium issues remaining after Scope reviewedChanged files reviewed:
Design references checked:
FindingsNo unresolved merge-blocking findings. The previous review notes have been handled or remain explicitly non-blocking:
Correctness / boundary analysis
Over-engineering assessmentThis is not over-engineered for the stated Wave 1 N2 goal. The fixture builder module is justified because it avoids duplicating verbose VerificationLocal verification completed in GitHub checks on head
Final review verdictAPPROVE. PR #1135 is a narrowly scoped, deterministic conformance-hardening PR that aligns with #961 and #956. The reviewer nits that were worth addressing have been fixed, CI is green, and the remaining observations are not merge blockers. |
|
@ouroboros-agent re-review requested for current head Please re-run the design-note / roadmap-gate review on this PR's latest commit and report any blocking findings. This is a repository-wide refresh request across currently open PRs. |
|
@ouroboros-agent re-review requested for current head |
|
Re review ping |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Branch: feat/wave1-n2-conformance-harness | 6 files, +1230/-0 | CI: green
Scope: diff-only / architecture-level
HEAD checked: 5e2743a5a1a3b51a72cd29340ed09899d9a090df
What Improved
- The PR stays tightly inside the #956 N2 slot from #1131 by adding a deterministic conformance harness under
tests/conformance/workflow_ir/without changing runtime codepaths, schema versions, or dispatch behavior. - The positive suite now pins both graph validity and lifecycle conformance, including distinct blocked / failed / cancelled / timed_out terminal encodings, which strengthens the future #946 read-model boundary without pulling projection authority into Workflow IR.
- The plugin firewall contract remains read-only and correctly treats blocked permission denial as a non-success path, which is the key #939 / #956 boundary this harness needs.
Issue #1131 Requirements
| Requirement | Status |
|---|---|
| Offline deterministic invalid-IR rejection fixtures for dangling edge, duplicate node id, unreachable terminal, missing schema ref | MET — tests/conformance/workflow_ir/test_negative_fixtures.py:44, :64, :79, :94, :149 |
| Legal node-state transitions fixture | MET — tests/conformance/workflow_ir/test_positive_fixtures.py:119 |
| Terminal-state-emitted-once fixture | MET — tests/conformance/workflow_ir/test_positive_fixtures.py:96, :231 |
| Blocked / failed / cancelled / timed_out distinction | MET — tests/conformance/workflow_ir/test_positive_fixtures.py:149, :167, :181, :190, :205 |
| Plugin firewall contract: blocked permission cannot present as success node completion | MET — tests/conformance/workflow_ir/test_plugin_firewall_contract.py:132, :159, :242 |
| Remain within Workflow IR v1 read-only conformance boundary | MET — docs/agentos/workflow-ir-v1.md:37-44, :46-58; exercised by tests only in tests/conformance/workflow_ir/ |
| 5 negative + 5 positive fixtures green without credentials or network | MET — negative coverage pinned in tests/conformance/workflow_ir/test_negative_fixtures.py:149; positive fixture set asserted via tests/conformance/workflow_ir/test_positive_fixtures.py:54, :73, :91 |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Re-review requested for latest empty-retrigger head | MAINTAINED — current HEAD 5e2743a5a1a3b51a72cd29340ed09899d9a090df re-checked; no new blocker introduced |
| Workflow IR must stay a read-only planning/conformance substrate, not a runtime control plane | MAINTAINED — changed files remain test-only and align with docs/agentos/workflow-ir-v1.md:37-58 |
| Plugin blocked path must not be representable as successful completion | MAINTAINED — tests/conformance/workflow_ir/test_plugin_firewall_contract.py:144-156, :202-239 |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| - | — | — | — | No current-HEAD blockers found. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| - | — | — | — | No follow-up required for merge readiness. |
Test Coverage
- Adequate for the scoped N2 slot: the suite explicitly covers the required negative graph-shape failures, positive lifecycle acceptance, terminal cardinality, distinct terminal outcome encodings, and the plugin blocked-path contract.
- CI is green on the current head (
Ruff Lint,MyPy Type Check,Bridge TypeScript,Test Python 3.12/3.13/3.14,enforce-envelope,enforce-boundary).
Merge Recommendation
- Ready to merge.
ouroboros-agent[bot]
|
@ouroboros-agent re-review requested for current head |
|
@ouroboros-agent re-review requested for current head |
Merge readiness rationale for PR #1135 (current head
|
PR review: Workflow IR conformance harness (#1135)Verdict: APPROVE — merge-ready. Scope reviewedReviewed current head
Changed files reviewed:
Evidence
Test evidence
Residual riskThis PR intentionally pins conformance behavior; it does not implement production read-model projection. That is the right boundary for this Wave 1 N2/#956 scope and avoids pulling #946 projection work into this PR. No blocking issues remain. |
Add offline-deterministic conformance fixtures under tests/conformance/workflow_ir/ that lock the Q00#956 Workflow IR v1 boundary without changing schema, validator logic, or production behavior. * 5 negative fixtures (each rejected with the locked error code): dangling edge, duplicate node id, unreachable terminal, missing schema ref, illegal transition (self-loop). * 5 positive fixtures: legal node-state transitions, terminal-state- emitted-once, blocked / failed / cancelled / timed_out lifecycle distinction. * Plugin firewall contract fixture: a blocked-permission invocation cannot be projected as a successful workflow node completion. All fixtures are deterministic, offline, no network, no credentials. Refs Q00#1131, Q00#956
5ea5a58 to
ff43621
Compare
Summary
Wave 1 N-2 — adds an offline, deterministic conformance harness under
tests/conformance/workflow_ir/that locks the #956 Workflow IR v1boundary without changing schema, validator logic, or production
behavior.
validate_workflowwith thelocked Agent OS: introduce typed Workflow IR for fat-harness execution planning #956 error code, with assertions that the failing identifier is
named in the message):
missing_evidence_schema+missing_input_schema)validate_workflowANDvalidate_workflow_lifecycle_conformance):using distinct
(event_type, reason_code)encodings so a future[Feature] Define Run/Step/Artifact projections as the canonical harness vocabulary #946 projection can split the read model without a new event family)
invocation through
invoke_plugincannot be projected as a successfulworkflow node completion — the firewall never returns
status="success"on the blocked path, the only emitted event is
plugin.failed(
result.status="blocked"), and the canonical lifecycle projectionfor a blocked outcome is
NODE_FAILED+RUN_FAILED(notNODE_COMPLETED).All fixtures are deterministic, offline, with no network, no model
provider, no credentials, no subprocess, and no plugin dispatch. Timestamps
anchor to a fixed UTC epoch so replays are reproducible.
Scope guards
schema_versionbump.rules.
Test plan
python -m pytest tests/conformance/workflow_ir/ -q— 43 passedpython -m pytest tests/unit/orchestrator/ -q— 1615 passed, 1 skipped (baseline preserved)ruff check tests/conformance/— cleanruff format --check tests/conformance/— cleanmypy tests/conformance/— no issuesRefs #1131, #956.