test(orchestrator): expand workflow IR conformance harness #1135

Merged
shaun0927 merged 3 commits into Q00:main from shaun0927:feat/wave1-n2-conformance-harness
May 20, 2026

Conversation

@shaun0927
Collaborator

Summary

Wave 1 N-2 — adds an offline, deterministic conformance harness under
tests/conformance/workflow_ir/ that locks the #956 Workflow IR v1
boundary without changing schema, validator logic, or production
behavior.

  • 5 negative fixtures (each rejected by validate_workflow with the
    locked #956 error code, with assertions that the failing identifier is
    named in the message):
    • dangling edge
    • duplicate node id
    • unreachable terminal
    • missing schema ref (missing_evidence_schema + missing_input_schema)
    • illegal transition (self-loop)
  • 5 positive fixtures (each accepted by validate_workflow AND
    validate_workflow_lifecycle_conformance):
  • Plugin firewall contract fixture: proves that a blocked-permission
    invocation through invoke_plugin cannot be projected as a successful
    workflow node completion — the firewall never returns status="success"
    on the blocked path, the only emitted event is plugin.failed
    (result.status="blocked"), and the canonical lifecycle projection
    for a blocked outcome is NODE_FAILED + RUN_FAILED (not
    NODE_COMPLETED).
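
As a rough illustration of the graph-shape failures the negative fixtures target, the sketch below uses a hypothetical `toy_validate` function over plain lists. It is not the project's `validate_workflow`, whose signature and data model are not shown in this thread, and the error-code strings here are invented placeholders rather than the locked #956 codes. Each message names the failing identifier, mirroring the assertion style described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIssue:
    code: str     # placeholder error code, NOT the real #956 code
    message: str  # names the failing identifier


def toy_validate(nodes, edges, start, terminals):
    """Return ToyIssue entries for a tiny directed graph (illustrative only)."""
    issues = []
    seen = set()
    for node in nodes:
        if node in seen:
            issues.append(ToyIssue("duplicate_node_id", f"duplicate node id: {node}"))
        seen.add(node)
    for src, dst in edges:
        for end in (src, dst):
            if end not in seen:
                issues.append(ToyIssue("dangling_edge", f"edge references unknown node: {end}"))
        if src == dst:
            issues.append(ToyIssue("self_loop", f"self-loop on node: {src}"))
    # Depth-first search from the start node to find reachable nodes.
    reachable = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in reachable:
            continue
        reachable.add(current)
        stack.extend(dst for src, dst in edges if src == current)
    for terminal in terminals:
        if terminal not in reachable:
            issues.append(ToyIssue("unreachable_terminal", f"terminal not reachable: {terminal}"))
    return issues


# A tampered graph: the only edge points at a node that does not exist.
issues = toy_validate(["start", "done"], [("start", "ghost")], "start", ["done"])
codes = [issue.code for issue in issues]
```

A negative fixture in this style would assert both that the expected code is present in `codes` and that the offending identifier (here `ghost`) appears in the message.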

All fixtures are deterministic, offline, with no network, no model
provider, no credentials, no subprocess, and no plugin dispatch. Timestamps
anchor to a fixed UTC epoch so replays are reproducible.
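
One way to get reproducible timestamps is to derive every event time from a single fixed anchor instead of reading the wall clock. The sketch below assumes a hypothetical `FIXED_EPOCH` value and helper names; the harness's actual anchor and builder functions are not shown in this thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical anchor; the real fixed timestamp used by the harness is not shown here.
FIXED_EPOCH = datetime(2026, 1, 1, tzinfo=timezone.utc)


def event_time(offset_seconds: int) -> datetime:
    """Timestamp for an event, derived only from the fixed anchor."""
    return FIXED_EPOCH + timedelta(seconds=offset_seconds)


def build_history(event_types):
    """Build (event_type, iso_timestamp) pairs without any wall-clock reads."""
    return [(name, event_time(i).isoformat()) for i, name in enumerate(event_types)]
```

Because nothing depends on `datetime.now()`, two calls with the same input produce identical histories, which is what makes replays comparable.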

Scope guards

Test plan

  • python -m pytest tests/conformance/workflow_ir/ -q — 43 passed
  • python -m pytest tests/unit/orchestrator/ -q — 1615 passed, 1 skipped (baseline preserved)
  • ruff check tests/conformance/ — clean
  • ruff format --check tests/conformance/ — clean
  • mypy tests/conformance/ — no issues

Refs #1131, #956.

@shaun0927
Collaborator Author

Merge readiness rationale for PR #1135

I re-reviewed this PR against the #961 AgentOS SSOT direction and the locked #956 Workflow IR v1 boundary after the Wave 1 N2 reviewer notes.

What this PR does

PR #1135 adds an offline deterministic conformance harness under tests/conformance/workflow_ir/ for the existing Workflow IR substrate:

  • five negative fixtures that prove validate_workflow rejects the locked #956 graph-shape failures with stable error codes;
  • positive lifecycle fixtures that prove valid WorkflowSpec + lifecycle histories pass validate_workflow_lifecycle_conformance;
  • blocked / failed / cancelled / timed-out terminal-outcome fixtures with distinct (event_type, reason_code) encodings;
  • a narrow plugin-firewall contract fixture proving a blocked plugin permission path cannot be treated as a successful workflow-node completion.

This is test/conformance coverage only. It does not change production schema, dispatch behavior, plugin permissions, evidence policy, HITL authority, projection schema, or default runtime behavior.

Alignment with #961 / AgentOS direction

This matches the #961 “thin skill, fat harness” direction because it makes the harness boundary more explicit and measurable without moving authority into a skill, plugin, projection read model, or third-party workflow framework. It also matches docs/agentos/workflow-ir-v1.md:

I do not see over-engineering here: the added abstraction is limited to test fixture builders and a small LifecycleFixture dataclass so that the conformance cases are readable and reusable. The PR intentionally avoids new runtime abstractions.

Review-note follow-up applied

I pushed c7169b50 (test(orchestrator): address conformance review nits) to address the Wave 1 N2 reviewer notes:

  • removed the fragile module-level assert AGENT_INPUT_SCHEMA and AGENT_EVIDENCE_SCHEMA pattern from test_plugin_firewall_contract.py;
  • documented why positive_terminal_emitted_once intentionally omits NODE_SCHEDULED events while positive_legal_transitions covers the full scheduled → started → completed sequence.
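
The split between the two fixtures can be pictured as follows. This is a sketch under assumptions: the histories are plain lists of event names, `NODE_STARTED` is an assumed spelling for the "started" step, and the set of run-level terminal events is limited to the two names that appear in this thread.

```python
from collections import Counter

# Run-level terminal event names mentioned in this thread; the real set may be larger.
TERMINAL_RUN_EVENTS = {"RUN_COMPLETED", "RUN_FAILED"}


def terminal_run_event_count(history):
    """Count run-level terminal events in an ordered list of event names."""
    counts = Counter(history)
    return sum(counts[name] for name in TERMINAL_RUN_EVENTS)


# Minimal history: no NODE_SCHEDULED, focused only on terminal cardinality.
minimal = ["NODE_STARTED", "NODE_COMPLETED", "RUN_COMPLETED"]
# Fuller history covering scheduled -> started -> completed.
full = ["NODE_SCHEDULED", "NODE_STARTED", "NODE_COMPLETED", "RUN_COMPLETED"]
```

Both histories contain exactly one terminal run event, so a cardinality check passes on either; only the fuller one exercises the scheduling step.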

The remaining earlier notes were low-severity / non-blocking (one redundant parametrized-code observation and one trivial helper observation) and do not affect correctness or merge safety.

Verification

Local verification in the PR worktree:

  • uv run pytest tests/conformance/workflow_ir -q → 43 passed
  • uv run ruff check tests/conformance/workflow_ir → passed
  • uv run ruff check . && uv run pytest -q → 9588 passed, 3 skipped
  • git diff --check → passed

GitHub checks on the pushed head c7169b50b92bf6d6348a60787f821945e3aa38ac are all green:

  • Ruff Lint ✅
  • MyPy Type Check ✅
  • Bridge TypeScript ✅
  • Test Python 3.12 ✅
  • Test Python 3.13 ✅
  • Test Python 3.14 ✅
  • enforce-envelope ✅
  • enforce-boundary ✅

Verdict

APPROVE / merge-ready.

The PR is a bounded, deterministic test/conformance slice that strengthens the #956 Workflow IR boundary while respecting #961 sequencing and all AgentOS anti-actions. It should be safe to merge after normal maintainer review.

@shaun0927
Collaborator Author

PR review: Workflow IR conformance harness (#1135)

Verdict

APPROVE — merge-ready.

I performed a focused review of the PR scope, boundary alignment, fixture semantics, and verification status. I found no blocking, high, or medium issues remaining after c7169b50.

Scope reviewed

Changed files reviewed:

  • tests/conformance/__init__.py
  • tests/conformance/workflow_ir/__init__.py
  • tests/conformance/workflow_ir/fixtures.py
  • tests/conformance/workflow_ir/test_negative_fixtures.py
  • tests/conformance/workflow_ir/test_positive_fixtures.py
  • tests/conformance/workflow_ir/test_plugin_firewall_contract.py

Design references checked:

Findings

No unresolved merge-blocking findings.

The previous review notes have been handled or remain explicitly non-blocking:

  1. Module-level assert in plugin firewall test — fixed in c7169b50 by removing the assert and unused imports rather than relying on an assertion that could be stripped by optimized Python execution.
  2. positive_terminal_emitted_once scheduling omission — fixed in c7169b50 by documenting that this fixture is intentionally minimal for terminal cardinality, while positive_legal_transitions owns the full scheduled → started → completed coverage.
  3. Negative missing-schema parametrization checks only one code — non-blocking because the dedicated TestMissingSchemaRefFixture.test_emits_missing_schema_codes asserts both missing_evidence_schema and missing_input_schema for the same tampered node.
  4. Trivial _fixture_id helper — style-only, harmless, and keeps pytest IDs centralized if fixture ID generation later grows.
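
The fragility noted in item 1 is easy to reproduce in isolation. The sketch below compiles a one-line `assert` at two optimization levels using the built-in `compile`; `optimize=1` corresponds to running Python with `-O`. The variable name `schema_ref` and the assertion text are illustrative and are not taken from the PR.

```python
SOURCE = "assert schema_ref, 'schema ref must be non-empty'"


def runs_without_error(optimize: int) -> bool:
    """Execute SOURCE with an empty schema_ref at the given optimization level."""
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    try:
        exec(code, {"schema_ref": ""})
    except AssertionError:
        return False
    return True


guard_fires_normally = not runs_without_error(0)   # assert is kept
guard_silent_when_optimized = runs_without_error(1)  # assert is stripped
```

At level 0 the guard raises, while at level 1 the statement is removed and the empty value passes silently, which is why a module-level `assert` is a weak safety net.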

Correctness / boundary analysis

Over-engineering assessment

This is not over-engineered for the stated Wave 1 N2 goal. The fixture builder module is justified because it avoids duplicating verbose WorkflowSpec / lifecycle construction across positive, negative, and plugin-boundary tests. The abstraction is test-local and does not leak into runtime code. The PR improves conformance observability without creating a new product surface or control plane.

Verification

Local verification completed in /opt/data/projects/ouroboros-pr1135-review:

uv run pytest tests/conformance/workflow_ir -q
# 43 passed

uv run ruff check tests/conformance/workflow_ir
# All checks passed

uv run ruff check . && uv run pytest -q
# 9588 passed, 3 skipped

git diff --check
# passed

GitHub checks on head c7169b50b92bf6d6348a60787f821945e3aa38ac are all successful:

  • Ruff Lint ✅
  • MyPy Type Check ✅
  • Bridge TypeScript ✅
  • Test Python 3.12 ✅
  • Test Python 3.13 ✅
  • Test Python 3.14 ✅
  • enforce-envelope ✅
  • enforce-boundary ✅

Final review verdict

APPROVE.

PR #1135 is a narrowly scoped, deterministic conformance-hardening PR that aligns with #961 and #956. The reviewer nits that were worth addressing have been fixed, CI is green, and the remaining observations are not merge blockers.

@shaun0927
Collaborator Author

shaun0927 commented May 19, 2026

@ouroboros-agent re-review requested for current head c7169b50b92bf6d6348a60787f821945e3aa38ac.

Please re-run the design-note / roadmap-gate review on this PR's latest commit and report any blocking findings. This is a repository-wide refresh request across currently open PRs.

@shaun0927
Collaborator Author

@ouroboros-agent re-review requested for current head 5e2743a5a1a3b51a72cd29340ed09899d9a090df after the empty retrigger commit. Please re-run the AgentOS design-note / roadmap-gate review for PR #1135 and report an APPROVE or any blocking findings before merge.

@Q00
Owner

Q00 commented May 19, 2026

Re review ping

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Review — ouroboros-agent[bot]

Verdict: APPROVE

Branch: feat/wave1-n2-conformance-harness | 6 files, +1230/-0 | CI: green
Scope: diff-only / architecture-level
HEAD checked: 5e2743a5a1a3b51a72cd29340ed09899d9a090df

What Improved

  • The PR stays tightly inside the #956 N2 slot from #1131 by adding a deterministic conformance harness under tests/conformance/workflow_ir/ without changing runtime codepaths, schema versions, or dispatch behavior.
  • The positive suite now pins both graph validity and lifecycle conformance, including distinct blocked / failed / cancelled / timed_out terminal encodings, which strengthens the future #946 read-model boundary without pulling projection authority into Workflow IR.
  • The plugin firewall contract remains read-only and correctly treats blocked permission denial as a non-success path, which is the key #939 / #956 boundary this harness needs.

Issue #1131 Requirements

| Requirement | Status |
| --- | --- |
| Offline deterministic invalid-IR rejection fixtures for dangling edge, duplicate node id, unreachable terminal, missing schema ref | MET: tests/conformance/workflow_ir/test_negative_fixtures.py:44, :64, :79, :94, :149 |
| Legal node-state transitions fixture | MET: tests/conformance/workflow_ir/test_positive_fixtures.py:119 |
| Terminal-state-emitted-once fixture | MET: tests/conformance/workflow_ir/test_positive_fixtures.py:96, :231 |
| Blocked / failed / cancelled / timed_out distinction | MET: tests/conformance/workflow_ir/test_positive_fixtures.py:149, :167, :181, :190, :205 |
| Plugin firewall contract: blocked permission cannot present as success node completion | MET: tests/conformance/workflow_ir/test_plugin_firewall_contract.py:132, :159, :242 |
| Remain within Workflow IR v1 read-only conformance boundary | MET: docs/agentos/workflow-ir-v1.md:37-44, :46-58; exercised by tests only in tests/conformance/workflow_ir/ |
| 5 negative + 5 positive fixtures green without credentials or network | MET: negative coverage pinned in tests/conformance/workflow_ir/test_negative_fixtures.py:149; positive fixture set asserted via tests/conformance/workflow_ir/test_positive_fixtures.py:54, :73, :91 |

Prior Findings Status

| Prior Finding | Status |
| --- | --- |
| Re-review requested for latest empty-retrigger head | MAINTAINED: current HEAD 5e2743a5a1a3b51a72cd29340ed09899d9a090df re-checked; no new blocker introduced |
| Workflow IR must stay a read-only planning/conformance substrate, not a runtime control plane | MAINTAINED: changed files remain test-only and align with docs/agentos/workflow-ir-v1.md:37-58 |
| Plugin blocked path must not be representable as successful completion | MAINTAINED: tests/conformance/workflow_ir/test_plugin_firewall_contract.py:144-156, :202-239 |

Blockers

# File:Line Severity Confidence Finding
- No current-HEAD blockers found.

Follow-ups

# File:Line Priority Confidence Suggestion
- No follow-up required for merge readiness.

Test Coverage

  • Adequate for the scoped N2 slot: the suite explicitly covers the required negative graph-shape failures, positive lifecycle acceptance, terminal cardinality, distinct terminal outcome encodings, and the plugin blocked-path contract.
  • CI is green on the current head (Ruff Lint, MyPy Type Check, Bridge TypeScript, Test Python 3.12/3.13/3.14, enforce-envelope, enforce-boundary).

Merge Recommendation

  • Ready to merge.

ouroboros-agent[bot]

@shaun0927
Collaborator Author

@ouroboros-agent re-review requested for current head 1d7d49896ac4412cbff8020b3122bbe10c632da8 after strengthening the blocked-plugin Workflow IR projection contract. Please re-run the AgentOS design-note / roadmap-gate review and report APPROVE or any remaining blockers before merge.

@shaun0927
Collaborator Author

@ouroboros-agent re-review requested for current head 5ea5a58fc39788efa1ca9ae0fce6b6b34020f33f after an empty retrigger commit. Please re-run the AgentOS design-note / roadmap-gate review for PR #1135 and report APPROVE or any blocking findings before merge.

@shaun0927
Collaborator Author

Merge readiness rationale for PR #1135 (current head 5ea5a58)

I re-reviewed the PR against the #961 AgentOS SSOT direction, the locked #956 Workflow IR v1 boundary, and the #939 plugin-firewall boundary. My conclusion is that this PR is appropriately scoped and merge-ready.

What this PR does

PR #1135 adds an offline deterministic conformance harness under tests/conformance/workflow_ir/:

  • deterministic fixture builders with fixed timestamps and schema refs (fixtures.py:1-19, fixtures.py:43-49);
  • negative Workflow IR validator fixtures for dangling edges, duplicate node ids, unreachable terminals, missing schema refs, and self-loops (test_negative_fixtures.py:44, :64, :79, :94, :116);
  • positive validation and lifecycle conformance checks (test_positive_fixtures.py:54-65, :73-82);
  • terminal outcome conformance for exactly-one terminal event and distinct blocked / failed / cancelled / timed_out encodings (test_positive_fixtures.py:96-111, :149-164, :167-178, :181-202);
  • a plugin-firewall contract proving a blocked permission-gate invocation cannot be represented as successful workflow completion (test_plugin_firewall_contract.py:137-161, :164-221, :224-273).

AgentOS / #961 alignment

This follows the #961 direction rather than drifting into over-engineering:

  • It is tests-only / conformance-only. The diff adds only tests/conformance/... files and does not change runtime dispatch, schema versions, production projection, plugin execution, or roadmap behavior.
  • It keeps Workflow IR as a stable contract surface. The tests pin graph validation and lifecycle semantics without introducing a new workflow framework or external orchestration dependency.
  • It uses a “fat harness, thin skill” shape: fixtures and contract assertions live in a deterministic harness while production code remains untouched.
  • It avoids scope creep into #946 read-model work; timed-out remains encoded as RUN_FAILED + reason_code='timed_out' (test_positive_fixtures.py:190-202) instead of adding a new event family in this PR.
  • It strengthens the #939 boundary by asserting that plugin permission denial emits plugin.failed only and projects to NODE_FAILED / RUN_FAILED, never NODE_COMPLETED / RUN_COMPLETED (test_plugin_firewall_contract.py:137-161, :254-273).
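
To make the "distinct encodings" idea concrete, here is a minimal sketch of a collision check over (event_type, reason_code) pairs. Only the timed_out row reflects what is stated above; the other rows of `ENCODINGS` are placeholders chosen for illustration, and `find_collisions` is a hypothetical helper rather than anything from the harness.

```python
# Hypothetical encoding table. Only the timed_out row is stated in this thread;
# the remaining rows are illustrative placeholders.
ENCODINGS = {
    "blocked": ("RUN_FAILED", "blocked"),
    "failed": ("RUN_FAILED", "failed"),
    "cancelled": ("RUN_FAILED", "cancelled"),
    "timed_out": ("RUN_FAILED", "timed_out"),
}


def find_collisions(encodings):
    """Return {pair: [outcomes]} for every pair shared by two or more outcomes."""
    by_pair = {}
    for outcome, pair in encodings.items():
        by_pair.setdefault(pair, []).append(outcome)
    return {pair: names for pair, names in by_pair.items() if len(names) > 1}


collisions = find_collisions(ENCODINGS)
```

An empty result means each outcome can be told apart from the pair alone, even when several outcomes reuse the same event type and differ only in reason code.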

Verification

Local verification on the current worktree/head:

  • uv run pytest tests/conformance/workflow_ir -q — passed (43 passed).
  • uv run ruff check tests/conformance/workflow_ir/test_plugin_firewall_contract.py — passed.
  • uv run mypy tests/conformance/workflow_ir/test_plugin_firewall_contract.py — passed (Success: no issues found).
  • uv run pytest tests/unit/orchestrator/test_workflow_ir.py tests/unit/orchestrator/test_workflow_lifecycle.py tests/unit/plugin/test_firewall.py -q — passed.

GitHub checks for current head 5ea5a58fc39788efa1ca9ae0fce6b6b34020f33f are green: Ruff Lint, Python 3.12/3.13/3.14 tests, MyPy Type Check, Bridge TypeScript, enforce-envelope, and enforce-boundary all completed successfully.

Verdict

APPROVE — merge-ready.

I found no blocking issues and no evidence of over-engineering or incorrect architectural direction. The one improvement I made was to strengthen the blocked-plugin Workflow IR projection contract so the conformance harness explicitly proves that a blocked plugin permission result remains a failure path, not a successful node completion path.

@shaun0927
Collaborator Author

PR review: Workflow IR conformance harness (#1135)

Verdict: APPROVE — merge-ready.

Scope reviewed

Reviewed current head 5ea5a58fc39788efa1ca9ae0fce6b6b34020f33f against:

Changed files reviewed:

  • tests/conformance/__init__.py
  • tests/conformance/workflow_ir/__init__.py
  • tests/conformance/workflow_ir/fixtures.py
  • tests/conformance/workflow_ir/test_negative_fixtures.py
  • tests/conformance/workflow_ir/test_positive_fixtures.py
  • tests/conformance/workflow_ir/test_plugin_firewall_contract.py

Evidence

  • PR scope is tests-only: the PR adds only tests/conformance/... files and leaves production behavior unchanged.
  • The harness is deterministic/offline by construction: fixtures.py:9-13 says no network, model provider, plugin subprocess, cloud credential, or global-state mutation is touched; timestamps are fixed at fixtures.py:43.
  • Negative validator coverage is concrete and tied to locked error-code behavior: dangling edge (test_negative_fixtures.py:44-61), duplicate node id (:64-76), unreachable terminal (:79-91), missing schema refs (:94-113), and self-loop (:116-129).
  • Positive coverage verifies both spec validation and lifecycle conformance for all positive fixtures (test_positive_fixtures.py:54-65, :73-82).
  • Terminal lifecycle semantics are pinned: exactly one terminal run event (test_positive_fixtures.py:96-111) and distinct blocked / failed / cancelled / timed_out encodings (:149-202).
  • The plugin firewall contract is now explicit about the load-bearing boundary: a blocked invocation emits exactly one plugin.failed event and never plugin.invoked / plugin.completed (test_plugin_firewall_contract.py:137-161), and the local harness projection maps status='blocked' to NODE_FAILED + RUN_FAILED, not completion (:164-221, :224-273).
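
The blocked-path contract in the last bullet can be sketched as a small pure function plus a checker. Both `project_plugin_result` and `blocked_contract_violations` are hypothetical names for illustration; the event and status strings come from this thread, but the real firewall and projection code are not shown here.

```python
def project_plugin_result(status: str) -> list[str]:
    """Map a plugin result status to lifecycle event names (hypothetical helper)."""
    if status == "success":
        return ["NODE_COMPLETED", "RUN_COMPLETED"]
    # Anything else, including "blocked", stays on the failure path.
    return ["NODE_FAILED", "RUN_FAILED"]


def blocked_contract_violations(emitted_events: list[str], status: str) -> list[str]:
    """List the ways a supposedly blocked invocation breaks the contract."""
    problems = []
    if status == "success":
        problems.append("blocked path returned status='success'")
    if emitted_events != ["plugin.failed"]:
        problems.append(f"unexpected plugin events: {emitted_events}")
    projected = project_plugin_result(status)
    if "NODE_COMPLETED" in projected or "RUN_COMPLETED" in projected:
        problems.append(f"blocked outcome projected as completion: {projected}")
    return problems


ok = blocked_contract_violations(["plugin.failed"], "blocked")
bad = blocked_contract_violations(["plugin.invoked", "plugin.completed"], "success")
```

Returning a list of problems instead of raising on the first one keeps the failure output informative when more than one part of the contract is broken.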

Test evidence

  • uv run pytest tests/conformance/workflow_ir -q → passed (43 passed).
  • uv run ruff check tests/conformance/workflow_ir/test_plugin_firewall_contract.py → passed.
  • uv run mypy tests/conformance/workflow_ir/test_plugin_firewall_contract.py → passed (Success: no issues found).
  • uv run pytest tests/unit/orchestrator/test_workflow_ir.py tests/unit/orchestrator/test_workflow_lifecycle.py tests/unit/plugin/test_firewall.py -q → passed.
  • GitHub checks on current head 5ea5a58fc39788efa1ca9ae0fce6b6b34020f33f are all successful: Ruff Lint, Python 3.12/3.13/3.14, MyPy Type Check, Bridge TypeScript, enforce-envelope, and enforce-boundary.

Residual risk

This PR intentionally pins conformance behavior; it does not implement production read-model projection. That is the right boundary for this Wave 1 N2/#956 scope and avoids pulling #946 projection work into this PR.

No blocking issues remain.

shaun0927 and others added 3 commits May 20, 2026 10:47
Add offline-deterministic conformance fixtures under
tests/conformance/workflow_ir/ that lock the Q00#956 Workflow IR v1 boundary
without changing schema, validator logic, or production behavior.

* 5 negative fixtures (each rejected with the locked error code):
  dangling edge, duplicate node id, unreachable terminal, missing
  schema ref, illegal transition (self-loop).
* 5 positive fixtures: legal node-state transitions, terminal-state-
  emitted-once, blocked / failed / cancelled / timed_out lifecycle
  distinction.
* Plugin firewall contract fixture: a blocked-permission invocation
  cannot be projected as a successful workflow node completion.

All fixtures are deterministic, offline, no network, no credentials.

Refs Q00#1131, Q00#956
@shaun0927 shaun0927 force-pushed the feat/wave1-n2-conformance-harness branch from 5ea5a58 to ff43621 May 20, 2026 01:47
@shaun0927 shaun0927 merged commit 85ebbae into Q00:main May 20, 2026
8 checks passed
2 participants