Feat/tasks mrtr extension by panyam · Pull Request #262 · modelcontextprotocol/conformance

panyam · 2026-05-05T23:01:49Z

Summary

Adds server-conformance scenarios for SEP-2663 (Tasks Extension), with incidental coverage of SEP-2575 (per-request capability override) and SEP-2243 (Mcp-Method/Mcp-Name request headers) in the parts of the surface where they bind to tasks. Plus one new MRTR-adjacent check (mrtr-tasks-composition, currently SKIPPED) for the SEP-2663 commit 451f5e1 MRTR→Tasks promotion flow. 8 ClientScenario classes / ~33 internal checks for tasks plus 1 class / 7 SUCCESS + 1 SKIPPED for the MRTR↔Tasks composition placeholder. Tagged ['extension', DRAFT_PROTOCOL_VERSION] per #255 conventions and registered in pendingClientScenariosList so default everything-server runs stay green.

Motivation and Context

SEP-2663 (Tasks Extension), SEP-2575, and SEP-2243 are in active draft and currently have no conformance coverage in this repo. SDKs implementing them - including ones already shipping reference servers - have nothing to validate against, so wire-shape regressions and edge-case behavior (cancellation semantics, requestState handling, capability gating) slip
through SDK-internal tests. This PR fills that gap. The new scenarios assert what the spec text says, not what any specific implementation does, so any SDK can run them.

The MRTR↔Tasks composition placeholder (mrtr-tasks-composition, SKIPPED) is a forward-looking marker for SEP-2663 commit 451f5e1, which made the "MRTR rounds then promote to a task" flow normative on the wire - see the open spec questions below for why it's deferred.

How Has This Been Tested?

Run end-to-end against a reference Go fixture from the in-flight panyam/mcpkit SDK:

TASKS_SERVER_URL=http://localhost:18092/mcp \
TASKS_SERVER_CMD="/path/to/tasks-fixture --serve --addr :18092" \
MRTR_SERVER_URL=http://localhost:18093/mcp \
MRTR_SERVER_CMD="/path/to/mrtr-fixture --serve --addr :18093" \
  npx vitest run src/scenarios/server/

Branch results:

Tasks: 8/8 scenarios (~33 internal checks)
MRTR: 1/1 scenario (7 SUCCESS + 1 SKIPPED - see Open Question 2 in Additional Context)

The runner is brand-neutral and language-agnostic - fixture wired via env vars, spawn via sh -c, readiness via TCP polling, no log-line scanning. Anyone's server in any language works. Reference fixtures:

npm test against the upstream everything-server continues to pass - the new scenarios live in pendingClientScenariosList so all-scenarios.test.ts skips them until everything-server grows extension support.

Breaking Changes

None. All new scenarios are additive and tagged as 'extension' + DRAFT_PROTOCOL_VERSION, so they're invisible to dated --spec-version runs and only appear under --suite extensions or --spec-version draft. Default CI runs against everything-server are unaffected (the new scenarios are filtered out via pendingClientScenariosList).

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

Relationship to PR #188 (SEP-2322 MRTR)

Complementary, not overlapping. SEP-2663 builds on SEP-2322's base types, so a few of the tasks scenarios touch the MRTR shape (inputRequests, requestState, resultType) in their tasks-on-the-wire form (status:"input_required" on tasks/get, tasks/update resume path, partial inputResponses fulfillment). The standalone-ephemeral-MRTR coverage stays in #188.

The branch also contains a src/scenarios/server/mrtr/ folder with ephemeral-flow scenarios mirroring some of #188's checks. Those exist because the tasks reference fixture exercises the full MRTR base, and running them locally caught a real bug. For this upstream merge:

If Conformance Tests for SEP-2322 MRTR #188 lands first (expected order - SEP-2322 ships before SEP-2663), I'll drop the duplicative mrtr-ephemeral-flow.ts checks in favor of Conformance Tests for SEP-2322 MRTR #188's scenarios.
The mrtr-tasks-composition check (currently SKIPPED) is the genuinely new contribution this PR makes to MRTR coverage.

Scope (8 ClientScenario classes, ~33 checks)

tasks-lifecycle - sync vs task dispatch, DetailedTask shape, tool errors vs protocol errors, cancel ack, cancel-on-terminal -32602
tasks-capability-negotiation - extension advertised under capabilities.extensions; tasks/* gated behind negotiation; SEP-2575 per-request opt-in
tasks-wire-fields - ttlSeconds / pollIntervalMilliseconds renames, no early TTL expiry, no related-task _meta on inlined result
tasks-request-state - optional emission, echo acceptance, stale-but-valid tolerance (tasks-surface form)
tasks-mrtr-input - inputRequests on tasks/get, tasks/update resume, partial-fulfillment with multi-input fixture
tasks-request-headers - SEP-2243 server tolerates routing headers; body authoritative when conflicting
tasks-dispatch-and-envelope - removed v1 methods (-32601), legacy task param ignored, resultType:"complete" on every non-task response, strong-consistency immediate tasks/get, unknown taskId -32602
tasks-status-notifications - optional INFO check (notifications are MAY per spec)

Plus mrtr-ephemeral-flow (1 class / 7 SUCCESS + 1 SKIPPED) under src/scenarios/server/mrtr/.

Design highlights

Platform/Language agnostic runner. Fixture configured via env vars TASKS_SERVER_URL / TASKS_SERVER_CMD (and
MRTR_SERVER_URL / MRTR_SERVER_CMD). Spawn via sh -c, readiness via TCP polling, no log-line scanning. Suite is
describe.skip'd when env vars are unset.
Raw-fetch escape hatch. SDK's typed schemas strip the SEP-2663 wire fields (resultType, taskId, inputRequests,
requestState, inlined result/error). Helpers in src/scenarios/server/tasks/helpers.ts provide initRawSession +
rawRequest/rawRequestFull so scenarios read those fields directly. When the SDK gains schemas for SEP-2663 shapes, the
call sites switch back to client.request(..., AnyResult) and the helper shrinks. Similar in spirit to the raw-MCP
additions in Conformance Tests for SEP-2322 MRTR #188 - could converge on a shared helper.
Registered in pendingClientScenariosList - all-scenarios.test.ts skips them since everything-server doesn't
implement the extension yet. CLI lookup (getClientScenario(name)) still finds them.

Open spec questions

MRTR resultType discriminator value. SEP-2322's draft uses "input_required"; SEP-2663's draft uses "incomplete". Centralized as MRTR_INCOMPLETE_RESULT_TYPE for a one-line flip when SEP authors converge. Tracked at modelcontextprotocol/modelcontextprotocol PR 2663 comment 4381885336 and PR 2322 comment 4381884825.
mrtr-tasks-composition. SEP-2663 commit 451f5e1 made the MRTR→Tasks promotion flow normative on the wire: a single tools/call MAY exchange one or more IncompleteResult rounds and then return CreateTaskResult on a subsequent round. Implementing this requires the server middleware to defer task creation until the handler signals async-promotion - the natural alternative (mint the task up-front the moment a tool advertises task support) doesn't fit, because by the time the handler's IsIncomplete signal is observable, the CreateTaskResult is already on the wire. This is a wire-contract requirement, not an SDK-specific implementation choice; existing SDKs across languages that took the up-front pattern will need refactoring before this check can pass anywhere. Combined with Adjust test and allow running in interactive mode #1 above, that's why the check is SKIPPED today.

Closes: #261

Adds the first scenario for the SEP-2663 io.modelcontextprotocol/tasks extension — a single TasksLifecycleScenario covering sync vs async dispatch, DetailedTask shape on tasks/get, tool errors vs protocol errors, and cancellation semantics. 8 ConformanceCheck records, all passing against a SEP-2663-conformant Go fixture. Why "tasks" (not "tasks-v2"): SEP-2663 IS the tasks surface once it lands; the v2 suffix is only meaningful in implementations that maintain a v1 surface alongside, which the conformance suite does not. Layout: - src/scenarios/server/tasks/lifecycle.ts — scenario class - src/scenarios/server/tasks/helpers.ts — raw-fetch escape hatch (the SDK's typed schemas strip resultType/inputRequests/...) - src/scenarios/server/tasks/lifecycle.test.ts — fork-local vitest runner. Two modes: spawn a fixture binary via MCPKIT_TASKS_BINARY, or point at an already-running server via MCPKIT_TASKS_SERVER_URL. Skips when neither is set so it doesn't break upstream CI runs that go through everything-server (which doesn't yet implement io.modelcontextprotocol/tasks). Scenario is registered in pendingClientScenariosList so all-scenarios.test.ts skips it; promote to active once the upstream fixture grows extension support. Tagged ['extension', DRAFT_PROTOCOL_VERSION] — selectable via --suite extensions and --spec-version draft.

Builds out the rest of the tasks scenarios (atop the lifecycle canary) and adds the SEP-2322 ephemeral MRTR scenario in a sibling folder. Both target their own fixtures; both runners are brand-neutral and language-agnostic (TASKS_SERVER_URL / TASKS_SERVER_CMD, MRTR_SERVER_URL / MRTR_SERVER_CMD; readiness via TCP polling). Tasks ClientScenario classes: - TasksLifecycleScenario (8 checks; v2-01..v2-08) - TasksCapabilityNegotiationScenario (4 checks; v2-11/22/23/25, SEP-2575) - TasksWireFieldsScenario (3 checks; v2-12/13/21) - TasksRequestStateScenario (3 checks; v2-14/15/28) - TasksMRTRInputScenario (3 checks; v2-16/17/29 partial fulfillment) - TasksRequestHeadersScenario (3 checks; SEP-2243 request-header tolerance) - TasksDispatchScenario (8 checks; v2-09/10/19/20/26/27/30/31) - TasksStatusNotificationsScenario (1 check; SEP-2663 §notifications, optional) MRTR ClientScenario class: - MrtrEphemeralFlowScenario (7 checks + 1 SKIPPED; mrtr-01..07, mrtr-08 deferred for spec terminology + reference-impl reasons) Both runners spawn the fixture via a shell command and detect readiness by TCP-polling the URL's host/port — no log-line scanning, no language-specific assumptions. The same env vars work for any server implementation. Scenarios are tagged ['extension', DRAFT_PROTOCOL_VERSION] and registered in pendingClientScenariosList so all-scenarios.test.ts (which targets the upstream everything-server) skips them until the fixture grows SEP-2322 / SEP-2663 support.

Restructured around ClientScenario classes (one row per class with check-list under it) rather than per-numbered-test slugs. Documents fixture requirements, env vars, open spec questions, and the wire-format diff for each suite. Per AGENTS.md, severity follows spec keyword (MUST/MUST NOT → FAILURE, SHOULD/SHOULD NOT → WARNING). The READMEs explain why some checks emit INFO rather than FAILURE (optional emission paths per SEP-2322).

panyam · 2026-05-05T23:04:01Z

@LucaButBoring

The bulk of the v2 tasks + MRTR conformance lives in the upstream-bound fork now (panyam/mcpconformance, branch feat/tasks-mrtr-extension; upstream Draft PR modelcontextprotocol/conformance#262). Updates the README/WALKTHROUGH/walkthrough.go references in examples/tasks-v2 + examples/mrtr to point at the fork, the migration guide (docs/TASKS_V2_MIGRATION.md) likewise, and the matching Go test skip (server/mrtr_test.go) to point at the new conformance scenario path. No runtime changes.

Compress CLAUDE.md's Conformance section to a one-liner roll-up + add a Gotchas bullet for MCPCONFORMANCE_PATH (the env var the new testconf-tasks-v2 / testconf-mrtr targets shell into). The detailed fork-vs-local layout already lives in CAPABILITIES.md mcp-tasks-v2-conformance. Add a "Final disposition" footer to docs/SEP_2663_TASKS_CONFORMANCE_PLAN.md recording the graduation upstream (panyam/mcpconformance fork branch feat/tasks-mrtr-extension, Draft PR modelcontextprotocol/conformance#262) and noting the mcpkit-local folders are now vitest sentinels reserved for future mcpkit-stricter scenarios. No memory pruning — the four feedback notes are still working guidance, not duplicates of checked-in docs.

Two reviewer-driven additions: 1. SEP-2663 createdAt / lastUpdatedAt ISO-8601 assertion in `tasks-server-task-creation` (per Luca's PR modelcontextprotocol#262 review feedback). The check now flags servers that emit non-ISO timestamps (epoch seconds, RFC-2822, etc.) on TaskInfoV2 envelopes. 2. Factor cross-cutting test-harness helpers into _shared/: - `_shared/test-runner.ts` — `waitForServerReady` (renamed from `waitForTcpReady`; the call site cares about server readiness, not the TCP-poll mechanism). Imported by tasks/ and mrtr/ all-scenarios.test.ts; replaces ~30 LOC of inline duplication in each. - `_shared/wire-format.ts` — `ISO_8601_PATTERN` constant + `isIso8601(s)` predicate. Documented rationale for choosing a regex over `Date.parse` (too permissive), `new Date(s).toISOString()` (too strict), or `Temporal.Instant.from` (Node 24+ experimental). Future wire-shape predicates (data URI, percent-encoded filename, etc.) can land here. Cherry-pick footprint when graduating to upstream PR is the SEP folder + the imported `_shared/` files. First PR through carries them upstream; subsequent feat branches inherit via standard upstream-sync flow. All 9 scenario tests still pass against the Go reference fixtures.

Pure rename — the call site cares about server readiness, not the TCP-poll implementation detail. Matches the rename now landed on feat/tasks-mrtr-extension (PR modelcontextprotocol#262).

… helpers Drops initRawSession/rawRequest/rawRequestFull from tasks/helpers.ts in favor of the SDK's Client + StreamableHTTPClientTransport, paired with a Zod passthrough schema (AnyResult) that preserves SEP-2663 / SEP-2322 draft fields the SDK's typed schemas would strip. headers.ts and notifications.ts keep a small inline fetch where the SDK can't reach: per-request HTTP headers (SEP-2243) and SSE notification observation. Both reuse the SDK session via transport.sessionId. All SEP-2663 + MRTR ephemeral-flow scenarios pass against the Go fixture.

PR 2663 commit 62758914 standardised every duration field on the Ms suffix, integer milliseconds. wire-fields.ts now asserts ttlMs and pollIntervalMs are present on CreateTaskResult, the legacy v1 ttl and pollInterval keys are absent (already covered), and the interim ttlSeconds / pollIntervalMilliseconds keys are also absent on a post-2026-05-07 server. lifecycle.ts and the scenario README pick up matching prose updates. Verified by make testconf-tasks-v2 (8/8) against a renamed mcpkit fixture, and make testconf-mrtr (7/7 + 1 SKIPPED) against the paired MRTR surface.

SEP-2322 merged on 2026-05-06 with the variant renamed from IncompleteResult to InputRequiredResult and the resultType discriminator from "incomplete" to "input_required" (commit de6d76fb, per dsp-ant request). The MRTR_INCOMPLETE_RESULT_TYPE constant was specifically designed as a one-line flip point for this scenario. Renames - MRTR_INCOMPLETE_RESULT_TYPE = "incomplete" -> MRTR_INPUT_REQUIRED_RESULT_TYPE = "input_required" - isIncompleteResult -> isInputRequiredResult - All "IncompleteResult" -> "InputRequiredResult" in scenario prose and check descriptions (ephemeral-flow.ts, README.md) SEP-2663 had not yet flipped its discriminator literal as of PR head 82fb2c4d (5/7 21:52 UTC). Caitie's 5/15 RC commitment (issue comment 4384052694 on PR 2322) tracks the alignment to "input_required" both sides. The constant remains the one-line flip point in case the 2663 follow-up surprises us. Tested via mcpkit's make testconf-mrtr (7/7 + 1 SKIPPED green against a renamed mcpkit fixture) and make testconf-tasks-v2 (8/8 still green, no regressions on the paired surface).

Lefthook prettier reformatted column alignment on first push attempt; README also had a stale "renamed from InputRequiredResult" — should read "renamed from IncompleteResult". Fix both.

Two stale references in the mrtr-tasks-composition SKIPPED check: - Comment block + errorMessage framed blocker (a) as "spec authors disagree" / "input_required vs incomplete". SEP-2322 merged 2026-05-06 with "input_required" (commit de6d76fb). The blocker now reads as "SEP-2663 has not yet aligned to the merged 2322 literal" — Caitie's 5/15 RC commitment (PR 2322 issue comment 4384052694) tracks the alignment. - errorMessage referenced "IsIncomplete signal" — that field was renamed to IsInputRequired on the mcpkit side in lockstep with the SEP-2322 wire-variant rename. Updated to match. Status stays SKIPPED because blocker (b) — the mcpkit middleware refactor (issue 347) — is still open.

…sage After SEP-2322 merged with "input_required", the only blocker that actually keeps mrtr-08 SKIPPED is the eager-task-creation pattern in reference-server middleware (panyam/mcpkit issue 347). The earlier two-blocker framing read as if the test were waiting on both, but blocker (a) is effectively resolved for any server that emits the merged-2322 literal — leaving (b) as the sole gate. Tighten the comment block + description + errorMessage to lead with the middleware refactor and demote the discriminator history to a parenthetical aside.

Two requested changes from the SEP-2663 author's review pass on modelcontextprotocol/conformance PR 262 (pullrequestreview-4254601106). (1) Drop the interim-key absence checks. The transition window between "ttlSeconds / pollIntervalMilliseconds" and the final "ttlMs / pollIntervalMs" wording is over now that the merged spec settled on the Ms suffix. Useful while the spec was in flight, noise once it stabilized. Removes the absence checks plus the surrounding comment + description + details fields. (2) Add Number.isInteger() to ttlMs and pollIntervalMs validation. Spec says integer milliseconds; the previous typeof + range check would have allowed fractional values. Now both fields fail if they're not integers. README scenario table tightened: "ttlMs + pollIntervalMs present and integer-valued; legacy ttl / pollInterval keys absent".

panyam · 2026-05-14T23:45:25Z

Hi @CaitieM20 can I get you to review this too since we are looking at a subset of mrtr tests which intersect with Tasks. Thanks heaps.

SEP-2663 merged Final on 2026-05-15. Four normative wire changes baked in at merge time. This commit updates the tasks server-conformance scenarios so the suite validates each one as a MUST rather than tolerating divergence as INFO/MAY. 1. notifications/tasks/status renamed to notifications/tasks. The status-notifications scenario now FAILs on observing the legacy method on the v2 surface, and validates the new name's payload shape when emitted. 2. notifications/progress and notifications/message MUST NOT be sent for tasks. Two new absence-asserts on the status-notifications scenario, reusing the existing SSE-observation harness. 3. requestState removed from the tasks-v2 wire. The previous request-state scenario tested the deleted "Request State Management" section; replaced with absence-asserts on CreateTaskResult, DetailedTask, and notifications/tasks payloads. SEP-2322's InputRequiredResult on the MRTR surface still carries requestState and is unchanged. 4. -32003 (Missing Required Client Capability) for required-task tools when the client did not negotiate the extension. New scenario TasksRequiredTaskErrorScenario initializes without declaring the extension, calls a TaskSupport=required tool, and asserts the rejection code plus the structured requiredCapabilities payload. Documents `failing_job` as a fixture requirement. Verified locally against panyam/mcpkit examples/tasks-v2: 8/9 scenarios pass; the request-state-removal scenario intentionally FAILs against the current mcpkit main because the implementation still emits requestState on DetailedTask. That gap is the next item on the mcpkit catch-up plan.

feat(tasks): align scenarios with the Final-merged SEP-2663 spec

…ension # Conflicts: # src/scenarios/index.ts

panyam · 2026-05-18T07:59:02Z

SEP-2663 merged Final at modelcontextprotocol/modelcontextprotocol#2663 on 2026-05-15. The branch is now aligned: the suite asserts the four merged normative changes (notification rename, progress/message MUST-NOT on tasks, requestState removal from the v2 wire, -32003 for required-mode without negotiation) as MUSTs rather than tolerating divergence.

Also rebased against main to clear the conflict from #259 / #276, and migrated existing tasks + MRTR scenarios to the source: ScenarioSource shape from #265 (added 'io.modelcontextprotocol/tasks' to EXTENSION_IDS - happy to move that constant).

Ready for another review pass.

LucaButBoring · 2026-05-19T20:32:30Z

+ * Status notifications are OPTIONAL but the wire method name is fixed
+ * post-merge. Three normative requirements observed here:
+ *   1. When a server emits status notifications, the method MUST be
+ *      `notifications/tasks` (renamed from `notifications/tasks/status`
+ *      before SEP-2663 merged Final).
+ *   2. The legacy method name `notifications/tasks/status` MUST NOT
+ *      appear on the v2 tasks surface.
+ *   3. `notifications/progress` and `notifications/message` MUST NOT be
+ *      sent on the task stream. The check is FAILURE-on-presence —
+ *      servers that fail to filter progress emissions inside their task
+ *      runtime regress here.


Depends on #271 but flagging this for follow-up as notifications/tasks now arrives in response to subscriptions/listen rather than opening an SSE stream on tasks/get. Under the updated proposal, if the server accepts subscriptions/listen for a task, it returns task status notifications there instead, and does not require clients to additionally poll.

Replaced the whole run() body with a single skipCheck per our Discord agreement to leave the scenario in and skip it. PR 271 has now merged into this branch, so the citation no longer leans on it as a dependency. The harness will be in a followup once 2575 is implemented in our reference sdk.

LucaButBoring · 2026-05-19T20:42:25Z

+    // Check 5: notifications/tasks payloads MUST NOT carry requestState.
+    // The spec deleted the field from the Task base interface (commit
+    // 3f1c3cfc), so the DetailedTask shape on every task-bearing message
+    // — tools/call CreateTaskResult, tasks/get, and these notifications
+    // — must omit it. The CreateTaskResult and tasks/get absence-asserts
+    // live in request-state.ts; this check covers the notification path
+    // by reusing the SSE-observation buffer above.
+    {
+      const id = 'tasks-status-notifications-no-request-state';
+      const name = 'TasksStatusNotificationsNoRequestState';
+      const description =
+        'notifications/tasks payloads MUST NOT carry `requestState` (the merged SEP-2663 removed the field from the Task base interface)';


We should probably remove these retroactive negative checks before merging, I think I might've mentioned that already at some point - a negative test case for something that was never in v1 is only useful for people who implemented v2 before it was actually merged.

Full scenario is now skipped, so none of those inline negative checks execute. Its just 50 ish lines now.

LucaButBoring · 2026-05-19T20:51:51Z

+ * This scenario asserts the absence:
+ *   - CreateTaskResult MUST NOT carry `requestState`.
+ *   - DetailedTask (tasks/get response) MUST NOT carry `requestState`,
+ *     regardless of status.


I'm going to be inconsistent with my reviews here and recommend leaving these retroactive negative tests in, in contrast with the notification ones I commented on before. The reason for that is SEP-2322 happens to define requestState in exactly the same place we used to define it in DetailedTask, so it is actually a foreseeable mistake for a fresh implementation of v2 Tasks to accidentally leave in requestState in this context.

Haha - all good. I dont know how you are even keeping up with what is consistent and what is not :)

I went ahead and kept the absence-asserts as you suggested and reframed the doc block around the SEP-2322 lexical-adjacency motivation. This way a fresh reader now sees the "this is a foreseeable copy-paste from InputRequiredResult" framing instead of pre-merge history.

LucaButBoring · 2026-05-19T20:55:57Z

+ * `initialize`. The error data SHOULD carry a `requiredCapabilities`
+ * object whose shape mirrors the `InitializeRequest` capabilities, so
+ * the client can self-describe what to add without needing out-of-band
+ * documentation.


This is actually a MUST requirement here.

Fixed. The JSDoc and the public scenario description now lead with SEP-2575 §"Missing Required Capabilities" as the canonical MUST home of -32003 and with SEP-2663 §"Required Capabilities" applying it to the tasks extension. Ive made sure that specReferences on the -32003 checks now carry both SEP refs.

LucaButBoring · 2026-05-19T21:00:25Z

+/**
+ * SEP-2663 Tasks Extension — required-task error conformance.
+ *
+ * The merged SEP-2663 says that a server which cannot service a request


I recommend a cleanup pass over the whole PR at some point to tidy up these comments which refer to requirements that were dropped midway through the SEP lifecycle and thus won't make sense to people reading this with only knowledge of the merged v1 and v2 states (and not all of the incremental changes made between those points).

+1. Did the cleanup pass across the tasks/* scenarios — dropped the various framings and updated each doc block in terms of the current spec only. The one exception is the SEP-2322 lexical-adjacency context on request-state.ts, which I kept because it's the current-spec motivation for the negative test, not incremental SEP history.

LucaButBoring · 2026-05-19T21:07:29Z

    // Check 8: SKIPPED — MRTR → Tasks composition.
    // Tracking placeholder; spec made this normative in commit 451f5e1
    // (Apr 30). Blocker: reference servers need a middleware that
    // observes the handler's InputRequiredResult signal BEFORE creating
    // a task — the natural implementation pattern (create task up-front,
    // run handler in goroutine) doesn't expose the signal in time, so
    // round 1 of an MRTR-composing tools/call ends up emitting
    // CreateTaskResult instead of InputRequiredResult. Tracked in
    // https://github.com/panyam/mcpkit/issues/347 as one example impl
    // that hits this; SDKs in any language will need an equivalent fix.
    //
    // (An earlier version of this skip also tracked a discriminator
    // value blocker on "incomplete" vs "input_required". SEP-2322
    // merged on 2026-05-06 with "input_required" (commit de6d76fb).
    // SEP-2663's mdx hasn't yet caught up but every server emitting
    // the merged 2322 literal is interoperable, so the blocker is
    // effectively resolved for conformance purposes.)


Flagging this for followup - and in the updated composition test, we'll have to ensure that the MRTR phase does allow and maintain requestState, while the Task phase does not allow it.

Acked. This asymmetric invariant (requestState allowed in MRTR but not in tasks) is exactly what I was hoping the composition test needs to encode across phases. I'll file it as a follow-up alongside the subscriptions/listen rewrite once this lands.

LucaButBoring · 2026-05-19T21:08:46Z

+      }
+    }
+
+    // Cleanup so the fixture doesn't leak a 60-second goroutine.


nit: remove decontextualized comment (also elsewhere)

- notifications.ts: skip the entire scenario; SEP-2663 routes notifications/tasks over the SEP-2575 subscriptions/listen stream, not the tools/call POST SSE the prior harness observed. Body preserved in git blame for the rewrite follow-up. - request-state.ts: keep the absence-asserts (per the asymmetric recommendation in review) and reframe the doc block around the SEP-2322 lexical-adjacency motivation rather than pre-merge SEP history. - required-task-error.ts: cite SEP-2575 §"Missing Required Capabilities" as the canonical home of -32003; add SEP_2575_REF to specReferences on the -32003 checks. - Sweep decontextualized "merged spec dropped X" / "pre-merge SEP-2663 said Y" framing from tasks/* doc blocks and inline check descriptions.

LucaButBoring · 2026-05-20T00:40:43Z

+          // Status settles to cancelled — observe via tasks/get.
+          const after = (await client.request(
+            { method: 'tasks/get', params: { taskId: cancelTaskId } },
+            AnyResult
+          )) as any;
+          if (after.status !== 'cancelled') {
+            errs.push(
+              `tasks/get after cancel MUST report cancelled; got ${after.status}`
+            );
+          }


I believe this is missing a wait, given cancellation is eventually-consistent? It should indeed settle to cancelled as the comment suggests, but won't necessarily be cancelled until it hits its terminal status. Technically it doesn't even need to settle to cancelled but we can make that part of the tool contract here so that this test remains reliable.

Done sir. Now the post-cancel obsrv now via waitForTerminal() rather than reading immediately, and this is also documented in the the README's slow_compute row that it settles to cancelled after cancellation so the assertion has a tool-contract anchor.

LucaButBoring · 2026-05-20T00:47:09Z

+      }
+    }
+
+    // Check 8: tasks/cancel on a terminal task MUST return -32602.


This is actually not required, this is still an empty-ack case - otherwise clients would have to deal with race conditions on this.

Yep flipped this now.

LucaButBoring · 2026-05-20T00:47:52Z

+              params: {
+                taskId,
+                inputResponses: responses,
+                requestState: inputTask.requestState


Missed a requestState removal here

Fixed now. Also tasks/update params here now carry only taskId + inputResponses. requestState belongs only to the SEP-2322 ephemeral InputRequiredResult surface.

LucaButBoring · 2026-05-20T00:48:28Z

+
+| SEP      | What it adds                                                                                                                                                                                                                                                                                                                                                                                | Where it shows up                   |
+| -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------- |
+| SEP-2663 | Tasks Extension — `io.modelcontextprotocol/tasks` capability, flat `CreateTaskResult` (`Result & Task`), `DetailedTask` on `tasks/get` (with inlined result/error/inputRequests/requestState), `tasks/update` for MRTR resume, ack-only `tasks/cancel`, wire-field renames (`ttlMs`, `pollIntervalMs`, both integer milliseconds per the 2026-05-07 spec commit aligning duration suffixes) | every scenario                      |


Also missed a requestState removal here and at the top of this file.

+1 requestState is no longer listed as a SEP-2663 wire field anywhere (top blurb, Specs table SEP-2663 row, wire-format diff, raw-fetch design note). Also kept it on the SEP-2322 row where it correctly describes the ephemeral MRTR surface.

LucaButBoring · 2026-05-20T00:50:22Z

+          const errs: string[] = [];
+          // Ack carries only the SEP-2322 discriminator — no task envelope.
+          if (
+            JSON.stringify(ack) !== JSON.stringify({ resultType: 'complete' })


This check is actually too strict, as we do still permit metadata (all result shapes do)

Hmm true. Loosened this now. The ack check now asserts resultType === 'complete' plus absence of the task-envelope fields (taskId, status, result, error, inputRequests), instead of strict-equality against the literal {resultType:"complete"}. _meta and any future result-shape metadata pass through.

Also applied the same to the tasks/update ack check in mrtr-input.ts which had the same anti-pattern.

LucaButBoring · 2026-05-20T00:50:54Z

+### `tasks-request-state` (`request-state.ts`)
+
+| Check                                 | What it tests                                                                                                     |
+| ------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
+| `tasks-request-state-shape`           | When emitted, `requestState` is a non-empty string (`INFO` if server omits it; emission is optional per SEP-2322) |
+| `tasks-request-state-echo`            | Server accepts `tasks/get` with the previously-emitted `requestState` echoed back                                 |
+| `tasks-request-state-stale-tolerance` | Earlier (stale-but-still-valid) `requestState` MUST still be accepted after a newer one is minted                 |


This section is stale now

Ok the tasks-request-state table now describes the current absence-asserts scenario (tasks-create-result-no-request-state and tasks-get-detailed-no-request-state). The prior shape/echo/stale-tolerance rows are gone since they assumed requestState lived on the wire

LucaButBoring · 2026-05-20T00:56:05Z

+          if (e.code !== -32601) {
+            errs.push(
+              `${tc.method} MUST return -32601; got ${e.code ?? '<missing>'}`
+            );
+          }


Thinking about it, this actually isn't specified, we should probably return -32003 here. Making a note of this - -32003 is also what we use for subscriptions/listen on tasks.

Ive gone ahead and flipped this to -32003 in both the conformance assertion and the mcpkit fixture (gateOnTasksExtension + the *RejectedWithoutExtension tests). Just checking quickly while we are at it - does the merged SEP-2663 (or SEP-2575) actually reflect -32003 for this path? I havent fully checked the latest but was operating under -32601. Wanted to confirm before this conformance lands ahead of the spec. Happy to revert to -32601 if not, or leave as-is with the open-questions flag I added to the README

It's unspecified either way - just opened a PR to specify -32003 in particular on the grounds that -32601 implies the server doesn't support tasks at all: modelcontextprotocol/modelcontextprotocol#2756

`taskId` is the session-continuation handle on the tasks-v2 MRTR-resume path; `requestState` lives only on the SEP-2322 ephemeral InputRequiredResult surface, not on the tasks-v2 wire.

The cancel-empty-ack and tasks/update-ack checks were comparing the whole response against the literal `{resultType:"complete"}` via `JSON.stringify` equality. That rejected acks carrying `_meta` or any other future result-shape metadata even though the spec permits them. Now structural: assert `resultType === "complete"` plus absence of the task-envelope fields (`taskId`, `status`, `result`, `error`, `inputRequests`). `_meta` and other metadata pass through.

The cancel-empty-ack check observed status via an immediate tasks/get after the cancel ack. Cancellation is eventually-consistent, so the task could still be `working` (or transitioning) at the moment of observation, making the assertion flaky against any reasonable server. Now polls via `waitForTerminal()` and asserts the terminal status is `cancelled`. README's `slow_compute` fixture row documents the matching tool contract ("MUST settle to `cancelled` when `tasks/cancel` arrives while running") so the test has a deterministic anchor.

The previous check asserted `-32602` when cancelling an already-terminal task. That forces every client to handle the race between observing a task's terminal status and racing in a cancel request — the task can terminate between the two and the cancel then errors instead of acking. Check 8 renamed to `tasks-cancel-terminal-idempotent-ack` and now asserts the same empty-ack shape (`resultType:"complete"` + no task- envelope fields) as on an active task. README's lifecycle table row updated to match.

`requestState` belongs to the SEP-2322 ephemeral MRTR result shape (`InputRequiredResult`), not to the tasks-v2 wire. The README listed it under SEP-2663 wire fields in four spots — top blurb, the SEP-2663 row of the Specs table, the v1↔v2 wire-format diff, and the raw-fetch design note. All four now describe only the actual SEP-2663 shapes; the raw-fetch note still calls out the SEP-2322 ephemeral `requestState` separately so the rationale for stripping it via raw fetch stays intact. The SEP-2322 row of the Specs table is unchanged since `requestState` is correctly listed there as an MRTR base type.

The README's `tasks-request-state` table still described the prior scenario shape (shape / echo / stale-tolerance checks), which had assumed `requestState` was on the tasks-v2 wire. The scenario itself was rewritten earlier into the two-check absence assertion (`tasks-create-result-no-request-state` / `tasks-get-detailed-no-request-state`); the README now matches. A motivational paragraph above the table explains why a negative test exists for a field the spec never defines (the SEP-2322 lexical-adjacency confusion vector).

When a client calls tasks/get / tasks/update / tasks/cancel without having negotiated the io.modelcontextprotocol/tasks extension, the suite now asserts -32003 (Missing Required Client Capability, SEP-2575 §"Missing Required Capabilities") instead of -32601. The spec doesn't currently mandate this code for the gated-method path, so the assertion is forward-looking: it follows the SEP-2575 pattern that already governs `required` tools (and that subscriptions/listen for tasks is expected to use). README's open-questions list calls this out so readers know it's a spec-clarification ahead-of-the-curve. Drops the now-obsolete "Invalid requestState — silent ack vs -32602" open-questions item: the previous request-state scenario covered that, but the rewritten absence-asserts scenario no longer exercises that path.

chore(tasks): SEP-2663 scenarios — PR 262 round-2 review fixes

LucaButBoring · 2026-05-20T22:15:41Z

Just about everything looks good now, final two issues I think:

mrtr-tasks-composition is still skipped - we aligned both SEPs on resultType: "input_required" before they were merged (we can handle that in the same followup as (2)).
subscriptions/listen - we discussed this one on Discord already, skip is intentional and we'll have a followup PR for it since the reference implementation hasn't picked up SEP-2575 yet.

LucaButBoring · 2026-05-20T22:54:36Z

@pcarleton Can you take a look at this? We're planning to get this in in its current state with the two cross-SEP suites skipped and ship a fast-follow with the rest (2322+2663 and 2575+2663).

panyam added 4 commits May 5, 2026 14:14

style(tasks): apply prettier formatting

4f2eac0

panyam added 2 commits May 6, 2026 14:22

style: prettier formatting on tasks/mrtr scenarios

683c633

panyam marked this pull request as ready for review May 6, 2026 21:29

LucaButBoring mentioned this pull request May 6, 2026

SEP-2663: Tasks Extension modelcontextprotocol/modelcontextprotocol#2663

Merged

9 tasks

panyam added 7 commits May 7, 2026 15:13

style: prettier column alignment on tasks README

e19dbe0

mrtr: prettier formatting + fix README rename typo

08c2466

Lefthook prettier reformatted column alignment on first push attempt; README also had a stale "renamed from InputRequiredResult" — should read "renamed from IncompleteResult". Fix both.

style: prettier line-break on mrtr-08 description string

f65dd24

LucaButBoring suggested changes May 8, 2026

View reviewed changes

Comment thread src/scenarios/server/tasks/wire-fields.ts Outdated

Comment thread src/scenarios/server/tasks/wire-fields.ts Outdated

panyam added 3 commits May 9, 2026 08:21

style: prettier column alignment on wire-fields README row

29d6501

Merge branch 'main' into feat/tasks-mrtr-extension

a93975a

panyam added 3 commits May 17, 2026 21:44

Merge pull request #2 from panyam/feat/tasks-mrtr-final-merged-alignment

c37e496

feat(tasks): align scenarios with the Final-merged SEP-2663 spec

Merge remote-tracking branch 'upstream/main' into feat/tasks-mrtr-ext…

8e4b62c

…ension # Conflicts: # src/scenarios/index.ts

LucaButBoring suggested changes May 19, 2026

View reviewed changes

panyam added 3 commits May 19, 2026 15:29

Merge branch 'main' into feat/tasks-mrtr-extension

d24ef6e

chore(tasks): prettier — strip trailing blank line in notifications.ts

9f7cc80

LucaButBoring suggested changes May 20, 2026

View reviewed changes

panyam added 11 commits May 19, 2026 18:54

Merge branch 'main' into feat/tasks-mrtr-extension

c377ae5

chore(tasks): drop requestState from tasks/update params

37b7579

`taskId` is the session-continuation handle on the tasks-v2 MRTR-resume path; `requestState` lives only on the SEP-2322 ephemeral InputRequiredResult surface, not on the tasks-v2 wire.

chore(tasks): prettier — realign README table pipes after row edits

6949ec6

Merge pull request #4 from panyam/review/pr-262-round-2

40d83be

chore(tasks): SEP-2663 scenarios — PR 262 round-2 review fixes

Merge branch 'main' into feat/tasks-mrtr-extension

c76488b

LucaButBoring mentioned this pull request May 20, 2026

fix(SEP-2663): specify -32003 error for missing client capability modelcontextprotocol/modelcontextprotocol#2756

Open

9 tasks

Conversation

panyam commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation and Context

How Has This Been Tested?

Breaking Changes

Types of changes

Checklist

Additional context

Relationship to PR #188 (SEP-2322 MRTR)

Scope (8 ClientScenario classes, ~33 checks)

Design highlights

Open spec questions

Closes: #261

Uh oh!

panyam commented May 5, 2026

Uh oh!

Uh oh!

Uh oh!

panyam commented May 14, 2026

Uh oh!

panyam commented May 18, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

panyam May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

panyam commented May 5, 2026 •

edited

Loading

panyam May 19, 2026 •

edited

Loading

LucaButBoring commented May 20, 2026 •

edited

Loading