Split "set target release" endpoint into two: one for update, one for mupdate recovery #9887
jgallagher wants to merge 20 commits into main
Schema diff. Very simple, nice that they take the same params. Do you think I should expose this functionality in the console? Probably not, right?
--- a/2026021301.0.0-6e51ab/spec.json
+++ b/2026021800.0.0-38e767/spec.json
@@ -7,7 +7,7 @@
"url": "https://oxide.computer",
"email": "api@oxide.computer"
},
- "version": "2026021301.0.0"
+ "version": "2026021800.0.0"
},
"paths": {
"/device/auth": {
@@ -12383,6 +12383,35 @@
}
}
},
+ "/v1/system/update/target-release/recovery": {
+ "put": {
+ "tags": ["system/update"],
+ "summary": "Recover from an Oxide-support-driven system update",
+ "description": "Inform the control plane of the release of the rack's system software it is now running due to a recovery operation (\"mupdate\") performed by Oxide support.\n\nThis endpoint should only be called at the direction of Oxide support.",
+ "operationId": "target_release_update_recovery",
+ "requestBody": {
+ "content": {
+ "application/json": {
+ "schema": {
+ "$ref": "#/components/schemas/SetTargetReleaseParams"
+ }
+ }
+ },
+ "required": true
+ },
+ "responses": {
+ "204": {
+ "description": "resource updated"
+ },
+ "4XX": {
+ "$ref": "#/components/responses/Error"
+ },
+ "5XX": {
+ "$ref": "#/components/responses/Error"
+ }
+ }
+ }
+ },
"/v1/system/update/trust-roots": {
"get": {
"tags": ["system/update"], |
Probably not, yeah. @ahl and I chatted about this a few weeks ago, and IIRC we wanted to tuck this operation somewhere out of the main path even in the CLI, since it should only be called after support performs a mupdate (and will fail if called any other time anyway).
davepacheco
left a comment
(still going through nexus/src/app/deployment.rs but wanted to leave this before the watercooler)
davepacheco
left a comment
This looks good!
I think it wouldn't hurt to get another set of eyes on it, given how tricky and important this is.
nexus/src/app/deployment.rs
Outdated
// bypass all our typical version ordering requirements, so we have to allow
// recovery to the _actual_ version it installed, regardless of what we
// currently have on the system.
// does not take an arguments about the proposed system version (unlike
- // does not take an arguments about the proposed system version (unlike
+ // does not take any arguments about the proposed system version (unlike
nexus/src/app/deployment.rs
Outdated
// Update status of a sled, not considering its zones, based on the current
// target version.
- // Update status of a sled, not considering its zones, based on the current
- // target version.
+ // Status of any update or mupdate on a sled, not considering its zones, based on the current
+ // target version.
(easy to misread "Update status" as "this is going to update the status")
Testing notes from dublin: I initially set up the rack with the TUF repo from this branch (version The first request was to set the target release "for update" to my fake R20. This succeeded, because we always allow the initial target release to be set. After this,
{
"components_by_release_version": {
"install dataset": 59,
"20.0.0-0.local+git553e6c0886a": 17,
"unknown": 9
},
"suspended": false,
"target_release": {
"time_requested": "2026-02-27T23:16:36.127893Z",
"version": "20.0.0-0.local+git553e6c0886a"
},
"time_last_step_planned": "2026-02-27T19:51:21.296256Z"
}
Prior to this PR we'd be stuck here. We need to set the correct release,
We can't downgrade:
And we can't start an upgrade because we're waiting for mupdate recovery:
However, we can successfully use the new and we can use it to downgrade (which is useful here):
A few minutes after doing so, the system recognized that all components were on the target version:
{
"components_by_release_version": {
"19.0.0-0.ci+git44ac79d168b": 85
},
"suspended": false,
"target_release": {
"time_requested": "2026-02-27T23:58:20.286946Z",
"version": "19.0.0-0.ci+git44ac79d168b"
},
"time_last_step_planned": "2026-02-27T23:59:23.469113Z"
}
We still can't start a new update to the current target version, as expected:
But now we can start an update to a later version:
While that update is running, we can't start another one, as expected:
and we can't use the
The existing "set target release" external API endpoint is used for two reasons:
However, the checks we ought to perform for "should the new target release version be allowed" are pretty different for the two cases, and we were both too strict and too loose. A couple examples of incorrect behavior prior to this PR:
As of this change, there are separate "set target release for update" and "set target release for mupdate recovery" endpoints with more correct validation for each intent. In the two examples above:
Closes #9113. Also addresses an issue @askfongjojo ran into on a racklette recently with needing to "downgrade"; e.g., in a sequence like this:
After this change, we can now correct the mistake in step 4: because 18 wasn't the release actually deployed, we'd still be in the "need to recover from mupdate" state, allowing the operator to set the target release back to 17.
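The per-intent validation described here, as exercised in the testing notes from dublin, can be sketched as a small decision function. This is only an illustrative model, not the code in nexus/src/app/deployment.rs: the names `Intent`, `State`, `Rejection`, and `validate` are hypothetical, versions are reduced to a single integer (real release versions are semver strings), and the order of the checks is an assumption inferred from the order of the observations in the testing notes.

```rust
// Hypothetical model of the two "set target release" intents.
// None of these names come from the PR itself.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Intent {
    /// The operator wants to start a normal update.
    Update,
    /// The operator is reporting what a support-driven mupdate installed.
    MupdateRecovery,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    /// All components are on the current target release.
    AtTarget,
    /// An update to the current target release is still running.
    UpdateInProgress,
    /// A mupdate happened and the control plane is waiting for recovery.
    AwaitingMupdateRecovery,
}

#[derive(Debug, PartialEq, Eq)]
enum Rejection {
    /// Update requests must move to a strictly later version.
    NotNewer,
    /// Update requests are refused until mupdate recovery completes.
    AwaitingRecovery,
    /// Only one update may run at a time.
    UpdateInProgress,
    /// Recovery requests are refused unless a mupdate is pending.
    NotRecovering,
}

/// Decide whether `proposed` may become the target release.
/// `current` is `None` when no target release has ever been set.
fn validate(
    intent: Intent,
    state: State,
    current: Option<u32>,
    proposed: u32,
) -> Result<(), Rejection> {
    match intent {
        // Recovery skips version ordering entirely: the rack is already
        // running whatever support installed, even if that is older.
        Intent::MupdateRecovery => {
            if state == State::AwaitingMupdateRecovery {
                Ok(())
            } else {
                Err(Rejection::NotRecovering)
            }
        }
        Intent::Update => {
            // The initial target release is always allowed.
            let Some(current) = current else {
                return Ok(());
            };
            if proposed <= current {
                return Err(Rejection::NotNewer);
            }
            match state {
                State::AwaitingMupdateRecovery => Err(Rejection::AwaitingRecovery),
                State::UpdateInProgress => Err(Rejection::UpdateInProgress),
                State::AtTarget => Ok(()),
            }
        }
    }
}
```

In this model, `validate(Intent::Update, State::AwaitingMupdateRecovery, Some(20), 19)` is rejected as a downgrade, while the same request with `Intent::MupdateRecovery` is accepted, which mirrors the fake-R20-back-to-R19 sequence in the testing notes.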