Skip to content

Break EarlyNetworkConfig up to support cleaner bootstore type changes#9944

Open
jgallagher wants to merge 8 commits intomainfrom
john/bootstore-versioning
Open

Break EarlyNetworkConfig up to support cleaner bootstore type changes#9944
jgallagher wants to merge 8 commits intomainfrom
john/bootstore-versioning

Conversation

@jgallagher
Copy link
Contributor

This PR is pretty widespread and touches some easily-broken bits, so I'll go into some background here. The replicated bootstore (soon to be backed by trust quorum instead, but that's not relevant to the context here) gives an eventually-consistent copy of data to all sleds. The type exposed by the bootstore is

/// Network configuration required to bring up the control plane
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct NetworkConfig {
pub generation: u64,
// A serialized blob of configuration data. Network APIs must know how to
// appropriately serialize/deserialize
pub blob: Vec<u8>,
}

generation ensures any write requests from an out-of-date Nexus can't overwrite newer changes. Prior to this PR, blob contains a JSON-serialized EarlyNetworkConfig, defined as

pub struct EarlyNetworkConfig {
// The current generation number of data as stored in CRDB.
// The initial generation is set during RSS time and then only mutated
// by Nexus.
pub generation: u64,
// Which version of the data structure do we have. This is to help with
// deserialization and conversion in future updates.
pub schema_version: u32,
// The actual configuration details
pub body: EarlyNetworkConfigBody,
}

EarlyNetworkConfigBody contains a variety of networking configuration information required to bring up connectivity on a rack cold boot, including BGP details, which switches have transceivers and in which slots, etc. This type is quite complex, and historically changing it has been quite painful - addressing that is the point of this PR and #9801 in general. This PR does not change NetworkConfig or EarlyNetworkConfigBody - the focus is on EarlyNetworkConfig. It has a couple of problems:

  • schema_version is supposed to tell us what version of body is present, but it's defined in line with the body. That means any time we rev a new version of EarlyNetworkConfigBody, we also have to rev a new version of EarlyNetworkConfig, and don't have an opportunity to inspect schema_version before already needing to know how to deserialize body.
  • generation is duplicated with the generation in NetworkConfig. (This is not nearly as much of a problem in practice as the previous point, but is an opportunity for illegal states - what would it mean for a NetworkConfig at generation N to hold a blobified EarlyNetworkConfig with a different generation?)

EarlyNetworkConfig is used in three places on main:

  • Serialized as blob in the bootstore as described above
  • Serialized as data in the bootstore_config CRDB table (this has the same "duplicated generation" problem as NetworkConfig - this table also stores generation as a separate column next to data)
  • As the type for the write_network_bootstore_config() sled-agent OpenAPI

This PR breaks EarlyNetworkConfig up into two types to address the problems above. The serialized form is now EarlyNetworkConfigEnvelope:

#[derive(Clone, Debug, Deserialize, Serialize, PartialEq)]
pub struct EarlyNetworkConfigEnvelope {
// Which version of `EarlyNetworkConfigBody` is serialized into `body`.
pub(crate) schema_version: u32,
// The actual early network configuration details.
//
// These are a serialized `EarlyNetworkConfigBody` of some version. We must
// inspect `schema_version` to know how to interpret this value.
pub(crate) body: serde_json::Value,
}

This has schema_version and an opaque body (typed as serde_json::Value). This fixes both problems with EarlyNetworkConfig:

  • EarlyNetworkConfigEnvelope does not have to have a new type revved any time EarlyNetworkConfigBody, because body is opaque. We can deserialize the envelope, then inspect schema_version to know which version of EarlyNetworkConfigBody it contains. (My hope is we never have to rev this type.)
  • No duplicated generation.

This is fully backwards-compatible as far as deserialization is concerned: any existing EarlyNetworkConfig can be safely deserialized as an EarlyNetworkConfigEnvelope:

  • schema_version is unchanged
  • generation will be ignored
  • body will be read as a serde_json::Value instead of an EarlyNetworkConfigBody

EarlyNetworkConfigEnvelope does not implement JsonSchema, because it should not be used in HTTP / OpenAPI contexts; it's only meant to be a serialization wrapper. That brings us to the second type added in this PR, WriteNetworkConfigRequest:

#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema, PartialEq)]
pub struct WriteNetworkConfigRequest {
pub generation: u64,
pub body: EarlyNetworkConfigBody,
}

This is now the type accepted by sled-agent's write_network_bootstore_config() endpoint. sled-agent will convert this request into a bootstore::NetworkConfig in the straightforward way:

  1. Wrap body in an EarlyNetworkConfigEnvelope
  2. Serialize the now-wrapped body as NetworkConfig::blob
  3. Include generation as NetworkConfig::generation

I believe this gets us most of the way to #9801. I want to do an actual rev of EarlyNetworkConfigBody to confirm this puts us in a good place (and also work on the mechanics of ensuring that any revs made are required to update the relevant bits of implementation that need to be updated, such as EarlyNetworkConfigEnvelope knowing how to deserialize the new body type) before closing the issue. I'll also do some update testing on a racklette to confirm I didn't break anything w.r.t. backwards compatibility, but I believe all of these changes should be safe.

This PR also fixes #9943: sled-agent should never "upconvert" EarlyNetworkConfigBody requests from Nexus into a newer version; it needs to replicate the exact version it's given.

.min_ttl
.map(|val| {
u8::try_from(*val)
.map_err(|_| BgpPeerConfigDataError::MinTtl(*val))
Copy link
Contributor Author

@jgallagher jgallagher Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a (potential?) behavior change: the old code in sync_switch_configuration.rs had:

min_ttl: c.min_ttl.map(|x| x.0 as u8), //TODO avoid cast return error

so I addressed this TODO while I was here. But if we have any min_ttl values greater than 255 in the db, we'll now fail to convert configs instead of silently converting them to value % 256. I could change this back if that's a concern, and leave the TODO for later?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey I just bumped into this in #9941

This is a divergence between the database and diesel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... huh! I guess that means all deployed systems must have NULL min_ttl values, otherwise we'd be seeing diesel serialization errors? So this conversion code won't run anyway, and once #9941 lands we can drop it. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if they're all NULL, but I'm positive they're "INT2", and relatively sure they can all be parsed as a "SqlU8" value - otherwise, yeah, would see a serialization error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of #8587 - I think that was basically the same case, except there inserting any non-NULL value (even one in range!) resulted in a serialization error. I assume the postgres protocol is sending back 2 bytes and diesel is expecting 8, if it thinks the column is actually a SqlU32 (i64).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Network changes during an update to R18 could replicate unreadable bootstore contents

2 participants