Break EarlyNetworkConfig up to support cleaner bootstore type changes #9944
jgallagher wants to merge 8 commits into main
    .min_ttl
    .map(|val| {
        u8::try_from(*val)
            .map_err(|_| BgpPeerConfigDataError::MinTtl(*val))
This is a (potential?) behavior change: the old code in sync_switch_configuration.rs had:

    min_ttl: c.min_ttl.map(|x| x.0 as u8), //TODO avoid cast return error

so I addressed this TODO while I was here. But if we have any min_ttl values greater than 255 in the db, we'll now fail to convert configs instead of silently converting them to value % 256. I could change this back if that's a concern, and leave the TODO for later?
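The difference between the two conversions can be sketched in isolation. This is a minimal illustration, not the code in this PR: the `u32` source type, the `MinTtlError` type, and both function names are assumptions made for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch; the PR uses
/// `BgpPeerConfigDataError::MinTtl` instead.
#[derive(Debug, PartialEq)]
pub enum MinTtlError {
    OutOfRange(u32),
}

/// Old behavior: an `as` cast keeps only the low 8 bits, so an
/// out-of-range value silently becomes `value % 256`.
pub fn min_ttl_lossy(val: u32) -> u8 {
    val as u8
}

/// New behavior: a checked conversion that reports out-of-range values
/// instead of truncating them.
pub fn min_ttl_checked(val: u32) -> Result<u8, MinTtlError> {
    u8::try_from(val).map_err(|_| MinTtlError::OutOfRange(val))
}
```

With these definitions, `min_ttl_lossy(300)` yields `44`, while `min_ttl_checked(300)` yields `Err(MinTtlError::OutOfRange(300))`; values up to 255 pass through both unchanged.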
hey I just bumped into this in #9941
This is a divergence between the database and diesel.
... huh! I guess that means all deployed systems must have NULL min_ttl values, otherwise we'd be seeing diesel serialization errors? So this conversion code won't run anyway, and once #9941 lands we can drop it. Thanks!
I'm not sure if they're all NULL, but I'm positive they're "INT2", and relatively sure they can all be parsed as a "SqlU8" value - otherwise, yeah, would see a serialization error.
I'm thinking of #8587 - I think that was basically the same case, except there inserting any non-NULL value (even one in range!) resulted in a serialization error. I assume the postgres protocol is sending back 2 bytes and diesel is expecting 8, if it thinks the column is actually a SqlU32 (i64).
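The width mismatch suspected above can be sketched with the standard library alone. This is not diesel's decoding code; `decode_int2` and `decode_int8` are hypothetical helpers that assume fixed-width big-endian integers on the wire, and they only show why a 2-byte payload cannot satisfy a decoder that expects 8 bytes, whatever the value.

```rust
use std::convert::TryInto;

/// Decode a 2-byte big-endian integer (the width of an INT2 column).
pub fn decode_int2(bytes: &[u8]) -> Result<i16, String> {
    let arr: [u8; 2] = bytes
        .try_into()
        .map_err(|_| format!("expected 2 bytes, got {}", bytes.len()))?;
    Ok(i16::from_be_bytes(arr))
}

/// Decode an 8-byte big-endian integer (the width of an INT8 column).
/// A 2-byte payload fails here even when the value itself is in range.
pub fn decode_int8(bytes: &[u8]) -> Result<i64, String> {
    let arr: [u8; 8] = bytes
        .try_into()
        .map_err(|_| format!("expected 8 bytes, got {}", bytes.len()))?;
    Ok(i64::from_be_bytes(arr))
}
```

Under these assumptions, the 2-byte encoding of 64 decodes fine as an INT2 but is rejected by the 8-byte decoder, which would match the "even one in range" symptom described for #8587.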
This PR is pretty widespread and touches some easily-broken bits, so I'll go into some background here. The replicated bootstore (soon to be backed by trust quorum instead, but that's not relevant to the context here) gives an eventually-consistent copy of data to all sleds. The type exposed by the bootstore is

omicron/bootstore/src/schemes/v0/storage.rs
Lines 119 to 126 in 7c56d21

`generation` ensures any write requests from an out-of-date Nexus can't overwrite newer changes. Prior to this PR, `blob` contains a JSON-serialized `EarlyNetworkConfig`, defined as

omicron/sled-agent/types/versions/src/bgp_v6/early_networking.rs
Lines 23 to 35 in 7c56d21
`EarlyNetworkConfigBody` contains a variety of networking configuration information required to bring up connectivity on a rack cold boot, including BGP details, which switches have transceivers and in which slots, etc. This type is quite complex, and historically changing it has been quite painful - addressing that is the point of this PR and #9801 in general. This PR does not change `NetworkConfig` or `EarlyNetworkConfigBody` - the focus is on `EarlyNetworkConfig`. It has a couple of problems:

1. `schema_version` is supposed to tell us what version of `body` is present, but it's defined in line with the body. That means any time we rev a new version of `EarlyNetworkConfigBody`, we also have to rev a new version of `EarlyNetworkConfig`, and don't have an opportunity to inspect `schema_version` before already needing to know how to deserialize `body`.
2. `generation` is duplicated with the `generation` in `NetworkConfig`. (This is not nearly as much of a problem in practice as the previous point, but is an opportunity for illegal states - what would it mean for a `NetworkConfig` at generation N to hold a blobified `EarlyNetworkConfig` with a different generation?)

`EarlyNetworkConfig` is used in three places on `main`:

- `blob` in the bootstore as described above
- `data` in the `bootstore_config` CRDB table (this has the same "duplicated generation" problem as `NetworkConfig` - this table also stores `generation` as a separate column next to `data`)
- `write_network_bootstore_config()` sled-agent OpenAPI

This PR breaks `EarlyNetworkConfig` up into two types to address the problems above. The serialized form is now `EarlyNetworkConfigEnvelope`:

omicron/sled-agent/types/versions/src/bootstore_versioning/early_networking.rs
Lines 45 to 55 in a8de498
This has `schema_version` and an opaque `body` (typed as `serde_json::Value`). This fixes both problems with `EarlyNetworkConfig`:

1. `EarlyNetworkConfigEnvelope` does not have to have a new type revved any time `EarlyNetworkConfigBody` is revved, because `body` is opaque. We can deserialize the envelope, then inspect `schema_version` to know which version of `EarlyNetworkConfigBody` it contains. (My hope is we never have to rev this type.)
2. There is no `generation` field.

This is fully backwards-compatible as far as deserialization is concerned: any existing `EarlyNetworkConfig` can be safely deserialized as an `EarlyNetworkConfigEnvelope`:

- `schema_version` is unchanged
- `generation` will be ignored
- `body` will be read as a `serde_json::Value` instead of an `EarlyNetworkConfigBody`

`EarlyNetworkConfigEnvelope` does not implement `JsonSchema`, because it should not be used in HTTP / OpenAPI contexts; it's only meant to be a serialization wrapper. That brings us to the second type added in this PR, `WriteNetworkConfigRequest`:

omicron/sled-agent/types/versions/src/bootstore_versioning/early_networking.rs
Lines 38 to 42 in a8de498
This is now the type accepted by sled-agent's `write_network_bootstore_config()` endpoint. sled-agent will convert this request into a `bootstore::NetworkConfig` in the straightforward way:

- wrap `body` in an `EarlyNetworkConfigEnvelope`
- store that wrapped `body`, serialized, as `NetworkConfig::blob`
- use `generation` as `NetworkConfig::generation`

I believe this gets us most of the way to #9801. Before closing the issue, I want to do an actual rev of `EarlyNetworkConfigBody` to confirm this puts us in a good place. I also want to work on the mechanics of ensuring that any rev is required to update the relevant bits of implementation, such as `EarlyNetworkConfigEnvelope` knowing how to deserialize the new body type. I'll also do some update testing on a racklette to confirm I didn't break anything w.r.t. backwards compatibility, but I believe all of these changes should be safe.

This PR also fixes #9943: sled-agent should never "upconvert" `EarlyNetworkConfigBody` requests from Nexus into a newer version; it needs to replicate the exact version it's given.
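The envelope idea described above (read `schema_version` first, then decide how to parse `body`, with `generation` living only on the outer `NetworkConfig`) can be sketched without any dependencies. Everything here is a stand-in invented for illustration: the real envelope keeps `body` as a `serde_json::Value` and is serialized as JSON, whereas this sketch uses a `String` body, a made-up `"<version>\n<body>"` wire format, and hypothetical `Body` variants and function names.

```rust
/// Stand-in for the envelope: the version is readable without parsing `body`.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub schema_version: u32,
    pub body: String, // opaque here; a `serde_json::Value` in the real type
}

/// Stand-in for `bootstore::NetworkConfig`: the only place `generation` lives.
#[derive(Debug, Clone, PartialEq)]
pub struct NetworkConfig {
    pub generation: u64,
    pub blob: Vec<u8>,
}

/// Hypothetical body versions, invented for this example.
#[derive(Debug, PartialEq)]
pub enum Body {
    V1 { rack_subnet: String },
    V2 { rack_subnet: String, peers: Vec<String> },
}

impl Envelope {
    /// Made-up wire format: "<schema_version>\n<body>".
    pub fn to_blob(&self) -> Vec<u8> {
        format!("{}\n{}", self.schema_version, self.body).into_bytes()
    }

    /// Phase 1: recover the version and leave the body unparsed.
    pub fn from_blob(blob: &[u8]) -> Result<Envelope, String> {
        let text = std::str::from_utf8(blob).map_err(|e| e.to_string())?;
        let (version, body) = text
            .split_once('\n')
            .ok_or_else(|| "missing version line".to_string())?;
        let schema_version = version.parse::<u32>().map_err(|e| e.to_string())?;
        Ok(Envelope { schema_version, body: body.to_string() })
    }

    /// Phase 2: choose a body parser based on the version already read.
    pub fn parse_body(&self) -> Result<Body, String> {
        match self.schema_version {
            1 => Ok(Body::V1 { rack_subnet: self.body.clone() }),
            2 => {
                let mut parts = self.body.split(';');
                let rack_subnet = parts.next().unwrap_or("").to_string();
                let peers = parts.map(|p| p.to_string()).collect();
                Ok(Body::V2 { rack_subnet, peers })
            }
            other => Err(format!("unsupported schema_version {}", other)),
        }
    }
}

/// Build the replicated config: the envelope becomes the blob as-is, and
/// the generation is stored once, next to it.
pub fn to_network_config(generation: u64, envelope: &Envelope) -> NetworkConfig {
    NetworkConfig { generation, blob: envelope.to_blob() }
}
```

In this sketch, an envelope with an unrecognized `schema_version` still round-trips through `to_blob` and `from_blob` unchanged; only `parse_body` rejects it. That mirrors the point about #9943: the body can be replicated exactly as given, without converting it to a newer version.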