Add nb_cfg_timestamp to SB_Global for propagation latency. #306
Open
tehhobbit wants to merge 1 commit into
8a05ac1 to 0099551
22ab3b0 to 075b2d1
dceara pushed a commit to dceara/ovn that referenced this pull request on Jun 11, 2026
Large-scale OVN deployments commonly disable the per-chassis nb_cfg
write-back mechanism by setting options:enable_chassis_nb_cfg_update
to false. With thousands of hypervisors each writing their nb_cfg
completion back to Chassis_Private on every generation, the resulting
write amplification can overload the southbound OVSDB cluster.
Disabling write-back eliminates this pressure but also removes the
only existing signal for measuring how long a northbound change takes
to reach each hypervisor.

OVN_Northbound already records nb_cfg_timestamp in NB_Global when
ovn-northd advances nb_cfg, but hypervisors connect to the southbound
database only. This patch adds the same timestamp to SB_Global,
written atomically with each nb_cfg update. ovn-controller reads this
value and stores it in the local OVS bridge external_ids as
ovn-nb-cfg-sb-ts alongside the existing ovn-nb-cfg-ts (local
completion time). An external collector such as ovs_exporter can read
both values from the bridge and compute per-chassis propagation
latency histograms without any writes to the southbound database,
keeping measurement overhead independent of fleet size.

Placing the timestamp in SB_Global rather than requiring collectors to
reach the northbound database means it travels transparently through
any relay or VPN between the southbound cluster and the hypervisor,
naturally including that transit in the measurement.

Testing: confirmed in OVN sandbox and a two-container central/HV setup
that nb_cfg_timestamp is written to SB_Global on each nb_cfg advance,
propagated to br-int external_ids as ovn-nb-cfg-sb-ts, and continues
to update correctly when enable_chassis_nb_cfg_update is set to false.

Signed-off-by: Loke Berne <loke@tehhobbit.net>
Assisted-by: Claude Sonnet 4.6
Submitted-at: ovn-org#306
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
eecbe84 to 324649f
324649f to 85e946c
Large-scale OVN deployments commonly disable per-chassis nb_cfg
write-back by setting options:enable_chassis_nb_cfg_update to
false. With thousands of hypervisors each writing completion
back to Chassis_Private on every generation, the resulting write
amplification can overload the southbound OVSDB cluster.
Disabling write-back eliminates this pressure but also removes
any signal for measuring how long a northbound change takes to
reach each hypervisor.

OVN_Northbound already records nb_cfg_timestamp in NB_Global
when ovn-northd advances nb_cfg, but hypervisors connect to the
southbound database only. This patch adds the equivalent
timestamp to SB_Global, written atomically with each nb_cfg
update. ovn-controller reads this value and stores it in the
local OVS bridge external_ids as ovn-nb-cfg-sb-ts alongside the
existing ovn-nb-cfg-ts (local completion time). An external
collector can read both values from the bridge and compute
per-chassis propagation latency histograms without any writes to
the southbound database, keeping measurement overhead independent
of fleet size.
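
A minimal sketch of the collector-side arithmetic described above, for illustration only. It assumes the collector has already parsed the bridge's external_ids into a string map, that both values are wall-clock milliseconds since the epoch, that both refer to the same nb_cfg generation, and that clocks on the ovn-northd host and the hypervisor are synchronized. The function names and bucket bounds are hypothetical and are not part of this patch or of any existing exporter.

```python
from typing import Mapping, Optional, Sequence

# external_ids keys named in the description above.
SB_TS_KEY = "ovn-nb-cfg-sb-ts"   # timestamp copied from SB_Global by ovn-controller
LOCAL_TS_KEY = "ovn-nb-cfg-ts"   # local completion time on this chassis


def propagation_latency_ms(external_ids: Mapping[str, str]) -> Optional[int]:
    """Return local completion time minus the SB timestamp, in milliseconds.

    Returns None when either key is missing or not an integer, or when the
    difference is negative (for example clock skew, or the two values
    belonging to different generations).
    """
    try:
        sb_ts = int(external_ids[SB_TS_KEY])
        local_ts = int(external_ids[LOCAL_TS_KEY])
    except (KeyError, ValueError):
        return None
    latency = local_ts - sb_ts
    return latency if latency >= 0 else None


def bucket_index(latency_ms: int, upper_bounds_ms: Sequence[int]) -> int:
    """Return the index of the first bucket whose upper bound covers latency_ms.

    upper_bounds_ms must be sorted ascending. Values above the last bound
    land in an overflow bucket at index len(upper_bounds_ms).
    """
    for i, bound in enumerate(upper_bounds_ms):
        if latency_ms <= bound:
            return i
    return len(upper_bounds_ms)


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    sample = {SB_TS_KEY: "1700000000000", LOCAL_TS_KEY: "1700000000250"}
    latency = propagation_latency_ms(sample)
    print(latency, bucket_index(latency, [100, 500, 1000]))
```

The map could be obtained, for example, from `ovs-vsctl get Bridge br-int external_ids`; parsing that output is left out here. Nothing in this calculation touches the southbound database, which is the point of exposing both timestamps on the bridge.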

Placing the timestamp in SB_Global rather than requiring
collectors to reach the northbound database means it travels
transparently through any relay or VPN between the southbound
cluster and the hypervisor, naturally including that transit in
the measurement.

Tested in the OVN sandbox and a two-container central/HV setup.
Confirmed nb_cfg_timestamp is written to SB_Global on each
nb_cfg advance, propagated to br-int external_ids as
ovn-nb-cfg-sb-ts, and continues updating correctly when
enable_chassis_nb_cfg_update is false.

Assisted-by: Claude Sonnet 4.5, Claude Code