Add nb_cfg_timestamp to SB_Global for propagation latency. #306

Open
tehhobbit wants to merge 1 commit into ovn-org:main from tehhobbit:nb-cfg-timestamp-upstream
Conversation

@tehhobbit tehhobbit commented May 20, 2026

Contributor

Large-scale OVN deployments commonly disable per-chassis nb_cfg
write-back by setting options:enable_chassis_nb_cfg_update to
false. With thousands of hypervisors each writing completion
back to Chassis_Private on every generation, the resulting write
amplification can overload the southbound OVSDB cluster.
Disabling write-back eliminates this pressure but also removes
any signal for measuring how long a northbound change takes to
reach each hypervisor.

OVN_Northbound already records nb_cfg_timestamp in NB_Global
when ovn-northd advances nb_cfg, but hypervisors connect to the
southbound database only. This patch adds the equivalent
timestamp to SB_Global, written atomically with each nb_cfg
update. ovn-controller reads this value and stores it in the
local OVS bridge external_ids as ovn-nb-cfg-sb-ts alongside the
existing ovn-nb-cfg-ts (local completion time). An external
collector can read both values from the bridge and compute
per-chassis propagation latency histograms without any writes to
the southbound database, keeping measurement overhead independent
of fleet size.
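
As a rough illustration of the collector side described above, here is a minimal Python sketch. It assumes the two external_ids values are wall-clock timestamps in milliseconds and that the map text looks like what `ovs-vsctl get Bridge br-int external_ids` prints. The helper names `parse_external_ids` and `propagation_latency_ms` are hypothetical and not part of this patch.

```python
import re

# Key names are taken from the PR description; everything else is illustrative.
SB_TS_KEY = "ovn-nb-cfg-sb-ts"   # timestamp written to SB_Global with nb_cfg
LOCAL_TS_KEY = "ovn-nb-cfg-ts"   # local completion time on this chassis

# Matches key=value or key="value" pairs inside an OVSDB map such as
# {ovn-nb-cfg="5", ovn-nb-cfg-ts="1716200000123"}.
_PAIR_RE = re.compile(r'([\w.-]+)="?([^",}]*)"?')


def parse_external_ids(text):
    """Parse an OVSDB map rendered as '{k="v", k2=v2}' into a dict of strings."""
    return dict(_PAIR_RE.findall(text))


def propagation_latency_ms(external_ids):
    """Return local completion time minus the SB timestamp (assumed ms).

    Returns None when either key is missing or not an integer, for example
    on a chassis whose ovn-controller does not yet write ovn-nb-cfg-sb-ts.
    """
    try:
        sb_ts = int(external_ids[SB_TS_KEY])
        local_ts = int(external_ids[LOCAL_TS_KEY])
    except (KeyError, ValueError):
        return None
    return local_ts - sb_ts


if __name__ == "__main__":
    sample = ('{ovn-nb-cfg="5", ovn-nb-cfg-sb-ts="1716200000000", '
              'ovn-nb-cfg-ts="1716200000123"}')
    print(propagation_latency_ms(parse_external_ids(sample)))
```

Because the two timestamps are taken on different hosts, the difference is only meaningful if their clocks are synchronized. A real collector would probably also want to check that both values belong to the same nb_cfg generation before recording a sample.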

Placing the timestamp in SB_Global, rather than requiring
collectors to reach the northbound database, means it travels
transparently through any relay or VPN between the southbound
cluster and the hypervisor, so that transit time is included in
the measurement.

Tested in the OVN sandbox and a two-container central/HV setup.
Confirmed nb_cfg_timestamp is written to SB_Global on each
nb_cfg advance, propagated to br-int external_ids as
ovn-nb-cfg-sb-ts, and continues updating correctly when
enable_chassis_nb_cfg_update is false.

Assisted-by: Claude Sonnet 4.5, Claude Code

@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch 2 times, most recently from 8a05ac1 to 0099551 Compare May 21, 2026 17:10
Comment thread br-controller/ovn-br-controller.c Outdated
Comment thread controller/ovn-controller.c Outdated
Comment thread controller/ovn-controller.c Outdated
Comment thread controller/ovn-controller.c Outdated
@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch 2 times, most recently from 22ab3b0 to 075b2d1 Compare June 8, 2026 14:06
dceara pushed a commit to dceara/ovn that referenced this pull request Jun 11, 2026
Large-scale OVN deployments commonly disable the per-chassis nb_cfg
write-back mechanism by setting options:enable_chassis_nb_cfg_update
to false.  With thousands of hypervisors each writing their nb_cfg
completion back to Chassis_Private on every generation, the resulting
write amplification can overload the southbound OVSDB cluster.
Disabling write-back eliminates this pressure but also removes the
only existing signal for measuring how long a northbound change takes
to reach each hypervisor.

OVN_Northbound already records nb_cfg_timestamp in NB_Global when
ovn-northd advances nb_cfg, but hypervisors connect to the southbound
database only.  This patch adds the same timestamp to SB_Global,
written atomically with each nb_cfg update.  ovn-controller reads
this value and stores it in the local OVS bridge external_ids as
ovn-nb-cfg-sb-ts alongside the existing ovn-nb-cfg-ts (local
completion time).  An external collector such as ovs_exporter can
read both values from the bridge and compute per-chassis propagation
latency histograms without any writes to the southbound database,
keeping measurement overhead independent of fleet size.

Placing the timestamp in SB_Global rather than requiring collectors
to reach the northbound database means it travels transparently
through any relay or VPN between the southbound cluster and the
hypervisor, naturally including that transit in the measurement.

Testing: confirmed in OVN sandbox and a two-container central/HV
setup that nb_cfg_timestamp is written to SB_Global on each nb_cfg
advance, propagated to br-int external_ids as ovn-nb-cfg-sb-ts, and
continues to update correctly when enable_chassis_nb_cfg_update is
set to false.

Signed-off-by: Loke Berne <loke@tehhobbit.net>
Assisted-by: Claude Sonnet 4.6
Submitted-at: ovn-org#306
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch 6 times, most recently from eecbe84 to 324649f Compare June 15, 2026 15:03
@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch from 324649f to 85e946c Compare June 15, 2026 15:31

2 participants