bgpd: skip peers not activated for AFI/SAFI in bgp_gr_check_all_eors() #22295
bgp_gr_check_all_eors() walks every peer in bgp->peer. For any peer that has PEER_STATUS_GR_WAIT_EOR set but not PEER_STATUS_EOR_RECEIVED, it splits into two code paths based on bgp->gr_multihop_peer_exists.

An existing !afc check filtered out peers that do not have this AFI/SAFI configured/activated, but it sat after the no-multihop-mix branch's early 'return false', so it was only ever reached when gr_multihop_peer_exists was true. The no-multihop-mix branch instead returned false on the first peer that lacked EOR_RECEIVED, including peers that have no activated AF at all and therefore can never send an EOR for the AFI/SAFI in question.

In topologies where the BGP config defines neighbors that are never activated under any address family, this caused bgp_gr_check_all_eors() to return false on every incoming EOR, permanently blocking the GR fast-cancel path and forcing the deferral to always run until the select-defer-time safety timer expired.

Move the !afc check above the branch split so that both branches see it; the post-split copy becomes unreachable and is removed.

Signed-off-by: Shashanka K S <shashankak@nvidia.com>
Greptile Summary
This PR fixes a bug in bgp_gr_check_all_eors().
Confidence Score: 5/5
The change is a surgical one-function fix that correctly moves an existing guard to an earlier position, with no new logic introduced and no regressions expected in the common case. The moved !afc[afi][safi] check was already present and correct in the multihop branch; this PR extends its reach to the no-multihop branch, where it was absent. The debug log message, the continue path, and the surrounding peer-iteration logic are all unchanged. No new state is introduced, and the removed code is provably dead after the move. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["for each peer in bgp->peer"] --> B{peer_is_config_node &&\nnot SHUTDOWN &&\nGRACEFUL_RESTART?}
B -- No --> A
B -- Yes --> C{PEER_STATUS_GR_WAIT_EOR\nset?}
C -- No --> A
C -- Yes --> D{PEER_STATUS_EOR_RECEIVED\nset?}
D -- Yes --> A
D -- No --> E{"!afc[afi][safi]?\n(peer not activated for AF)"}
E -- Yes: skip --> A
E -- No --> F{gr_multihop_peer_exists?}
F -- No --> G["return false\n(still waiting for EOR)"]
F -- Yes --> H{peer is multihop?}
H -- Yes --> I[mark eor_rcvd_from_all_mh_peers = false]
H -- No --> J["return false\n(direct peer, still waiting)"]
I --> A
A -- done --> K["return eor_rcvd_from_all_mh_peers"]
Reviews (1): Last reviewed commit: "bgpd: lift !afc check in bgp_gr_check_al..."
@Mergifyio backport stable/10.7 stable/10.6 stable/10.5 stable/10.4 stable/10.3
✅ Backports have been created
Cherry-pick of e6b40ba has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Cherry-pick of e6b40ba has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Cherry-pick of e6b40ba has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
bgpd: skip peers not activated for AFI/SAFI in bgp_gr_check_all_eors() (backport #22295)
bgpd: skip peers not activated for AFI/SAFI in bgp_gr_check_all_eors() (backport #22295)