Conversation
Adds ASPA (Autonomous System Provider Authorization) route leak
detection to the rpki stage, alongside the existing ROV validation.
Also replaces the stayrtr dependency with the new bgpfix/rtr client.
## ASPA algorithm (stages/rpki/aspa.go, new file)
Implements draft-ietf-sidrops-aspa-verification-24 §5 (upstream) and
§6 (downstream) valley-free path verification:
- aspAuthorized(aspa, CAS, PAS): checks if PAS is in CAS's provider
list. Returns aspNoAttestation / aspProvider / aspNotProvider.
- aspVerify(aspa, path, downstream): verifies flat AS_PATH.
- Upstream (from customer/peer/RS-client): every hop must go up;
any NotProvider → invalid, any NoAttestation → unknown.
- Downstream (from provider/RS): computes up-ramp from origin and
down-ramp from peer. max counts Provider+ and No-Attestation;
min counts only leading Provider+ hops. Valley-free iff the
ramps cover all N-1 pairs.
- aspIsDownstream(peerRole): returns true for ROLE_PROVIDER and
ROLE_RS. RS is treated like a provider per draft §6.3, which is
critical for correct behavior with RIS/RouteViews data sources.
- aspPeerRole: reads BGP Role capability from the peer's OPEN message
via pipe.LineFor(dir).
## ASPA integration (stages/rpki/validate.go)
validateAspa() is called after ROV for every UPDATE with reachable
prefixes. It:
1. Resolves the peer's BGP role once (sync.Once) from --role flag or
BGP Role capability. If --role auto and no capability: skips ASPA
for the entire session with a warning.
2. Calls u.AsPath().Flat() → nil triggers aspa_invalid (AS_SET present
or empty path, per spec).
3. Applies the configured aspa_action: keep / drop / withdraw+filter /
split (matching ROV action semantics).
4. Tags the message with aspa/status=VALID|UNKNOWN|INVALID when
--aspa-tag is set (default true).
5. Emits the configured event on ASPA INVALID (--aspa-event).
6. Early-returns for withdrawal-only UPDATEs (no AS_PATH to validate).
drainReachable() helper extracts all IPv4 and MP reachable prefixes
from an UPDATE, clearing them in place (used for withdraw/split actions).
## RPKI stage config (stages/rpki/rpki.go)
New ASPA constants: aspa_valid=0, aspa_unknown=1, aspa_invalid=2.
New struct fields:
- aspa atomic.Pointer[ASPA] — lock-free ASPA cache (same pattern as ROA)
- nextAspa ASPA — pending ASPA updates (copy-on-write, serialized)
- aspa_action int — action on ASPA INVALID
- aspa_tag bool — whether to tag messages with aspa/status
- aspa_event string — event name to emit on ASPA INVALID
- peer_role int — resolved peer role (-1=auto-not-found, ≥0=resolved)
- peer_role_mu sync.Once — ensures role resolved exactly once
- peer_role_ok bool — true when a usable role was found
- peer_downstream bool — cached aspIsDownstream result
New CLI flags:
- --aspa-invalid (default: keep) — action for ASPA INVALID paths
- --aspa-tag (default: true) — add aspa/status to message tags
- --aspa-event — emit named event on ASPA INVALID
- --role (default: auto) — peer BGP role for ASPA verification;
auto detects from BGP Role capability (RFC 9234 cap code 9)
New Prometheus metrics:
- bgpipe_rpki_aspa_valid_total
- bgpipe_rpki_aspa_unknown_total
- bgpipe_rpki_aspa_invalid_total
- bgpipe_rpki_aspa_entries gauge (ASPA records in current cache)
## ASPA cache management (stages/rpki/next.go)
nextFlush() and nextApply() extended to handle ASPA alongside ROA.
ASPA cache is published atomically in the same nextApply() call,
so roa_done already covers ASPA readiness.
nextAspaEntry(add, cas, providers): adds or removes a single ASPA
record from the pending cache.
Fixed IPv4 maxLen validation: was incorrectly checking ml > 128 for
IPv4 (should be ml > 32).
## ASPA from JSON file (stages/rpki/file.go)
fileParseJSON() now reads aspas[] alongside roas[]:
{"aspas": [{"customer_asid": 65001, "provider_asids": [65002, 65003]}]}
Routinator-compatible format. Enables offline testing and use with
rpki-client / Routinator JSON exports.
## RTR client rewrite (stages/rpki/rtr.go)
Replaced github.com/bgp/stayrtr with the new bgpfix/rtr.Client.
Callbacks (OnROA, OnASPA, OnEndOfData, OnCacheReset) are called
serially from the Run goroutine, so no locking needed inside them.
The retry/reconnect loop and dial logic are unchanged.
## Docs (docs/stages/rpki.md)
Full rewrite of the ASPA section:
- VALID/UNKNOWN/INVALID semantics and policy table
- RTR v2 requirement (v1 fallback supported, no ASPA)
- --role flag: auto-detection, explicit override, RS/RS-client cases
- filter vs withdraw equivalence for ASPA (entire path is suspect)
- New examples: explicit role, tag-only mode, RIS live validation
## Dependency changes (go.mod)
Removed: github.com/bgp/stayrtr v0.6.3 (and transitive golang.org/x/sync)
Requires: bgpfix feat-aspa branch for Aspath.Flat() and rtr/ package
There was a problem hiding this comment.
Pull request overview
Adds ASPA (RFC 9234 BGP Role–aware) AS_PATH validation to the existing rpki stage (which already performs ROV), and switches the RTR implementation from stayrtr to the bgpfix/rtr client to support RTR v2 ASPA data and related callbacks.
Changes:
- Implement ASPA verification logic and integrate it into UPDATE validation and policy actions.
- Extend cache management and file parsing to ingest/publish ASPA records alongside ROAs; add ASPA metrics and flags.
- Replace
stayrtrsession handling withbgpfix/rtrclient callbacks and refresh logic.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| stages/rpki/validate.go | Integrates ASPA validation and adds reachable-prefix draining helper for ASPA actions |
| stages/rpki/aspa.go | New ASPA verification + peer-role detection helpers |
| stages/rpki/rpki.go | Adds ASPA config/metrics/cache fields and CLI flags |
| stages/rpki/next.go | Publishes ASPA cache with ROAs; adds nextAspaEntry; fixes IPv4 maxLen cap |
| stages/rpki/file.go | Extends Routinator JSON parsing to include aspas[] |
| stages/rpki/rtr.go | Replaces stayrtr with bgpfix/rtr client callbacks and refresh |
| docs/stages/rpki.md | Documents ASPA semantics, options, and examples |
| go.mod / go.sum | Removes stayrtr and adjusts module metadata |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
stages/rpki/aspa.go
Outdated
| // NB: does not check that path[0] equals the neighbor AS (draft §5.4 step 2 / §5.5 step 2). | ||
| // That check must be done by the caller using the peer's ASN from the OPEN message. | ||
| // It is skipped when the peer is an RS (RS doesn't prepend its own ASN per RFC 7947). | ||
| func aspVerify(aspa ASPA, path []uint32, downstream bool) int { |
There was a problem hiding this comment.
aspVerify explicitly notes it does not enforce the draft-required check that path[0] equals the neighbor ASN, but validateAspa() never performs that check either. Without it, an UPDATE with a forged/atypical first AS hop can bypass correct ASPA verification. Implement the neighbor-ASN check in validateAspa() (skipping only for RS per RFC 7947) using the peer ASN from the OPEN message, or move the check into aspVerify with an optional peerASN/skip flag.
| require ( | ||
| github.com/VictoriaMetrics/metrics v1.41.2 | ||
| github.com/bgp/stayrtr v0.6.3 | ||
| github.com/bgpfix/bgpfix v0.18.0 |
There was a problem hiding this comment.
The PR now imports github.com/bgpfix/bgpfix/rtr and uses AsPath().Flat(), but go.mod still pins github.com/bgpfix/bgpfix v0.18.0. That version does not contain the rtr package / Flat() API, so the build will fail unless the module version is bumped or a replace/pseudo-version is added to point at the feat-aspa branch/commit.
| github.com/bgpfix/bgpfix v0.18.0 | |
| github.com/bgpfix/bgpfix v0.19.0 |
stages/rpki/validate.go
Outdated
| // validateAspa performs ASPA path validation for the UPDATE message. | ||
| // Returns (false, nil) to drop the message, (true, nil) to keep it, or (false, err) on fatal error. | ||
| func (s *Rpki) validateAspa(m *msg.Msg, u *msg.Update, tags map[string]string) (bool, error) { | ||
| aspa := s.aspa.Load() | ||
| if aspa == nil || len(*aspa) == 0 { | ||
| return true, nil // no ASPA data, skip | ||
| } |
There was a problem hiding this comment.
New ASPA logic (aspVerify/validateAspa + policy actions/tagging) isn’t covered by tests even though this package has an existing test suite. Adding unit tests for aspVerify (upstream/downstream + UNKNOWN/INVALID edge cases) and tests for validateAspa actions would help prevent regressions.
stages/rpki/validate.go
Outdated
| case rpki_split: | ||
| invalid_prefixes := drainReachable(u) | ||
| m.Edit() | ||
| if len(invalid_prefixes) > 0 && s.in_split != nil { | ||
| m2 := s.P.GetMsg().Switch(msg.UPDATE) | ||
| m2.Time = m.Time | ||
| m2.Update.AddUnreach(invalid_prefixes...) | ||
| s.in_split.WriteMsg(m2) | ||
| } |
There was a problem hiding this comment.
In the ASPA rpki_split branch, drainReachable() clears reachable NLRI from the original UPDATE, but the code neither adds those prefixes back as withdrawals on the original nor drops the now-empty UPDATE. This can result in forwarding an empty UPDATE and/or not actually withdrawing the invalid announcements when in_split is nil. Consider either (a) dropping the original after emitting the withdrawal message, or (b) keeping the original’s reachable NLRI and only emitting the withdrawal on the side channel (consistent with the intended split semantics).
stages/rpki/validate.go
Outdated
| // ASPA validation (independent of ROV result) | ||
| keep, err := s.validateAspa(m, u, tags) | ||
| if err != nil { | ||
| s.Fatal().Err(err).Msg("ASPA role error") | ||
| return false |
There was a problem hiding this comment.
The comment says ASPA validation is independent of the ROV result, but validateMsg returns early for several ROV policies (e.g., --invalid=keep/withdraw/filter/drop), so validateAspa() is skipped whenever any ROV-invalid prefixes exist. If ASPA should run for every UPDATE with reachable NLRI, refactor the control flow so ASPA is evaluated after the ROV action handling (except when the message is definitively dropped) rather than returning before this step.
stages/rpki/validate.go
Outdated
| roleName := s.role_name | ||
| if roleName != "auto" { | ||
| role, ok := parseRoleName(roleName) | ||
| if !ok { | ||
| resolveErr = fmt.Errorf("unknown --role value: %q", roleName) |
There was a problem hiding this comment.
Role parsing/validation is deferred until the first UPDATE and the comparison to "auto" is case-sensitive. This means an invalid --role value (or e.g. "AUTO") won’t be caught until runtime and can currently trigger a Fatal. Consider normalizing/validating --role during Attach() and storing the parsed role so misconfiguration fails fast at startup.
docs/stages/rpki.md
Outdated
| in their OPEN message, ASPA validation is silently skipped for the entire session. | ||
| Use `--role` to force ASPA validation when the peer lacks this capability. |
There was a problem hiding this comment.
Docs say ASPA validation is “silently skipped” when --role auto and the peer lacks the BGP Role capability, but the implementation logs a warning when this happens. Please align the documentation with the actual behavior (or adjust the code if silence is desired).
| in their OPEN message, ASPA validation is silently skipped for the entire session. | |
| Use `--role` to force ASPA validation when the peer lacks this capability. | |
| in their OPEN message, ASPA validation is skipped for the entire session and a | |
| warning is logged. Use `--role` to force ASPA validation when the peer lacks this | |
| capability. |
| if ml := int(maxLen); ml < prefix.Bits() || ml > maxBits { | ||
| s.Warn().Str("prefix", prefix.String()).Int("maxLength", ml).Msg("invalid MaxLength, skipping") | ||
| return |
There was a problem hiding this comment.
This change fixes MaxLength validation to correctly cap IPv4 at 32 bits. Please add a regression test (e.g., maxLen=33 for an IPv4 prefix should be rejected) since nextRoa already has a dedicated test file and this was a previously incorrect boundary condition.
stages/rpki/rpki.go
Outdated
| s.P.OnMsg(s.validateMsg, s.Dir, msg.UPDATE) | ||
|
|
||
| // create input for --invalid=split | ||
| if s.invalid == rpki_split { |
There was a problem hiding this comment.
--aspa-invalid supports "split", but the split output pipe (in_split) is only created when --invalid=split. If a user sets --aspa-invalid=split with a different --invalid policy, ASPA split will silently not emit the withdrawal message. Consider creating in_split when either invalid==split OR aspa_action==split.
| if s.invalid == rpki_split { | |
| if s.invalid == rpki_split || s.aspa_action == rpki_split { |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if do_split { | ||
| s.split.WriteMsg(m2) | ||
| } | ||
|
|
There was a problem hiding this comment.
When --invalid=split and tagging is enabled, the original UPDATE (with only VALID/NOT_FOUND reachable prefixes after deletion) never gets an overall "rpki/status" tag because the status assignment only runs in the len(invalid)==0 branch. This is a behavior change from the previous implementation and breaks downstream filters that rely on rpki/status for the forwarded (non-withdraw) UPDATE. Consider setting rpki/status on the original message after splitting based on the remaining prefixes (VALID vs NOT_FOUND), similar to the pre-refactor behavior.
| // When splitting invalid prefixes into a separate UPDATE, also tag the | |
| // original message with an overall rpki/status based on the remaining | |
| // reachable prefixes (VALID vs NOT_FOUND), to match pre-refactor behavior. | |
| if do_split && s.tag { | |
| switch { | |
| case len(valid) > 0: | |
| tags["rpki/status"] = "VALID" | |
| m.Edit() | |
| case len(not_found) > 0: | |
| tags["rpki/status"] = "NOT_FOUND" | |
| m.Edit() | |
| } | |
| } |
| // for dev: use the latest code in ../bgpfix | ||
| //replace github.com/bgpfix/bgpfix => ../bgpfix | ||
| replace github.com/bgpfix/bgpfix => ../bgpfix |
There was a problem hiding this comment.
The committed replace directive forces builds to use a local ../bgpfix checkout, which will break CI and downstream consumers that don't have that path. This also explains the go.sum churn (missing sums for the required module). Please remove/comment out the replace and depend on a published bgpfix version (or a proper VCS pseudo-version) so module resolution is reproducible.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
go.mod:48
- The committed
replace github.com/bgpfix/bgpfix => ../bgpfixmakes builds non-reproducible and will fail in CI / for other developers who don’t have that local path. Please remove the replace directive (or keep it commented) and instead bump the required bgpfix version / use a proper pseudo-version pointing at the needed commit, then regenerate go.sum accordingly.
// for dev: use the latest code in ../bgpfix
replace github.com/bgpfix/bgpfix => ../bgpfix
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| if len(reach) > 0 { | ||
| u.AddUnreach(reach...) | ||
| } |
There was a problem hiding this comment.
When --aspa-invalid withdraw triggers, this code moves all reachable prefixes to withdrawals but does not remove path attributes / MP_REACH_NLRI. That can produce an UPDATE that is effectively withdrawal-only yet still carries attributes (and potentially an empty MP_REACH), which is non-compliant and can be rejected by peers. After converting to withdrawals, clear MP_REACH and filter the attribute set down to MP_UNREACH only (similar to the ROV withdraw handling).
| } | |
| } | |
| // After converting to withdrawals, clear MP_REACH and keep only MP_UNREACH | |
| // in the path attribute set so that this becomes a pure withdrawal UPDATE. | |
| if u.Attrs != nil { | |
| filtered := u.Attrs[:0] | |
| for _, a := range u.Attrs { | |
| switch a.(type) { | |
| case *msg.MpUnreachNLRI: | |
| // Keep MP_UNREACH_NLRI attributes. | |
| filtered = append(filtered, a) | |
| // Drop MP_REACH_NLRI and all other attributes to avoid | |
| // sending a non-compliant withdrawal-only UPDATE that | |
| // still carries path attributes. | |
| } | |
| } | |
| u.Attrs = filtered | |
| } |
| OnEndOfData: func(sessid uint16, serial uint32) { | ||
| s.nextApply() | ||
| s.rtr_mu.Lock() | ||
| s.rtr_valid = true | ||
| s.rtr_mu.Unlock() | ||
| }, |
There was a problem hiding this comment.
OnEndOfData receives (sessid, serial) but the implementation ignores them and always calls nextApply(). If the RTR client invokes OnEndOfData even when the serial hasn’t changed (e.g., periodic Serial Query with no diffs), this will unnecessarily clone large VRP/ASPA maps and spam logs. Consider storing the last applied serial/session in the stage state and only applying when it changes.
| for _, aspa := range doc.ASPAs { | ||
| if aspa.CustomerASID == 0 { | ||
| s.Warn().Msg("ASPA entry with zero customer ASN, skipping") | ||
| continue | ||
| } | ||
| s.nextASPA(true, aspa.CustomerASID, aspa.ProviderASIDs) | ||
| } |
There was a problem hiding this comment.
ASPA records loaded from JSON are accepted without validating/normalizing provider ASNs (e.g., 0 values, duplicates, or unsorted lists). It would be safer and more deterministic to (a) drop provider_asids==0, (b) de-duplicate, and (c) sort the provider list before storing, so authorization checks are consistent and can use faster lookups.
| for _, p := range provs { | ||
| if p == pas { | ||
| return asp_prov | ||
| } |
There was a problem hiding this comment.
aspAuthorized does a linear scan over the provider list for every hop validation. With large ASPA datasets, this can become a hot path. If provider lists are stored sorted/unique, you can replace the loop with slices.BinarySearch (or store providers in a map[uint32]struct{}) to make lookups O(log n) / O(1).
| for _, p := range provs { | |
| if p == pas { | |
| return asp_prov | |
| } | |
| _, found := slices.BinarySearch(provs, pas) | |
| if found { | |
| return asp_prov |
| keep, err := s.validateAspa(m, u, tags) | ||
| if err != nil { | ||
| s.Fatal().Err(err).Msg("ASPA role error") | ||
| return false | ||
| } |
There was a problem hiding this comment.
validateMsg treats a non-nil error from validateAspa as fatal, but validateAspa currently never returns a non-nil error. Either remove the error return path (and the fatal log) or introduce a real error condition that can occur at runtime so this branch is meaningful.
| keep, err := s.validateAspa(m, u, tags) | |
| if err != nil { | |
| s.Fatal().Err(err).Msg("ASPA role error") | |
| return false | |
| } | |
| keep, _ := s.validateAspa(m, u, tags) |
Adds ASPA (Autonomous System Provider Authorization) route leak detection to the rpki stage, alongside the existing ROV validation. Also replaces the stayrtr dependency with the new bgpfix/rtr client.
ASPA algorithm (stages/rpki/aspa.go, new file)
Implements draft-ietf-sidrops-aspa-verification-24 §5 (upstream) and §6 (downstream) valley-free path verification:
aspAuthorized(aspa, CAS, PAS): checks if PAS is in CAS's provider list. Returns aspNoAttestation / aspProvider / aspNotProvider.
aspVerify(aspa, path, downstream): verifies flat AS_PATH.
aspIsDownstream(peerRole): returns true for ROLE_PROVIDER and ROLE_RS. RS is treated like a provider per draft §6.3, which is critical for correct behavior with RIS/RouteViews data sources.
aspPeerRole: reads BGP Role capability from the peer's OPEN message via pipe.LineFor(dir).
ASPA integration (stages/rpki/validate.go)
validateAspa() is called after ROV for every UPDATE with reachable prefixes. It:
drainReachable() helper extracts all IPv4 and MP reachable prefixes from an UPDATE, clearing them in place (used for withdraw/split actions).
RPKI stage config (stages/rpki/rpki.go)
New ASPA constants: aspa_valid=0, aspa_unknown=1, aspa_invalid=2.
New struct fields:
New CLI flags:
New Prometheus metrics:
ASPA cache management (stages/rpki/next.go)
nextFlush() and nextApply() extended to handle ASPA alongside ROA. ASPA cache is published atomically in the same nextApply() call, so roa_done already covers ASPA readiness.
nextAspaEntry(add, cas, providers): adds or removes a single ASPA record from the pending cache.
Fixed IPv4 maxLen validation: was incorrectly checking ml > 128 for IPv4 (should be ml > 32).
ASPA from JSON file (stages/rpki/file.go)
fileParseJSON() now reads aspas[] alongside roas[]:
{"aspas": [{"customer_asid": 65001, "provider_asids": [65002, 65003]}]}
Routinator-compatible format. Enables offline testing and use with rpki-client / Routinator JSON exports.
RTR client rewrite (stages/rpki/rtr.go)
Replaced github.com/bgp/stayrtr with the new bgpfix/rtr.Client. Callbacks (OnROA, OnASPA, OnEndOfData, OnCacheReset) are called serially from the Run goroutine, so no locking needed inside them. The retry/reconnect loop and dial logic are unchanged.
Docs (docs/stages/rpki.md)
Full rewrite of the ASPA section:
Dependency changes (go.mod)
Removed: github.com/bgp/stayrtr v0.6.3 (and transitive golang.org/x/sync)
Requires: bgpfix feat-aspa branch for Aspath.Flat() and rtr/ package