Skip to content

geolocation: add ICMP pinger to geoprobe-agent#3427

Open
ben-dz wants to merge 30 commits intomainfrom
bdz/geolocation-agent-icmp
Open

geolocation: add ICMP pinger to geoprobe-agent#3427
ben-dz wants to merge 30 commits intomainfrom
bdz/geolocation-agent-icmp

Conversation

@ben-dz
Copy link
Copy Markdown
Contributor

@ben-dz ben-dz commented Mar 31, 2026

Summary of Changes

  • Add an ICMP echo pinger (ICMPPinger) to geoprobe-agent that measures outbound ICMP targets using raw sockets with kernel timestamps (SO_TIMESTAMPNS), interleaved batch send/receive, and epoll-based I/O
  • Integrate ICMP targets into the existing measurement loop and target discovery pipeline, discovering OutboundIcmp targets from onchain data alongside TWAMP targets
  • Refactor the measurement loop from a standalone function to a measurementLoop struct with a shared reconcileTargets helper that deduplicates the add/remove/measure-new-targets logic for both TWAMP and ICMP channels

Diff Breakdown

Category Files Lines (+/-) Net
Core logic 4 +620 / -104 +516
Tests 4 +737 / -21 +716
Metrics 1 +33 / -15 +18
Config/build 1 +1 / -0 +1
Docs 1 +2 / -0 +2

Most of the new code is the ICMP pinger and its tests; the measurement loop refactor is net-neutral.

Key files (click to expand)

Testing Verification

  • Mock-based unit tests for ICMPPinger cover add/remove, duplicate handling, MeasureOne (success, timeout, mismatched seq/ID), MeasureAll (multi-target, empty, context cancellation), and Close
  • Integration tests (gated on CAP_NET_RAW) verify real ICMP echo round-trips to localhost
  • icmpConn tests cover raw socket round-trip, deadline expiry, IPv6 rejection, and IPv4 header stripping
  • ValidateICMP and ParseICMPProbeAddresses tested including IPv6 rejection, deduplication, and invalid input
  • Target discovery tests cover ICMP-only, mixed TWAMP+ICMP, and private-IP rejection for ICMP targets

@ben-dz ben-dz added this to the Geo Location milestone Mar 31, 2026
@ben-dz ben-dz force-pushed the bdz/geolocation-agent-icmp branch 2 times, most recently from 90b0846 to af75fcf Compare April 2, 2026 15:06
@ben-dz ben-dz marked this pull request as ready for review April 2, 2026 15:27
@ben-dz ben-dz requested review from nikw9944 and vihu April 2, 2026 15:27
ben-dz added 24 commits April 2, 2026 11:35
Extract icmpSocket interface from icmpConn to enable mock-based testing
without CAP_NET_RAW. Replace unsafe.Pointer timestamp parsing with
encoding/binary. Increase epoll events array from 1 to 16 to reduce
syscall overhead in batch receive.
Fail fast at validation time instead of silently failing at send time.
The ICMP socket is AF_INET (IPv4-only), so IPv6 addresses would be
added to the probe map but every measurement would error.
ValidateICMP now guarantees IPv4, so ParseIP cannot return nil.
Use .To4() directly.
Rewrite icmp_pinger_test.go to use a mockICMPSocket that implements
the icmpSocket interface. Tests cover add/remove, duplicate handling,
MeasureOne (success, timeout, wrong seq/ID), MeasureAll (multi-probe,
empty, context cancellation), and Close. These run in CI without
CAP_NET_RAW. Integration tests retained at the bottom, gated on
CAP_NET_RAW.
Follow Go convention of all-caps for acronyms, consistent with
ICMPPinger and ICMPPingerConfig.
ben-dz added 3 commits April 2, 2026 11:35
…helper

Replace 15-parameter runMeasurementLoop with a measurementLoop struct.
Extract reconcileTargets helper to deduplicate the identical TWAMP and
ICMP target reconciliation logic.
Prevent Close from closing the fd while a measurement is in-flight.
Without this, a concurrent MeasureOne or MeasureAll blocked in
epoll_wait could have its fd closed underneath it.
Callers must serialize access; ICMPPinger does this via measMu.
@ben-dz ben-dz force-pushed the bdz/geolocation-agent-icmp branch from af75fcf to 05ba5e3 Compare April 2, 2026 15:35
@nikw9944 nikw9944 linked an issue Apr 2, 2026 that may be closed by this pull request
@nikw9944
Copy link
Copy Markdown
Contributor

nikw9944 commented Apr 2, 2026

Code review

Found 1 issue:

  1. NewICMPPinger is called unconditionally at startup even when no ICMP targets are configured. It opens a raw ICMP socket requiring CAP_NET_RAW, so existing deployments that don't use ICMP and don't grant this capability will fail at startup with os.Exit(1). The pinger should only be created when ICMP targets are actually present.

// Set up ICMP pinger for outbound ICMP targets.
icmpPinger, err := geoprobe.NewICMPPinger(&geoprobe.ICMPPingerConfig{
Logger: log,
ProbeTimeout: *twampSenderTimeout,
BatchSize: geoprobe.ICMPDefaultBatchSize,
StaggerDelay: geoprobe.ICMPDefaultStaggerDelay,
})
if err != nil {
log.Error("Failed to create ICMP pinger (CAP_NET_RAW may be missing)", "error", err)
os.Exit(1)
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@nikw9944
Copy link
Copy Markdown
Contributor

nikw9944 commented Apr 2, 2026

Minor: ParseICMPProbeAddresses deduplicates by addr.String() (host:port:0), but ICMPPinger.AddProbe keys by addr.Host only. If the same host appears with different offset ports (e.g. 8.8.8.8:9000,8.8.8.8:9001), the parser accepts both but the pinger silently drops the second. Likely an edge case in practice, but the key mismatch could cause confusion.

probes: make(map[string]*icmpProbeEntry),
cfg: cfg,
id: os.Getpid() & 0xffff,
log: cfg.Logger,
}, nil
}

return fmt.Errorf("host %s is not a public unicast address", p.Host)
}
return nil
}
// ParseICMPProbeAddresses parses a comma-separated list of ICMP probe addresses.
// Each entry must be host:offset_port. TWAMPPort is always 0 for ICMP targets.
func ParseICMPProbeAddresses(s string) ([]ProbeAddress, error) {
if s == "" {
return nil, nil
}
parts := strings.Split(s, ",")
probes := make([]ProbeAddress, 0, len(parts))
seen := make(map[string]bool)

@nikw9944
Copy link
Copy Markdown
Contributor

nikw9944 commented Apr 2, 2026

Minor: In discoverAndSend, the skip-scan guard checks targets == nil && inboundKeys == nil but doesn't check icmpTargets == nil. This works today since discover() returns all-nil on skip, but the guard is fragile — if discover() is later refactored to return non-nil TWAMP targets but nil ICMP targets, the guard won't catch it. Consider adding icmpTargets == nil to the condition for consistency.

if targets == nil && inboundKeys == nil {
return
}

@nikw9944
Copy link
Copy Markdown
Contributor

nikw9944 commented Apr 2, 2026

I think this can lead to bad offset data. Claude:

uint64(rxTime.Sub(pp.txTime).Nanoseconds()) in readReplies and MeasureOne wraps to ~MaxUint64 (~18 exaseconds) if clock skew or an anomalous kernel timestamp produces a negative duration. The existing TWAMP pinger guards against this with decideRTT which clamps to max(rtt, 0) before casting — the ICMP path has no equivalent.

The impact cascades: the bogus MeasuredRttNs overflows RttNs in sendCompositeOffsets to a small positive value, which then wins offsetCache.GetBest comparisons (evicting legitimate offsets), gets Ed25519-signed, and is embedded in every outbound Signed TWAMP reply — propagating bad data to downstream targets until the cache entry expires.

rtt := uint64(rxTime.Sub(pp.txTime).Nanoseconds())
results[pp.addr] = rtt

rtt := uint64(rxTime.Sub(txTime).Nanoseconds())
p.log.Debug("MeasureOne succeeded", "host", addr.Host, "rtt_ns", rtt)

Copy link
Copy Markdown
Contributor

@nikw9944 nikw9944 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see PR comments

@ben-dz
Copy link
Copy Markdown
Contributor Author

ben-dz commented Apr 2, 2026

Code review

Found 1 issue:

  1. NewICMPPinger is called unconditionally at startup even when no ICMP targets are configured. It opens a raw ICMP socket requiring CAP_NET_RAW, so existing deployments that don't use ICMP and don't grant this capability will fail at startup with os.Exit(1). The pinger should only be created when ICMP targets are actually present.

This is on purpose. Failing later is worse than failing earlier. It should just fail out so we fix the permissions.

ben-dz added 3 commits April 2, 2026 16:24
The pinger keys probes by host only, so two entries for the same host
with different offset ports would silently drop the second. Align the
parser to match by deduplicating on host instead of host:port:0.
Check icmpTargets == nil alongside targets and inboundKeys when
deciding whether a discovery scan was skipped.
Matches the TWAMP pinger's decideRTT pattern. Without the clamp, clock
skew or anomalous kernel timestamps could produce a negative duration
that wraps to ~MaxUint64 when cast to uint64, corrupting offset data.
@ben-dz ben-dz requested a review from nikw9944 April 2, 2026 20:28
@ben-dz
Copy link
Copy Markdown
Contributor Author

ben-dz commented Apr 2, 2026

@nikw9944 : Addressed 2, 3, 4. 1 was intentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Geolocation: Add ICMP ping support to geoprobe-agent

2 participants