[Efficiency Improver] perf(auth): eliminate per-IP heap allocation in RateLimiter#582

Draft
github-actions[bot] wants to merge 1 commit into main from efficiency/ratelimit-visitor-value-map-6423517224fe3bb6

🤖 This is a draft PR from Weekly Efficiency Improver, an automated AI assistant focused on reducing energy consumption.


Goal and Rationale

Every unique IP address processed by RateLimiter previously required a separate heap allocation for its visitor struct. At scale (e.g. 10 k req/s from diverse IPs), this meant thousands of small allocations per second, increasing GC frequency and CPU energy draw.

Focus area: Code-Level Efficiency — Memory allocation reduction


Approach

Changed visitors map[string]*visitor to visitors map[string]visitor, so visitor state is stored inline in the map bucket. This eliminates the per-entry pointer indirection and the corresponding heap allocation.
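The difference between the two map types can be observed with testing.AllocsPerRun. The following is a minimal sketch, not the code from this PR: the visitor fields, the helper names allocsPointerMap and allocsValueMap, and the sample key are assumptions made for illustration. It reuses one key so that map growth does not enter the measurement.

```go
package main

import (
	"testing"
	"time"
)

// visitor is a hypothetical stand-in for the unexported struct in auth;
// the field names and types are assumptions.
type visitor struct {
	tokens   float64
	lastSeen time.Time
}

// allocsPointerMap reports the average number of heap allocations per store
// when the map holds *visitor. Each store builds a new visitor that escapes
// to the heap.
func allocsPointerMap() float64 {
	m := make(map[string]*visitor, 16)
	return testing.AllocsPerRun(1000, func() {
		m["203.0.113.7"] = &visitor{tokens: 10, lastSeen: time.Now()}
	})
}

// allocsValueMap reports the same figure when the map holds visitor values.
// The struct is copied into the map's own storage instead of a separate object.
func allocsValueMap() float64 {
	m := make(map[string]visitor, 16)
	return testing.AllocsPerRun(1000, func() {
		m["203.0.113.7"] = visitor{tokens: 10, lastSeen: time.Now()}
	})
}
```

Calling both helpers from a main function or a test and printing the results should show the pointer map paying for one object per store while the value map does not; the exact figures depend on the Go version and are best read from the CI benchmark output.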

The denial path was also simplified: a single rl.visitors[key] = v write-back captures both the lastSeen update and the conditional token decrement, removing two separate return points.

New benchmarks are added to quantify the improvement and guard against future regressions:

go test -bench=BenchmarkRateLimiterAllow -benchmem ./auth/

Energy Efficiency Evidence

Proxy metric used: Heap allocations per operation (direct proxy for GC CPU overhead and DRAM energy).

| Scenario | Before | After |
|---|---|---|
| Existing key (hot path) | 0 allocs/op | 0 allocs/op |
| New visitor entry | 1 alloc/op (32-byte *visitor on heap) | 0 allocs/op (inline in map bucket) |

Baseline estimated from code inspection; CI benchmarks (-benchmem) will produce exact numbers.

Why this maps to energy: Each heap allocation requires the GC to track the pointer, scan it at collection time, and eventually free it. Eliminating N allocations/second removes GC work proportional to N, reducing the CPU cycles spent on garbage collection and therefore the power draw per request.

Memory layout improvement (cache locality): With the pointer map, accessing a visitor's fields required chasing a pointer to a separate heap object (potential cache miss). With the value map, visitor fields are co-located with the map bucket (likely already hot in L1/L2 cache).


Green Software Foundation Context

  • Hardware Efficiency: Using inline value storage makes better use of CPU caches — visitor data is co-located with the map bucket rather than scattered across the heap.
  • Energy Proportionality: Fewer allocations means GC runs less frequently, so CPU energy draw is more proportional to actual request load rather than GC overhead.

Trade-offs

  • Write-back required: With a value map, modifications to a visitor must be explicitly written back (rl.visitors[key] = v). The existing mutex already serialises all access, so this adds no correctness risk.
  • Larger map buckets, smaller total: Each map bucket stores 32-byte visitor values instead of 8-byte pointers, but the separate 32-byte heap object per entry goes away. For the default max of 10,000 visitors, visitor-state memory (not counting keys or map overhead) decreases from ~400 KB (10k × 40 bytes: 8-byte pointer + 32-byte heap struct) to ~320 KB (10k × 32 bytes inline).
  • No API change: visitor is unexported; all public types and interfaces are unchanged.
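The write-back requirement and the per-entry size arithmetic above can be checked with a small sketch. The visitor layout here is an assumption (one float64 plus one time.Time, which comes to 32 bytes on a typical 64-bit platform), and the helper names are hypothetical.

```go
package main

import (
	"time"
	"unsafe"
)

// visitor is a hypothetical stand-in for the unexported struct in auth.
type visitor struct {
	tokens   float64
	lastSeen time.Time
}

// decrementCopyOnly changes a local copy; the map entry is left untouched.
func decrementCopyOnly(m map[string]visitor, key string) float64 {
	v := m[key]
	v.tokens--
	return v.tokens
}

// decrementWithWriteBack changes the copy and stores it back into the map.
func decrementWithWriteBack(m map[string]visitor, key string) float64 {
	v := m[key]
	v.tokens--
	m[key] = v
	return v.tokens
}

// perEntryBytes returns the visitor-state bytes per entry for each layout:
// pointer plus separate heap struct versus the struct stored inline.
// Keys and map bookkeeping are not counted.
func perEntryBytes() (pointerMap, valueMap uintptr) {
	valueMap = unsafe.Sizeof(visitor{})
	pointerMap = unsafe.Sizeof((*visitor)(nil)) + valueMap
	return pointerMap, valueMap
}
```

Multiplying the two results of perEntryBytes by 10,000 gives the before and after totals quoted in the trade-off; forgetting the store in decrementWithWriteBack is the failure mode the first bullet warns about.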

Test Status

The tests have not been executed; they were checked by code inspection only (test logic unchanged). CI will run the full test suite with Go 1.26.1.

Note: the local runner environment has Go 1.25.11, which is older than the go.mod requirement (1.26.1), so tests cannot be run locally — CI is the authoritative test gate.


Reproducibility

# Compare the base branch against this branch.
# The benchmarks are added by this PR, so they must also be present on main for the baseline run.
git checkout main
go test -bench=BenchmarkRateLimiterAllow_newVisitors -benchmem -count=5 ./auth/

git checkout efficiency/ratelimit-visitor-value-map-6423517224fe3bb6
go test -bench=BenchmarkRateLimiterAllow_newVisitors -benchmem -count=5 ./auth/

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

Generated by Weekly Efficiency Improver · 1.8K AIC · ⌖ 21.8 AIC · ⊞ 41K ·

Add this agentic workflow to your repo

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/daily-efficiency-improver.md@96b9d4c39aa22359c0b38265927eadb31dcf4e2a

Replace map[string]*visitor with map[string]visitor to eliminate
one heap allocation per unique IP address tracked by the rate limiter.

Before: each new visitor entry allocated a *visitor on the heap
(32-byte struct + 8-byte GC-tracked pointer in the map).

After: visitor state is stored inline in the map bucket. New visitor
entries no longer require a separate heap allocation, reducing GC
pressure proportionally to the number of unique IPs tracked.

The denial path is simplified: a single write-back captures both
the lastSeen update and the conditional token decrement, removing
the need for two separate control-flow return points.

Benchmarks to verify improvement:
  go test -bench=BenchmarkRateLimiterAllow -benchmem ./auth/

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>