Implement hybrid rate limiting: built-in API + KV #130

Merged: neuromechanist merged 5 commits into develop from feature/issue-129-builtin-rate-limiting on Jan 29, 2026

neuromechanist commented on Jan 29, 2026

Summary

Hybrid rate limiting approach: Built-in Rate Limiting API for per-minute (bot protection) + Workers KV for per-hour (human abuse prevention).

Problem with Full Built-in API Approach

The initial attempt to use only the built-in API ran into three critical issues:

  1. Cannot do hourly limits - The API only supports period: 10 or period: 60 (seconds)
  2. Static configuration - Limits are fixed in wrangler.toml and cannot be varied at runtime for dev vs prod
  3. Per-location enforcement - Counters are local to each edge location, so limits are not globally consistent

Hybrid Solution

Use the best tool for each job:

| Purpose | Tool | Why |
|---|---|---|
| Per-minute (bot protection) | Built-in API | Fast (<1ms), catches bots immediately |
| Per-hour (human abuse) | KV | Global consistency, arbitrary time windows |

Benefits

  • 50% reduction in KV writes - 1 write/request (hour only) vs 2 (minute + hour)
  • Faster bot protection - <1ms (built-in API) vs ~10-50ms (KV) for critical first check
  • Global hourly limits - KV provides consistency across all Cloudflare edge locations
  • Pro Plan friendly - 1 write/request is unlimited on Pro Plan
  • Best of both worlds - Speed where it matters (bots), global where it matters (humans)

Rate Limits

Per IP address:

| Environment | Per Minute (Bot) | Per Hour (Human) |
|---|---|---|
| Production | 10 | 20 (~1 question/3 min) |
| Development | 60 | 100 |

Scope: Limits are per IP address, not per session. 20/hour prevents abuse while allowing reasonable research sessions.
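
The per-environment hourly limits above could be held in a small lookup. This is only a sketch: `HOURLY_LIMITS`, `hourlyLimitFor`, and `minutesPerRequest` are hypothetical names that do not appear in this PR, and the fallback to the production value for unknown environments is an assumption.

```javascript
// Hypothetical lookup of hourly limits; the values come from the table above.
const HOURLY_LIMITS = { production: 20, development: 100 };

// Resolve the hourly limit for an environment name. Unknown names fall back
// to the stricter production value (an assumption, not the PR's behavior).
function hourlyLimitFor(environment) {
  return HOURLY_LIMITS[environment] ?? HOURLY_LIMITS.production;
}

// Average spacing between requests, in minutes, that an hourly limit allows.
function minutesPerRequest(limitPerHour) {
  return 60 / limitPerHour;
}

console.log(hourlyLimitFor("production"), minutesPerRequest(20)); // 20 3
```

The second number is where the "~1 question/3 min" figure comes from: 60 minutes divided by 20 requests.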

Changes

wrangler.toml:

  • Add [[ratelimits]] binding for per-minute (built-in API)
  • Keep [[kv_namespaces]] for per-hour (KV)
  • Separate namespace IDs for prod/dev rate limiters

index.js:

  • Check built-in API first (fast bot check)
  • Then check KV for hourly limit
  • Only 1 KV write per request instead of 2

README.md:

  • Document hybrid approach and rationale
  • Explain rate limit scope (per IP)
  • Update setup instructions

Implementation Details

Per-Minute (Built-in API):

# wrangler.toml
[[ratelimits]]
name = "RATE_LIMITER_MINUTE"
namespace_id = "1001"
simple = { limit = 10, period = 60 }

// index.js: fast check, <1ms
const { success } = await env.RATE_LIMITER_MINUTE.limit({ key: ip });

Per-Hour (KV):

// Global consistency; fixed one-hour window (assumes `now` is a Unix timestamp in seconds)
const hourKey = `rl:hour:${ip}:${Math.floor(now / 3600)}`;
const hourCount = parseInt((await env.RATE_LIMITER_KV.get(hourKey)) || '0', 10);
if (hourCount >= CONFIG.RATE_LIMIT_PER_HOUR) {
  return { allowed: false };
}
// The two-hour TTL lets the key outlive its one-hour window before KV expires it
await env.RATE_LIMITER_KV.put(hourKey, (hourCount + 1).toString(), { expirationTtl: 7200 });
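
To make the hourly key easier to reason about, the window arithmetic can be pulled into pure functions. This is a sketch that assumes timestamps are in seconds, as the division by 3600 suggests; `hourBucket`, `hourKey`, and `secondsUntilReset` are hypothetical helper names, not functions from this PR.

```javascript
// Index of the fixed one-hour window that a timestamp (in seconds) falls into.
function hourBucket(nowSeconds) {
  return Math.floor(nowSeconds / 3600);
}

// Same key layout as the snippet above: one counter per IP per hour bucket.
function hourKey(ip, nowSeconds) {
  return `rl:hour:${ip}:${hourBucket(nowSeconds)}`;
}

// Seconds remaining until the current window rolls over to the next bucket.
function secondsUntilReset(nowSeconds) {
  return 3600 - (nowSeconds % 3600);
}

const now = Math.floor(Date.now() / 1000);
console.log(hourKey("203.0.113.7", now), secondsUntilReset(now));
```

Because the window is fixed rather than sliding, a client can spend one window's allowance just before the hour boundary and the next window's allowance just after it.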

Deployment

# Dev
cd workers/osa-worker
wrangler deploy --env dev

# Production
wrangler deploy

No additional setup needed - KV namespaces already exist, built-in API is automatic.

Testing

# Test per-minute limit (should hit after 10 requests)
for i in {1..15}; do 
  curl -s -X POST https://osa-worker-dev.yahyaqaraeen.workers.dev/hed/ask \
    -H "Content-Type: application/json" \
    -d '{"question":"test"}' \
    -w " - %{http_code}\n"
  sleep 0.5
done

Expected: the first 10 requests fail Turnstile verification (403); the remaining 5 are rejected with 429 (rate limited)

Performance Impact

| Metric | Old (KV only) | New (Hybrid) |
|---|---|---|
| KV writes/request | 2 | 1 |
| Bot check latency | ~10-50ms | <1ms |
| Hourly limit scope | Global ✓ | Global ✓ |
| Writes/hour @ 10 req/min | 1,200 | 600 |
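
The writes-per-hour figures follow from simple multiplication, sketched below with a hypothetical `kvWritesPerHour` helper. It assumes every request reaches the KV write, so the results are upper bounds: in the hourly snippet above, a request rejected by the hourly check returns before the `put`.

```javascript
// Upper bound on KV writes per hour for a client sending at a steady rate,
// assuming every request performs `writesPerRequest` writes.
function kvWritesPerHour(requestsPerMinute, writesPerRequest) {
  return requestsPerMinute * 60 * writesPerRequest;
}

console.log(kvWritesPerHour(10, 2), kvWritesPerHour(10, 1)); // 1200 600
```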

References

Closes #129

Switch from Workers KV to Cloudflare's built-in Rate Limiting API.

Benefits:
- Free and unlimited (no KV write limits)
- In-memory, faster (no network round-trips)
- Simpler code (30 lines to 15 lines)
- Same functionality (per-minute and per-hour limits)

Changes:
- wrangler.toml: Replace kv_namespaces with rate_limit binding
- index.js: Simplify checkRateLimit() to use env.RATE_LIMITER.limit()
- README.md: Remove KV setup instructions

Closes #129

Pivot from full built-in API to hybrid approach based on PR review.

Per-minute (bot protection):
- Built-in Rate Limiting API
- Fast (<1ms), in-memory
- Configured in wrangler.toml with namespace_id

Per-hour (human abuse prevention):
- Workers KV for global consistency
- Supports arbitrary time windows (3600s)
- 1 write per request instead of 2

Benefits:
- 50% reduction in KV writes (1 vs 2 per request)
- Faster bot protection (<1ms vs ~10-50ms)
- Global hourly limits across edge locations
- Works on Pro Plan (unlimited KV writes)

Changes:
- wrangler.toml: Add [[ratelimits]] + keep [[kv_namespaces]]
- index.js: Check built-in API first, then KV for hourly
- README.md: Document hybrid approach and rationale

Addresses review feedback on PR #130

Per IP address limits:
- Production: 20/hour (~1 question every 3 minutes)
- Development: 100/hour (for testing)

Rationale:
- Prevents sustained abuse
- Allows reasonable research sessions
- 20/hour is sufficient for legitimate use

Update CI/CD workflow to trigger on worker file changes, not just CORS.

Changes:
- Add workers/osa-worker/** to trigger paths
- Check for worker file changes in push event
- Deploy if CORS sync changed files OR worker files were pushed
- Only commit CORS changes (not worker changes, already committed)
- Update PR comment to reflect auto-deployment

Now merging to develop will automatically deploy worker changes.

neuromechanist changed the title from "Replace KV-based rate limiting with built-in API" to "Implement hybrid rate limiting: built-in API + KV" on Jan 29, 2026

Address PR review findings:

Critical fixes:
- Reorder checks: hourly (KV read) → per-minute (token) → hourly (KV write)
  Prevents wasting per-minute tokens on hourly-rejected requests
- Add radix parameter to parseInt() calls for safety

Important fixes:
- Use 'wrangler deploy' instead of 'wrangler deploy --env=""' for prod
- Document KV race condition as known limitation

This ensures per-minute tokens are only consumed by requests that
pass the hourly check, which prevents token waste and gives clearer error messages.
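
The reordered flow described in this commit (hourly KV read, then per-minute token, then hourly KV write) might look roughly like the sketch below. The binding names `RATE_LIMITER_MINUTE` and `RATE_LIMITER_KV` come from this PR; the `checkRateLimit` signature, the `reason` field, and the in-memory `makeFakeEnv` stand-ins are assumptions made so the example runs outside Workers.

```javascript
// Sketch of the reordered check. Signature and return shape are assumptions.
async function checkRateLimit(env, ip, nowSeconds, limitPerHour) {
  const hourKey = `rl:hour:${ip}:${Math.floor(nowSeconds / 3600)}`;

  // 1. Read the hourly counter first, so an hourly rejection costs no token.
  const hourCount = parseInt((await env.RATE_LIMITER_KV.get(hourKey)) || "0", 10);
  if (hourCount >= limitPerHour) return { allowed: false, reason: "hour" };

  // 2. Spend a per-minute token only for requests that passed step 1.
  const { success } = await env.RATE_LIMITER_MINUTE.limit({ key: ip });
  if (!success) return { allowed: false, reason: "minute" };

  // 3. Record the request. Read-then-write is not atomic, which is the
  //    known race: concurrent requests can read the same count.
  await env.RATE_LIMITER_KV.put(hourKey, String(hourCount + 1), { expirationTtl: 7200 });
  return { allowed: true };
}

// In-memory stand-ins for the two bindings (illustration only).
function makeFakeEnv(perMinute) {
  const store = new Map();
  let tokens = perMinute;
  return {
    RATE_LIMITER_KV: {
      get: async (key) => store.get(key) ?? null,
      put: async (key, value) => { store.set(key, value); },
    },
    RATE_LIMITER_MINUTE: {
      limit: async () => {
        if (tokens <= 0) return { success: false };
        tokens -= 1;
        return { success: true };
      },
    },
    tokensLeft: () => tokens,
  };
}

// Demo: hourly limit of 2, per-minute budget of 10.
(async () => {
  const env = makeFakeEnv(10);
  for (let i = 0; i < 3; i++) {
    console.log(await checkRateLimit(env, "203.0.113.7", 0, 2));
  }
  console.log("tokens left:", env.tokensLeft());
})();
```

In the demo the third request is rejected by the hourly check before the per-minute limiter is consulted, so only two of the ten tokens are spent.
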

neuromechanist merged commit ad3d252 into develop on Jan 29, 2026 (6 checks passed)
neuromechanist deleted the feature/issue-129-builtin-rate-limiting branch on January 29, 2026 07:39
neuromechanist mentioned this pull request on Feb 7, 2026