Replace KV-based rate limiting with Cloudflare's built-in Rate Limiting API

## Problem

The worker currently uses Workers KV for both the per-minute and per-hour rate limits (workers/osa-worker/index.js:71-98). This causes two problems:

1. **KV writes for bot protection**: Each request costs 2 KV writes (minute + hour keys)
   - Adds ~10-50ms latency for bot checks on every request
   - On the free plan: the 1,000 writes/day limit is exhausted after 500 requests (estimated ~8 hours)
   - On the Pro plan: writes are unlimited, but the latency remains

2. **Purpose mismatch**:
   - **Per-minute limit (bot prevention)**: Needs to be fast, per-request
   - **Per-hour limit (human abuse prevention)**: Can tolerate latency, needs global consistency

## Solution: Hybrid Approach

Use **built-in Rate Limiting API for per-minute** + **KV for per-hour**:

### Benefits
- ✅ **50% reduction in KV writes**: 1 write/request (hour) instead of 2 (minute + hour)
- ✅ **Faster bot protection**: <1ms (built-in API) vs ~10-50ms (KV) for critical first check
- ✅ **Global hourly limits**: KV counters are shared across all edge locations (eventually consistent, so counts can lag briefly)
- ✅ **Pro Plan friendly**: 1 write/request is totally fine on Pro Plan (unlimited)
- ✅ **Best of both**: Speed where it matters (bots), global consistency where it matters (humans)

### Why Hybrid vs Full Built-in API?

**Built-in API limitations:**
- Only supports `period: 10` or `period: 60` seconds (cannot do hourly)
- Per-location enforcement (not global)
- Static configuration in wrangler.toml (cannot vary dev/prod dynamically)

**KV advantages for hourly:**
- Supports arbitrary time windows (3600s for hourly)
- Globally shared counters across all Cloudflare locations (eventually consistent)
- Dynamic limits based on environment

## Technical Implementation

### Per-Minute (Built-in API)
```toml
# wrangler.toml
[[ratelimits]]
name = "RATE_LIMITER_MINUTE"
namespace_id = "1001"
simple = { limit = 10, period = 60 }

[[env.dev.ratelimits]]
name = "RATE_LIMITER_MINUTE"
namespace_id = "1002"
simple = { limit = 60, period = 60 }
```

```javascript
// index.js - Fast bot check
const { success } = await env.RATE_LIMITER_MINUTE.limit({ key: ip });
if (!success) {
  return { allowed: false, reason: 'Too many requests per minute' };
}
```

### Per-Hour (KV)
```javascript
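// index.js - Global human abuse check
// `now` is assumed to be a Unix timestamp in seconds, giving one key per hour window.
const hourKey = `rl:hour:${ip}:${Math.floor(now / 3600)}`;
const hourCount = parseInt((await env.RATE_LIMITER_KV.get(hourKey)) || '0', 10);
if (hourCount >= CONFIG.RATE_LIMIT_PER_HOUR) {
  return { allowed: false, reason: 'Too many requests per hour' };
}
// Read-then-write is not atomic, so concurrent requests may undercount slightly.
// A 2-hour TTL lets the key outlive its window before KV expires it.
await env.RATE_LIMITER_KV.put(hourKey, (hourCount + 1).toString(), { expirationTtl: 7200 });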
```

## Implementation

**Files to change:**

1. **wrangler.toml**:
   - Add `[[ratelimits]]` for per-minute (built-in API)
   - Keep `[[kv_namespaces]]` for per-hour (KV)
   - Rename KV binding to `RATE_LIMITER_KV` for clarity

2. **index.js**:
   - Check built-in API first (per-minute, fast)
   - Then check KV (per-hour, global)
   - Only 1 KV write instead of 2

3. **README.md**:
   - Document hybrid approach
   - Explain rationale (bot vs human protection)
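
The index.js changes listed above could be combined roughly as follows. This is a sketch, not the final implementation: `createFakeEnv` is a hypothetical in-memory stand-in for the two bindings so the ordering and write count can be checked locally, the `nowSeconds` parameter is an assumption added for testability, and the `RATE_LIMIT_PER_HOUR` value of 60 is a placeholder.

```javascript
// Placeholder value; the issue only references CONFIG.RATE_LIMIT_PER_HOUR.
const CONFIG = { RATE_LIMIT_PER_HOUR: 60 };

// Hybrid check: built-in limiter first (no KV access), then the hourly KV counter.
// `nowSeconds` is injectable so the hour window can be tested without waiting.
async function checkRateLimit(env, ip, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { success } = await env.RATE_LIMITER_MINUTE.limit({ key: ip });
  if (!success) {
    return { allowed: false, reason: 'Too many requests per minute' };
  }

  const hourKey = `rl:hour:${ip}:${Math.floor(nowSeconds / 3600)}`;
  const hourCount = parseInt((await env.RATE_LIMITER_KV.get(hourKey)) || '0', 10);
  if (hourCount >= CONFIG.RATE_LIMIT_PER_HOUR) {
    return { allowed: false, reason: 'Too many requests per hour' };
  }

  // The only KV write per allowed request. Read-then-write is not atomic.
  await env.RATE_LIMITER_KV.put(hourKey, String(hourCount + 1), { expirationTtl: 7200 });
  return { allowed: true };
}

// In-memory test doubles for the bindings (not Cloudflare APIs).
// The fake minute limiter never resets; it only counts calls per key.
function createFakeEnv(minuteLimit) {
  const minuteCounts = new Map();
  const store = new Map();
  let kvWrites = 0;
  return {
    RATE_LIMITER_MINUTE: {
      async limit({ key }) {
        const count = (minuteCounts.get(key) || 0) + 1;
        minuteCounts.set(key, count);
        return { success: count <= minuteLimit };
      },
    },
    RATE_LIMITER_KV: {
      async get(key) { return store.has(key) ? store.get(key) : null; },
      async put(key, value) { store.set(key, value); kvWrites += 1; },
    },
    kvWriteCount: () => kvWrites,
  };
}
```

Requests rejected by the per-minute check return before any KV call, so bot traffic over the minute limit costs no KV reads or writes.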

## Performance Comparison

| Metric | Old (KV only) | New (Hybrid) |
|--------|--------------|--------------|
| **KV writes/request** | 2 | 1 |
| **Bot check latency** | ~10-50ms | <1ms |
| **Hourly limit scope** | Global ✓ | Global ✓ |
| **Writes/hour @ 10 req/min** | 1,200 | 600 |
| **Pro Plan cost** | Unlimited | Unlimited |
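
The write counts in the table follow from simple arithmetic, sketched here so the figures can be rechecked for other traffic levels. The helper names `kvWritesPerHour` and `requestsUntilWriteCap` are illustrative only.

```javascript
// KV writes generated in one hour at a steady request rate.
function kvWritesPerHour(requestsPerMinute, writesPerRequest) {
  return requestsPerMinute * 60 * writesPerRequest;
}

// Number of requests that fit under a daily KV write cap.
function requestsUntilWriteCap(dailyWriteCap, writesPerRequest) {
  return Math.floor(dailyWriteCap / writesPerRequest);
}

console.log(kvWritesPerHour(10, 2)); // 1200 (old: minute + hour keys)
console.log(kvWritesPerHour(10, 1)); // 600  (new: hour key only)
console.log(requestsUntilWriteCap(1000, 2)); // 500  (free plan, old)
console.log(requestsUntilWriteCap(1000, 1)); // 1000 (free plan, new)
```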

## Migration Steps

1. Update wrangler.toml (add ratelimits, keep kv_namespaces)
2. Update checkRateLimit() in index.js (hybrid implementation)
3. Deploy to dev: `wrangler deploy --env dev`
4. Test both limits work
5. Deploy to production: `wrangler deploy`

## Testing Plan

```bash
# Test per-minute limit. The dev limit is 60/min (production is 10/min),
# so send more than 60 requests within one minute and watch for rejections.
for i in {1..65}; do
  curl -X POST https://osa-worker-dev.yahyaqaraeen.workers.dev/hed/ask \
    -H "Content-Type: application/json" \
    -d '{"question":"test"}' \
    -w "\n%{http_code}\n"
  sleep 0.2
done

# Test per-hour limit (would need 61+ requests in 1 hour)
```

## References

- [Cloudflare Rate Limiting API docs](https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit/)
- Current implementation: workers/osa-worker/index.js:71-98
- Pro Plan: Unlimited KV writes
