
Data-driven weighting for public lobby generation (collect fill-rate telemetry, then feed back into MapPlaylist) #3887

@luctrate

Summary

The three public lobbies (ffa, team, special) are currently generated from static, hard-coded weights in src/server/MapPlaylist.ts — the frequency table, TEAM_WEIGHTS, and the modifier ticket pool SPECIAL_MODIFIER_POOL. Every weight is set by developer intuition; we have no measurement of which combinations actually fill versus which time out.

This issue proposes:

  1. Phase 1: collect per-lobby outcome data into a queryable database table.
  2. Phase 2 (separate follow-up): use that data to weight future lobby generation.

Phase 2 is deliberately deferred. The design discussion about which model to fit is far more useful with real numbers than with speculation, and the variance in fill rate across combos may turn out to be small enough that a simple blacklist is the right answer rather than a learned weighting.

Background

Community feedback regularly suggests that the rotation produces lobbies that don't fill, but the distribution is not actually known. Prior threads:

These are anecdotes. The point of phase 1 is to make them falsifiable.

A gap that pure map-voting (#2999) cannot close on its own: fill rate may depend on combinations, not individual factors — the fill rate of a (map, team-count) pair could differ substantially from the average over all team counts for that map. Whether interaction effects are large or small is itself an empirical question we can't answer without per-tuple data.

Phase 1: collect data

For every public lobby created, emit one record at terminal state (game-start or timeout/cancel) into an analytical table. Proposed payload:

{
  "gameID": "...",
  "publicGameType": "ffa" | "team" | "special",
  "config": {
    "gameMap": "Baikal",
    "gameMode": "Team",
    "playerTeams": 2,                        // or "Duos"/"Trios"/"Quads"/"HumansVsNations"
    "publicGameModifiers": { ... },          // exact PublicGameModifiers as generated
    "maxPlayers": 60,
    "isCompact": false
  },
  "outcome": {
    "uniquePlayersJoined": 47,               // dedup by clientID
    "playersAtStart": 45,                    // present at countdown=0; null if timed out
    "lobbyOpenSeconds": 51,
  "fillRatio": 0.75,                       // playersAtStart / maxPlayers
    "joinRate": 0.92,                        // uniquePlayersJoined / lobbyOpenSeconds
    "terminalState": "started" | "timedOut"
  },
  "context": {
    "concurrentActivePlayers": 1240,         // (in-match + in-lobby) across all workers at lobby close
    "createdAt": "...",
    "closedAt": "..."
  }
}

concurrentActivePlayers is the normalization key. A join rate of 1.5 players/s when 200 people are online is much stronger evidence than 1.5/s when 3000 are online; without this denominator, time-of-day skew makes raw rates incomparable.
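As a minimal sketch of the normalization (the function name is illustrative, not existing code):

```typescript
// Illustrative only: the proposed per-active-player denominator.
function normalizedJoinRate(
  joinRatePerSecond: number,
  concurrentActivePlayers: number,
): number {
  return joinRatePerSecond / concurrentActivePlayers;
}

// The same raw rate is 15x stronger evidence off-peak (3000 / 200 = 15):
const offPeak = normalizedJoinRate(1.5, 200); // 0.0075
const peak = normalizedJoinRate(1.5, 3000); // 0.0005
```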

Why a database, not OTLP / logs

  • OTLP and Loki retention is typically short (30–90 days). We may need months of data before per-tuple cells have enough samples to be meaningful, especially for rare modifier combinations.
  • Log-based analytics is poor for tuple aggregation: expensive scans, no indexes, and structured fields are awkward to GROUP BY at scale.
  • A proper table makes it trivial to dashboard variance in real time and to decide whether phase 2 is even worth doing.
  • Schema discipline up front avoids retroactive migration when phase 2 needs structured joins (e.g. with future events like gameEnded, ranked outcomes, etc.).

The game server is stateless, so this implies a new table on the API service (api.openfront.io, closed-source) — for example a lobby_results table written via POST /telemetry/lobbyResult, or any other ingestion path the backend team prefers. The schema is small (one writer, one consumer, no real-time path needed) but does require coordination with whoever owns the backend repo.
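A sketch of what the game-server side of that ingest path could look like, assuming the proposed `POST /telemetry/lobbyResult` endpoint (which does not exist yet; the transport is injectable so it can be stubbed in tests, and failures must never affect the lobby lifecycle):

```typescript
// Hypothetical fire-and-forget emitter for the proposed ingest path.
type PostFn = (url: string, body: string) => Promise<void>;

const LOBBY_RESULT_URL = "https://api.openfront.io/telemetry/lobbyResult";

const defaultPost: PostFn = async (url, body) => {
  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
  });
};

async function emitLobbyResult(
  record: object,
  post: PostFn = defaultPost,
): Promise<boolean> {
  try {
    await post(LOBBY_RESULT_URL, JSON.stringify(record));
    return true;
  } catch {
    // Telemetry is best-effort: swallow errors (log at debug level at most).
    return false;
  }
}
```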

Phase 2: use the data (separate follow-up issue)

After several weeks of accumulation, with data in hand:

  • Aggregate per (map, mode, teamCount, modifierSet) tuple.
  • Compute a normalized score (e.g. EMA of joinRate / concurrentActivePlayers).
  • Decide whether tuples are stored as full combos or factored into per-feature weights with selective interaction terms — based on what the data shows about interaction-effect size.
  • Replace, or weight on top of, the static ticket pools in MapPlaylist.ts.
  • Add an explore-exploit floor (e.g. ~10–15% uniform sampling) so unpopular combos still get periodic re-evaluation and the meta isn't frozen by past data.
  • Initialize new maps and modifiers with the current static frequency as a Bayesian prior so cold-start is handled.
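To make the shape of that loop concrete, a sketch of the EMA update plus the explore-exploit floor (all constants and names are illustrative; none of this is being decided now):

```typescript
// EMA update applied as each lobby result arrives for a combo.
function updateScore(prev: number, observation: number, alpha = 0.1): number {
  return alpha * observation + (1 - alpha) * prev;
}

// With probability EXPLORE pick uniformly (the ~10-15% floor above);
// otherwise sample proportionally to the learned scores.
const EXPLORE = 0.1;

function sampleCombo(
  combos: string[],
  scores: Map<string, number>,
  rand: () => number = Math.random,
): string {
  const uniform = () => combos[Math.floor(rand() * combos.length)];
  if (rand() < EXPLORE) return uniform();
  const total = combos.reduce((s, c) => s + (scores.get(c) ?? 0), 0);
  if (total <= 0) return uniform(); // cold start: no data yet
  let r = rand() * total;
  for (const c of combos) {
    r -= scores.get(c) ?? 0;
    if (r <= 0) return c;
  }
  return combos[combos.length - 1];
}
```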

None of these decisions need to be made now.

Alternative considered: per-player choice modeling

An alternative is to log every join event with the full set of public lobbies visible to that player at that exact moment, then fit a choice model (player chose A given offer set {A, B, C}). This auto-normalizes for time of day and naturally captures interactions.

It is not proposed first because:

  • Substantially more data per event and more plumbing (snapshot the lobby broadcast at every join, join with per-tick state).
  • The offer set is dynamic per-second; two joins to the "same" lobby by different players can have completely different offer contexts, so comparable observations are sparser than they look.
  • Score extraction requires committing to a choice model up front.
  • Per-lobby data is straightforwardly aggregatable to per-tuple, can be eyeballed in a dashboard, and the "is variance large enough to be worth modeling?" question can be answered without any ML.

If phase 2 reveals that interaction effects dominate and additive weights aren't enough, per-join offer-set logging can be added on top later — the two are complementary.

Open questions for the team

  1. Backend coordination. Who owns the API service, and is the team open to a lobby_results table + ingest path? This is the gating decision for phase 1.
  2. Existing data. Does the game-record archive (Archive.ts, POST /game/{gameID}) already contain enough fields to reconstruct fill ratios offline for past games? If so, an export of recent archived records may bootstrap analysis without writing any new code.
  3. Active-players denominator. Can concurrentActivePlayers be computed cheaply from existing master-side state, or does it need a new cross-worker counter? An approximation from the existing 500 ms broadcast is likely fine.
  4. PII. Recommend logging only counts, no per-player IDs in this stream. Worth confirming there isn't a downstream need that would change this.
  5. Scope. Should phase 2 weighting act on special only (where modifier variance is largest), or on all three slot types?

Non-goals

  • No UX changes (no voting UI, no preference panel).
  • No model selection now. Phase 2 is a separate decision after data.
  • Does not depend on, replace, or block #2999 (map voting). The two are complementary signals on different dimensions.

Concrete first PR (once the backend table exists)

A small game-server-side PR that emits the lobbyResult payload at the existing gameStarted / lobbyTimedOut transitions in Worker.ts / MasterLobbyService.ts. Roughly 50 lines. Once accumulating, anyone can query the table and decide whether per-tuple variance justifies phase 2.
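As a sketch, that PR's hook could assemble the record like this (field names mirror the payload above; the input lobby shape is invented for illustration and the real transition points in Worker.ts / MasterLobbyService.ts may differ):

```typescript
// Hypothetical record builder called at gameStarted / lobbyTimedOut.
interface ClosingLobby {
  gameID: string;
  publicGameType: "ffa" | "team" | "special";
  config: Record<string, unknown>;
  uniquePlayersJoined: number;
  playersAtStart: number | null; // null when the lobby timed out
  maxPlayers: number;
  openedAtMs: number;
}

function buildLobbyResult(
  lobby: ClosingLobby,
  terminalState: "started" | "timedOut",
  nowMs: number = Date.now(),
) {
  const lobbyOpenSeconds = (nowMs - lobby.openedAtMs) / 1000;
  return {
    gameID: lobby.gameID,
    publicGameType: lobby.publicGameType,
    config: lobby.config,
    outcome: {
      uniquePlayersJoined: lobby.uniquePlayersJoined,
      playersAtStart: lobby.playersAtStart,
      lobbyOpenSeconds,
      fillRatio:
        lobby.playersAtStart === null
          ? null
          : lobby.playersAtStart / lobby.maxPlayers,
      joinRate: lobby.uniquePlayersJoined / lobbyOpenSeconds,
      terminalState,
    },
  };
}
```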
