
Data-driven weighting for public lobby generation (collect fill-rate telemetry, then feed back into MapPlaylist) #3887

@luctrate

Summary

The three public lobbies (ffa, team, special) are currently generated from static, hard-coded weights in src/server/MapPlaylist.ts — the frequency table, TEAM_WEIGHTS, and the modifier ticket pool SPECIAL_MODIFIER_POOL. Every weight is set by developer intuition; we have no measurement of which combinations actually fill versus which time out.

This issue proposes:

  1. Phase 1: collect per-lobby outcome data into a queryable database table.
  2. Phase 2 (separate follow-up): use that data to weight future lobby generation.

Phase 2 is deliberately deferred. The design discussion about which model to fit is far more useful with real numbers than with speculation, and the variance in fill rate across combos may turn out to be small enough that a simple blacklist is the right answer rather than a learned weighting.

Background

Community feedback regularly suggests that the rotation produces lobbies that don't fill, but the distribution is not actually known. Prior threads:

These are anecdotes. The point of phase 1 is to make them falsifiable.

A gap that pure map-voting (#2999) cannot close on its own: fill rate may depend on combinations, not individual factors — the fill rate of a (map, team-count) pair could differ substantially from the average over all team counts for that map. Whether interaction effects are large or small is itself an empirical question we can't answer without per-tuple data.

Phase 1: collect data

For every public lobby created, emit one record at terminal state (game-start or timeout/cancel) into an analytical table. Proposed payload:

{
  "gameID": "...",
  "publicGameType": "ffa" | "team" | "special",
  "config": {
    "gameMap": "Baikal",
    "gameMode": "Team",
    "playerTeams": 2,                        // or "Duos"/"Trios"/"Quads"/"HumansVsNations"
    "publicGameModifiers": { ... },          // exact PublicGameModifiers as generated
    "maxPlayers": 60,
    "isCompact": false
  },
  "outcome": {
    "uniquePlayersJoined": 47,               // dedup by clientID
    "playersAtStart": 45,                    // present at countdown=0; null if timed out
    "lobbyOpenSeconds": 51,
  "fillRatio": 0.75,                       // playersAtStart / maxPlayers
    "joinRate": 0.92,                        // uniquePlayersJoined / lobbyOpenSeconds
    "terminalState": "started" | "timedOut"
  },
  "context": {
    "concurrentActivePlayers": 1240,         // (in-match + in-lobby) across all workers at lobby close
    "createdAt": "...",
    "closedAt": "..."
  }
}

concurrentActivePlayers is the normalization key. A join rate of 1.5 players/s when 200 people are online is much stronger evidence than 1.5/s when 3000 are online; without this denominator, time-of-day skew makes raw rates incomparable.
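As a minimal sketch of the normalization (the function name is illustrative, not existing code):

```typescript
// Illustrative only: the proposed per-active-player denominator.
function normalizedJoinRate(
  joinRatePerSecond: number,
  concurrentActivePlayers: number,
): number {
  return joinRatePerSecond / concurrentActivePlayers;
}

// The same raw rate is 15x stronger evidence off-peak (3000 / 200 = 15):
const offPeak = normalizedJoinRate(1.5, 200); // 0.0075
const peak = normalizedJoinRate(1.5, 3000); // 0.0005
```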

Why a database, not OTLP / logs

  • OTLP and Loki retention is typically short (30–90 days). We may need months of data before per-tuple cells have enough samples to be meaningful, especially for rare modifier combinations.
  • Log-based analytics is poor for tuple aggregation: expensive scans, no indexes, and structured fields are awkward to GROUP BY at scale.
  • A proper table makes it trivial to dashboard variance in real time and to decide whether phase 2 is even worth doing.
  • Schema discipline up front avoids retroactive migration when phase 2 needs structured joins (e.g. with future events like gameEnded, ranked outcomes, etc.).

The game server is stateless, so this implies a new table on the API service (api.openfront.io, closed-source) — for example a lobby_results table written via POST /telemetry/lobbyResult, or any other ingestion path the backend team prefers. The schema is small (one writer, one consumer, no real-time path needed) but does require coordination with whoever owns the backend repo.
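A sketch of what the game-server side of that ingest path could look like, assuming the proposed `POST /telemetry/lobbyResult` endpoint (which does not exist yet; the transport is injectable so it can be stubbed in tests, and failures must never affect the lobby lifecycle):

```typescript
// Hypothetical fire-and-forget emitter for the proposed ingest path.
type PostFn = (url: string, body: string) => Promise<void>;

const LOBBY_RESULT_URL = "https://api.openfront.io/telemetry/lobbyResult";

const defaultPost: PostFn = async (url, body) => {
  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
  });
};

async function emitLobbyResult(
  record: object,
  post: PostFn = defaultPost,
): Promise<boolean> {
  try {
    await post(LOBBY_RESULT_URL, JSON.stringify(record));
    return true;
  } catch {
    // Telemetry is best-effort: swallow errors (log at debug level at most).
    return false;
  }
}
```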

Phase 2: use the data (separate follow-up issue)

After several weeks of accumulation, with data in hand:

  • Aggregate per (map, mode, teamCount, modifierSet) tuple.
  • Compute a normalized score (e.g. EMA of joinRate / concurrentActivePlayers).
  • Decide whether tuples are stored as full combos or factored into per-feature weights with selective interaction terms — based on what the data shows about interaction-effect size.
  • Replace, or weight on top of, the static ticket pools in MapPlaylist.ts.
  • Add an explore-exploit floor (e.g. ~10–15% uniform sampling) so unpopular combos still get periodic re-evaluation and the meta isn't frozen by past data.
  • Initialize new maps and modifiers with the current static frequency as a Bayesian prior so cold-start is handled.
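To make the shape of that loop concrete, a sketch of the EMA update plus the explore-exploit floor (all constants and names are illustrative; none of this is being decided now):

```typescript
// EMA update applied as each lobby result arrives for a combo.
function updateScore(prev: number, observation: number, alpha = 0.1): number {
  return alpha * observation + (1 - alpha) * prev;
}

// With probability EXPLORE pick uniformly (the ~10-15% floor above);
// otherwise sample proportionally to the learned scores.
const EXPLORE = 0.1;

function sampleCombo(
  combos: string[],
  scores: Map<string, number>,
  rand: () => number = Math.random,
): string {
  const uniform = () => combos[Math.floor(rand() * combos.length)];
  if (rand() < EXPLORE) return uniform();
  const total = combos.reduce((s, c) => s + (scores.get(c) ?? 0), 0);
  if (total <= 0) return uniform(); // cold start: no data yet
  let r = rand() * total;
  for (const c of combos) {
    r -= scores.get(c) ?? 0;
    if (r <= 0) return c;
  }
  return combos[combos.length - 1];
}
```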

None of these decisions need to be made now.

Alternative considered: per-player choice modeling

An alternative is to log every join event with the full set of public lobbies visible to that player at that exact moment, then fit a choice model (player chose A given offer set {A, B, C}). This auto-normalizes for time of day and naturally captures interactions.

It is not proposed first because:

  • Substantially more data per event and more plumbing (snapshot the lobby broadcast at every join, join with per-tick state).
  • The offer set is dynamic per-second; two joins to the "same" lobby by different players can have completely different offer contexts, so comparable observations are sparser than they look.
  • Score extraction requires committing to a choice model up front.
  • Per-lobby data is straightforwardly aggregatable to per-tuple, can be eyeballed in a dashboard, and the "is variance large enough to be worth modeling?" question can be answered without any ML.

If phase 2 reveals that interaction effects dominate and additive weights aren't enough, per-join offer-set logging can be added on top later — the two are complementary.

Open questions for the team

  1. Backend coordination. Who owns the API service, and is the team open to a lobby_results table + ingest path? This is the gating decision for phase 1.
  2. Existing data. Does the game-record archive (Archive.ts, POST /game/{gameID}) already contain enough fields to reconstruct fill ratios offline for past games? If so, an export of recent archived records may bootstrap analysis without writing any new code.
  3. Active-players denominator. Can concurrentActivePlayers be computed cheaply from existing master-side state, or does it need a new cross-worker counter? An approximation from the existing 500 ms broadcast is likely fine.
  4. PII. Recommend logging only counts, no per-player IDs in this stream. Worth confirming there isn't a downstream need that would change this.
  5. Scope. Should phase 2 weighting act on special only (where modifier variance is largest), or on all three slot types?

Non-goals

  • No UX changes (no voting UI, no preference panel).
  • No model selection now. Phase 2 is a separate decision after data.
  • Does not depend on, replace, or block #2999 (map voting). The two are complementary signals on different dimensions.

Concrete first PR (once the backend table exists)

A small game-server-side PR that emits the lobbyResult payload at the existing gameStarted / lobbyTimedOut transitions in Worker.ts / MasterLobbyService.ts. Roughly 50 lines. Once accumulating, anyone can query the table and decide whether per-tuple variance justifies phase 2.
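As a sketch, that PR's hook could assemble the record like this (field names mirror the payload above; the input lobby shape is invented for illustration and the real transition points in Worker.ts / MasterLobbyService.ts may differ):

```typescript
// Hypothetical record builder called at gameStarted / lobbyTimedOut.
interface ClosingLobby {
  gameID: string;
  publicGameType: "ffa" | "team" | "special";
  config: Record<string, unknown>;
  uniquePlayersJoined: number;
  playersAtStart: number | null; // null when the lobby timed out
  maxPlayers: number;
  openedAtMs: number;
}

function buildLobbyResult(
  lobby: ClosingLobby,
  terminalState: "started" | "timedOut",
  nowMs: number = Date.now(),
) {
  const lobbyOpenSeconds = (nowMs - lobby.openedAtMs) / 1000;
  return {
    gameID: lobby.gameID,
    publicGameType: lobby.publicGameType,
    config: lobby.config,
    outcome: {
      uniquePlayersJoined: lobby.uniquePlayersJoined,
      playersAtStart: lobby.playersAtStart,
      lobbyOpenSeconds,
      fillRatio:
        lobby.playersAtStart === null
          ? null
          : lobby.playersAtStart / lobby.maxPlayers,
      joinRate: lobby.uniquePlayersJoined / lobbyOpenSeconds,
      terminalState,
    },
  };
}
```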
