Optimize total_surplus query #4116

Open
jmg-duarte wants to merge 7 commits into main from jmgd/surplus

Conversation

@jmg-duarte
Contributor

@jmg-duarte jmg-duarte commented Feb 3, 2026

Description

We've been experiencing latency spikes on several endpoints, and we've pinned them down to the time it takes to acquire DB connections from the pool. When checking RDS monitoring, the surplus query always shows up at the top.

The current theory is that the query is slower than it could be. As more users request the main swap page, their surplus is loaded (even if they never explicitly request it, i.e. never open the wallet modal), and users with a large number of orders hold connections for so long that other endpoints can't get them.

I looked into the user distribution; here are the results:

| log_bucket | bucket_start | num_users | pct | cumulative_users | cumulative_pct |
|---|---|---|---|---|---|
| 0 | 1 | 295329 | 83.86 | 295329 | 83.86 |
| 1 | 10 | 51334 | 14.58 | 346663 | 98.44 |
| 2 | 100 | 5071 | 1.44 | 351734 | 99.88 |
| 3 | 1000 | 306 | 0.09 | 352040 | 99.96 |
| 4 | 10000 | 106 | 0.03 | 352146 | 99.99 |
| 5 | 100000 | 28 | 0.01 | 352174 | 100.00 |
Distribution Query

WITH buckets AS (
    SELECT 
        floor(log(order_count))::int as log_bucket,
        power(10, floor(log(order_count)))::int as bucket_start,
        count(*) as num_users
    FROM (
        SELECT owner, count(*) as order_count 
        FROM orders 
        GROUP BY owner
    ) sub
    GROUP BY 1, 2
)
SELECT 
    log_bucket,
    bucket_start,
    num_users,
    round(100.0 * num_users / sum(num_users) over(), 2) as pct,
    sum(num_users) over(order by log_bucket) as cumulative_users,
    round(100.0 * sum(num_users) over(order by log_bucket) / sum(num_users) over(), 2) as cumulative_pct
FROM buckets
ORDER BY log_bucket;

However, it isn't quite that simple. Depending on the user, they might have no regular orders and only onchain-placed orders (note that the max number of onchain orders per user is around ~10k), and some have skewed data distributions across the tables, which makes analyzing and optimizing this query tricky.

There are two crucial changes to the query: removing ARRAY_AGG and adding indexes. The first means the DB no longer has to materialize a potentially large array in memory, which also led to bad row estimates; the second provides better access paths to some of the information the query requires.
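
In condensed form (the full before/after queries are in the validation script below; joins and output columns trimmed here for brevity), the structural change looks like this:

-- Before (abridged): aggregate every matching uid into an array, then probe it.
-- For the heaviest users this array holds over 100k uids, and the planner's
-- row estimates for uid = ANY(...) end up badly off.
WITH regular_orders AS (
    SELECT ARRAY_AGG(uid) AS ids FROM orders WHERE owner = $1
)
SELECT count(*)
FROM orders o
JOIN trades t ON t.order_uid = o.uid
WHERE o.uid = ANY((SELECT ids FROM regular_orders));

-- After (abridged): filter directly on the indexed owner column; no array.
SELECT count(*)
FROM orders o
JOIN trades t ON t.order_uid = o.uid
WHERE o.owner = $1;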

RDS Stats (1h)

Before

[Screenshot: CloudWatch eu-central-1, 2026-02-05 11:12:43]

After

[Screenshot: CloudWatch eu-central-1, 2026-02-05 11:14:53]

Changes

  • Replace the query with the optimized one
  • Create indexes (already done to avoid issues during migration; a DDL sketch follows below)
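
For reference, a sketch of the index side of the change. The index names (trades_covering, orders_owner_covering) come from the query plans below; the column lists are my reconstruction from what the index-only scans touch, not a copy of the actual migration:

-- Reconstructed DDL; column lists are assumptions, verify against the migration.
-- CONCURRENTLY avoids blocking writes on these hot tables while building.
CREATE INDEX CONCURRENTLY trades_covering
    ON trades (order_uid)
    INCLUDE (buy_amount, sell_amount, fee_amount);

CREATE INDEX CONCURRENTLY orders_owner_covering
    ON orders (owner)
    INCLUDE (uid, kind, sell_token, buy_token, sell_amount, buy_amount);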

Query Plans

Before

Aggregate  (cost=1033006.55..1033006.56 rows=1 width=8) (actual time=130139.025..130143.456 rows=1 loops=1)
  ->  Append  (cost=1032650.26..1033006.51 rows=2 width=100) (actual time=31175.499..130090.776 rows=10998 loops=1)
        ->  Subquery Scan on "*SELECT* 1"  (cost=1032650.26..1032856.24 rows=1 width=100) (actual time=31175.498..130085.109 rows=10998 loops=1)
              ->  Nested Loop  (cost=1032650.26..1032856.23 rows=1 width=100) (actual time=31175.497..130079.454 rows=10998 loops=1)
                    Join Filter: (t.order_uid = o.uid)
                    InitPlan 3 (returns $5)
                      ->  Finalize Aggregate  (cost=1032601.57..1032601.58 rows=1 width=32) (actual time=30761.210..30765.632 rows=1 loops=1)
                            ->  Gather  (cost=1032601.35..1032601.56 rows=2 width=32) (actual time=30678.267..30689.666 rows=3 loops=1)
                                  Workers Planned: 2
                                  Workers Launched: 2
                                  ->  Partial Aggregate  (cost=1031601.35..1031601.36 rows=1 width=32) (actual time=30675.862..30675.863 rows=1 loops=3)
                                        ->  Parallel Bitmap Heap Scan on orders  (cost=24591.99..1030861.56 rows=295913 width=57) (actual time=278.099..30619.932 rows=226281 loops=3)
                                              Recheck Cond: (owner = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                                              Rows Removed by Index Recheck: 737927
                                              Heap Blocks: exact=15162 lossy=55571
                                              ->  Bitmap Index Scan on order_owner  (cost=0.00..24414.44 rows=710192 width=0) (actual time=271.994..271.994 rows=749143 loops=1)
                                                    Index Cond: (owner = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                    InitPlan 4 (returns $6)
                      ->  Aggregate  (cost=47.00..47.01 rows=1 width=32) (actual time=0.021..0.022 rows=1 loops=1)
                            ->  Bitmap Heap Scan on onchain_placed_orders  (cost=4.09..46.97 rows=11 width=57) (actual time=0.017..0.018 rows=0 loops=1)
                                  Recheck Cond: (sender = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                                  ->  Bitmap Index Scan on order_sender  (cost=0.00..4.08 rows=11 width=0) (actual time=0.013..0.013 rows=0 loops=1)
                                        Index Cond: (sender = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                    ->  Nested Loop  (cost=1.12..136.08 rows=2 width=188) (actual time=31172.849..109529.707 rows=10914 loops=1)
                          ->  Index Scan using orders_pkey on orders o  (cost=0.56..85.71 rows=10 width=123) (actual time=31172.113..101635.999 rows=678843 loops=1)
                                Index Cond: (uid = ANY (array_cat($5, $6)))
                          ->  Index Only Scan using order_rewards_pkey on order_execution oe  (cost=0.56..5.01 rows=3 width=65) (actual time=0.011..0.011 rows=0 loops=678843)
                                Index Cond: (order_uid = o.uid)
                                Heap Fetches: 170
                    ->  Index Scan using trade_order_uid on trades t  (cost=0.56..0.74 rows=2 width=81) (actual time=0.416..0.416 rows=1 loops=10914)
                          Index Cond: (order_uid = oe.order_uid)
                    SubPlan 1
                      ->  Index Scan using auction_prices_pkey on auction_prices ap  (cost=0.58..35.00 rows=16 width=11) (actual time=1.428..1.428 rows=1 loops=9675)
                            Index Cond: ((auction_id = oe.auction_id) AND (token = o.buy_token))
                    SubPlan 2
                      ->  Index Scan using auction_prices_pkey on auction_prices ap_1  (cost=0.58..35.00 rows=16 width=11) (actual time=1.596..1.596 rows=1 loops=1323)
                            Index Cond: ((auction_id = oe.auction_id) AND (token = o.sell_token))
        ->  Subquery Scan on "*SELECT* 2"  (cost=2.09..150.26 rows=1 width=100) (actual time=0.010..0.013 rows=0 loops=1)
              ->  Nested Loop  (cost=2.09..150.25 rows=1 width=100) (actual time=0.009..0.012 rows=0 loops=1)
                    Join Filter: (j.uid = t_1.order_uid)
                    ->  Nested Loop  (cost=1.54..79.45 rows=1 width=192) (actual time=0.009..0.010 rows=0 loops=1)
                          ->  Nested Loop Anti Join  (cost=0.98..74.32 rows=1 width=127) (actual time=0.009..0.010 rows=0 loops=1)
                                ->  Index Scan using jit_user_order_creation_timestamp on jit_orders j  (cost=0.42..33.62 rows=8 width=127) (actual time=0.008..0.009 rows=0 loops=1)
                                      Index Cond: (owner = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                                ->  Index Only Scan using orders_pkey on orders o_1  (cost=0.56..5.08 rows=1 width=57) (never executed)
                                      Index Cond: (uid = j.uid)
                                      Heap Fetches: 0
                          ->  Index Only Scan using order_rewards_pkey on order_execution oe_1  (cost=0.56..5.11 rows=3 width=65) (never executed)
                                Index Cond: (order_uid = j.uid)
                                Heap Fetches: 0
                    ->  Index Scan using trade_order_uid on trades t_1  (cost=0.56..0.74 rows=2 width=81) (never executed)
                          Index Cond: (order_uid = oe_1.order_uid)
                    SubPlan 5
                      ->  Index Scan using auction_prices_pkey on auction_prices ap_2  (cost=0.58..35.00 rows=16 width=11) (never executed)
                            Index Cond: ((auction_id = oe_1.auction_id) AND (token = j.buy_token))
                    SubPlan 6
                      ->  Index Scan using auction_prices_pkey on auction_prices ap_3  (cost=0.58..35.00 rows=16 width=11) (never executed)
                            Index Cond: ((auction_id = oe_1.auction_id) AND (token = j.sell_token))
Planning Time: 4.788 ms
Execution Time: 130157.850 ms

After

Aggregate  (cost=14614.00..14614.01 rows=1 width=8) (actual time=1901.439..1902.539 rows=1 loops=1)
  Buffers: shared hit=11838 read=4075
  I/O Timings: shared read=2240.469
  ->  Gather Merge  (cost=14557.86..14604.91 rows=404 width=136) (actual time=1900.725..1902.018 rows=917 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=11838 read=4075
        I/O Timings: shared read=2240.469
        ->  Sort  (cost=13557.83..13558.25 rows=168 width=136) (actual time=762.591..762.628 rows=306 loops=3)
              Sort Key: "*SELECT* 2".uid
              Sort Method: quicksort  Memory: 49kB
              Buffers: shared hit=11838 read=4075
              I/O Timings: shared read=2240.469
              Worker 0:  Sort Method: quicksort  Memory: 115kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Append  (cost=2.25..13551.63 rows=168 width=136) (actual time=2.247..762.040 rows=306 loops=3)
                    Buffers: shared hit=11824 read=4075
                    I/O Timings: shared read=2240.469
                    ->  Subquery Scan on "*SELECT* 2"  (cost=24.42..13550.79 rows=53 width=136) (actual time=3.496..1892.483 rows=709 loops=1)
                          Buffers: shared hit=9702 read=3415
                          I/O Timings: shared read=1854.511
                          ->  Nested Loop Left Join  (cost=24.42..13550.26 rows=53 width=136) (actual time=3.495..1892.175 rows=709 loops=1)
                                Buffers: shared hit=9702 read=3415
                                I/O Timings: shared read=1854.511
                                ->  Nested Loop  (cost=23.84..13414.32 rows=53 width=155) (actual time=1.455..762.309 rows=709 loops=1)
                                      Buffers: shared hit=8050 read=1521
                                      I/O Timings: shared read=736.402
                                      ->  Nested Loop  (cost=23.28..12573.83 rows=195 width=245) (actual time=1.418..667.512 rows=709 loops=1)
                                            Buffers: shared hit=5387 read=1337
                                            I/O Timings: shared read=647.346
                                            ->  Nested Loop  (cost=22.73..9169.44 rows=796 width=180) (actual time=1.397..584.403 rows=711 loops=1)
                                                  Buffers: shared hit=2649 read=1211
                                                  I/O Timings: shared read=570.801
                                                  ->  Bitmap Heap Scan on onchain_placed_orders opo  (cost=22.17..2343.76 rows=796 width=57) (actual time=0.699..6.255 rows=711 loops=1)
                                                        Recheck Cond: (sender = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                        Buffers: shared hit=56 read=248
                                                        I/O Timings: shared read=2.028
                                                        ->  Bitmap Index Scan on order_sender  (cost=0.00..21.97 rows=796 width=0) (actual time=0.044..0.044 rows=711 loops=1)
                                                              Index Cond: (sender = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                              Buffers: shared hit=4
                                                  ->  Index Scan using orders_pkey on orders o  (cost=0.56..8.57 rows=1 width=123) (actual time=0.812..0.812 rows=1 loops=711)
                                                        Index Cond: (uid = opo.uid)
                                                        Filter: (owner <> '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                        Buffers: shared hit=2593 read=963
                                                        I/O Timings: shared read=568.773
                                            ->  Index Only Scan using order_rewards_pkey on order_execution oe  (cost=0.56..4.25 rows=3 width=65) (actual time=0.115..0.115 rows=1 loops=711)
                                                  Index Cond: (order_uid = o.uid)
                                                  Heap Fetches: 16
                                                  Buffers: shared hit=2738 read=126
                                                  I/O Timings: shared read=76.545
                                      ->  Index Only Scan using trades_covering on trades t  (cost=0.56..4.29 rows=2 width=81) (actual time=0.132..0.133 rows=1 loops=709)
                                            Index Cond: (order_uid = o.uid)
                                            Heap Fetches: 0
                                            Buffers: shared hit=2663 read=184
                                            I/O Timings: shared read=89.056
                                ->  Index Scan using auction_prices_pkey on auction_prices ap  (cost=0.58..31.94 rows=16 width=40) (actual time=1.590..1.590 rows=1 loops=709)
                                      Index Cond: ((auction_id = oe.auction_id) AND (token = CASE o.kind WHEN 'sell'::orderkind THEN o.buy_token ELSE o.sell_token END))
                                      Buffers: shared hit=1652 read=1894
                                      I/O Timings: shared read=1118.109
                    ->  Subquery Scan on "*SELECT* 1"  (cost=2.25..4428.09 rows=350 width=136) (actual time=3.233..393.256 rows=208 loops=1)
                          Buffers: shared hit=2119 read=660
                          I/O Timings: shared read=385.958
                          ->  Nested Loop  (cost=2.25..4424.59 rows=350 width=136) (actual time=3.233..393.181 rows=208 loops=1)
                                Join Filter: (t_1.order_uid = o_1.uid)
                                Buffers: shared hit=2119 read=660
                                I/O Timings: shared read=385.958
                                ->  Nested Loop Left Join  (cost=1.70..3742.63 rows=156 width=149) (actual time=3.212..369.737 rows=208 loops=1)
                                      Buffers: shared hit=1320 read=618
                                      I/O Timings: shared read=364.992
                                      ->  Nested Loop  (cost=1.12..3316.62 rows=156 width=188) (actual time=1.266..28.471 rows=208 loops=1)
                                            Buffers: shared hit=849 read=49
                                            I/O Timings: shared read=26.775
                                            ->  Index Only Scan using orders_owner_covering on orders o_1  (cost=0.56..187.36 rows=638 width=123) (actual time=1.247..2.194 rows=210 loops=1)
                                                  Index Cond: (owner = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                  Heap Fetches: 1
                                                  Buffers: shared hit=24 read=7
                                                  I/O Timings: shared read=2.038
                                            ->  Index Only Scan using order_rewards_pkey on order_execution oe_1  (cost=0.56..4.87 rows=3 width=65) (actual time=0.124..0.124 rows=1 loops=210)
                                                  Index Cond: (order_uid = o_1.uid)
                                                  Heap Fetches: 4
                                                  Buffers: shared hit=825 read=42
                                                  I/O Timings: shared read=24.736
                                      ->  Index Scan using auction_prices_pkey on auction_prices ap_1  (cost=0.58..34.99 rows=16 width=40) (actual time=1.639..1.639 rows=1 loops=208)
                                            Index Cond: ((auction_id = oe_1.auction_id) AND (token = CASE o_1.kind WHEN 'sell'::orderkind THEN o_1.buy_token ELSE o_1.sell_token END))
                                            Buffers: shared hit=471 read=569
                                            I/O Timings: shared read=338.217
                                ->  Index Only Scan using trades_covering on trades t_1  (cost=0.56..4.29 rows=2 width=81) (actual time=0.109..0.109 rows=1 loops=208)
                                      Index Cond: (order_uid = oe_1.order_uid)
                                      Heap Fetches: 0
                                      Buffers: shared hit=799 read=42
                                      I/O Timings: shared read=20.966
                    ->  Subquery Scan on "*SELECT* 3"  (cost=2.67..94.60 rows=1 width=136) (actual time=0.012..0.014 rows=0 loops=1)
                          Buffers: shared hit=3
                          ->  Nested Loop Left Join  (cost=2.67..94.59 rows=1 width=136) (actual time=0.011..0.013 rows=0 loops=1)
                                Buffers: shared hit=3
                                ->  Nested Loop  (cost=2.09..91.83 rows=1 width=159) (actual time=0.011..0.013 rows=0 loops=1)
                                      Buffers: shared hit=3
                                      ->  Nested Loop  (cost=1.54..87.55 rows=1 width=208) (actual time=0.011..0.012 rows=0 loops=1)
                                            Buffers: shared hit=3
                                            ->  Nested Loop Anti Join  (cost=0.98..82.50 rows=1 width=127) (actual time=0.011..0.011 rows=0 loops=1)
                                                  Buffers: shared hit=3
                                                  ->  Index Scan using jit_user_order_creation_timestamp on jit_orders j  (cost=0.42..37.21 rows=9 width=127) (actual time=0.010..0.010 rows=0 loops=1)
                                                        Index Cond: (owner = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                        Buffers: shared hit=3
                                                  ->  Index Only Scan using orders_pkey on orders o_2  (cost=0.56..5.02 rows=1 width=57) (never executed)
                                                        Index Cond: (uid = j.uid)
                                                        Heap Fetches: 0
                                            ->  Index Only Scan using trades_covering on trades t_2  (cost=0.56..5.03 rows=2 width=81) (never executed)
                                                  Index Cond: (order_uid = j.uid)
                                                  Heap Fetches: 0
                                      ->  Index Only Scan using order_rewards_pkey on order_execution oe_2  (cost=0.56..4.25 rows=3 width=65) (never executed)
                                            Index Cond: (order_uid = t_2.order_uid)
                                            Heap Fetches: 0
                                ->  Index Scan using auction_prices_pkey on auction_prices ap_2  (cost=0.58..35.01 rows=16 width=40) (never executed)
                                      Index Cond: ((auction_id = oe_2.auction_id) AND (token = CASE j.kind WHEN 'sell'::orderkind THEN j.buy_token ELSE j.sell_token END))
Planning:
  Buffers: shared hit=892 read=21
  I/O Timings: shared read=14.023
Planning Time: 18.375 ms
Execution Time: 1902.681 ms

How to test

Because floating-point addition is not associative and the summation order in the old query is not deterministic (the ORDER BY uid is merely an approximation that happens to match), the validation script leaves some room for differences, 1e-9 to be precise.
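
A quick illustration of the non-associativity directly in Postgres:

-- Same three doubles, different grouping, different result.
SELECT (0.1::float8 + 0.2::float8) + 0.3::float8 AS left_assoc,   -- 0.6000000000000001
       0.1::float8 + (0.2::float8 + 0.3::float8) AS right_assoc;  -- 0.6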

Validation script

#!/usr/bin/env python3
"""
Compare original and optimized surplus queries for correctness.
Picks random addresses and verifies both queries agree within a small tolerance.
"""

import os
import sys
import psycopg2

# Connection settings - override with environment variables
DB_CONFIG = {
    "host": os.getenv("DB_HOST", "localhost"),
    "port": os.getenv("DB_PORT", "5432"),
    "dbname": os.getenv("DB_NAME", "your_database"),
    "user": os.getenv("DB_USER", "your_user"),
    "password": os.getenv("DB_PASSWORD", ""),
}

ORIGINAL_QUERY = """
WITH regular_orders AS (
    SELECT ARRAY_AGG(uid) AS ids FROM orders WHERE owner = $1
),
onchain_orders AS (
    SELECT ARRAY_AGG(uid) AS ids FROM onchain_placed_orders WHERE sender = $1
),
trade_components AS (
    SELECT
       CASE kind
          WHEN 'sell' THEN t.buy_amount
          WHEN 'buy' THEN t.sell_amount - t.fee_amount
       END AS trade_amount,
       CASE kind
          WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * o.buy_amount / o.sell_amount
          WHEN 'buy' THEN t.buy_amount * o.sell_amount / o.buy_amount
       END AS limit_amount,
       o.kind,
       CASE kind
          WHEN 'sell' THEN (SELECT price FROM auction_prices ap WHERE ap.token = o.buy_token AND ap.auction_id = oe.auction_id)
          WHEN 'buy' THEN (SELECT price FROM auction_prices ap WHERE ap.token = o.sell_token AND ap.auction_id = oe.auction_id)
       END AS surplus_token_native_price
    FROM orders o
    JOIN trades t ON o.uid = t.order_uid
    JOIN order_execution oe ON o.uid = oe.order_uid
    WHERE o.uid = ANY(ARRAY_CAT((SELECT ids FROM regular_orders), (SELECT ids FROM onchain_orders)))
    UNION ALL
    SELECT
       CASE j.kind
          WHEN 'sell' THEN t.buy_amount
          WHEN 'buy' THEN t.sell_amount - t.fee_amount
       END AS trade_amount,
       CASE j.kind
          WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * j.buy_amount / j.sell_amount
          WHEN 'buy' THEN t.buy_amount * j.sell_amount / j.buy_amount
       END AS limit_amount,
       j.kind,
       CASE j.kind
          WHEN 'sell' THEN (SELECT price FROM auction_prices ap WHERE ap.token = j.buy_token AND ap.auction_id = oe.auction_id)
          WHEN 'buy' THEN (SELECT price FROM auction_prices ap WHERE ap.token = j.sell_token AND ap.auction_id = oe.auction_id)
       END AS surplus_token_native_price
    FROM jit_orders j
    JOIN trades t ON j.uid = t.order_uid
    JOIN order_execution oe ON j.uid = oe.order_uid
    WHERE j.owner = $1 AND NOT EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.uid = j.uid
    )
),
trade_surplus AS (
    SELECT
        CASE kind
            WHEN 'sell' THEN (trade_amount - limit_amount) * surplus_token_native_price
            WHEN 'buy' THEN (limit_amount - trade_amount) * surplus_token_native_price
        END / POWER(10, 18) AS surplus_in_wei
    FROM trade_components
)
SELECT
   COALESCE(SUM(surplus_in_wei), 0) AS total_surplus_in_wei
FROM trade_surplus;
"""

OPTIMIZED_QUERY = """
WITH trade_components AS (
    -- Regular orders: join trades first, then order_execution
    SELECT
        o.uid,
        CASE o.kind
            WHEN 'sell' THEN t.buy_amount
            WHEN 'buy' THEN t.sell_amount - t.fee_amount
        END AS trade_amount,
        CASE o.kind
            WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * o.buy_amount / o.sell_amount
            WHEN 'buy' THEN t.buy_amount * o.sell_amount / o.buy_amount
        END AS limit_amount,
        o.kind,
        ap.price AS surplus_token_native_price
    FROM orders o
    JOIN trades t ON t.order_uid = o.uid
    JOIN order_execution oe ON oe.order_uid = t.order_uid
    LEFT JOIN auction_prices ap
        ON ap.auction_id = oe.auction_id
        AND ap.token = CASE o.kind WHEN 'sell' THEN o.buy_token ELSE o.sell_token END
    WHERE o.owner = $1

    UNION ALL

    -- Onchain placed orders (if sender differs from owner)
    SELECT
        o.uid,
        CASE o.kind
            WHEN 'sell' THEN t.buy_amount
            WHEN 'buy' THEN t.sell_amount - t.fee_amount
        END AS trade_amount,
        CASE o.kind
            WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * o.buy_amount / o.sell_amount
            WHEN 'buy' THEN t.buy_amount * o.sell_amount / o.buy_amount
        END AS limit_amount,
        o.kind,
        ap.price AS surplus_token_native_price
    FROM onchain_placed_orders opo
    JOIN orders o ON o.uid = opo.uid AND o.owner != $1
    JOIN trades t ON t.order_uid = o.uid
    JOIN order_execution oe ON oe.order_uid = t.order_uid
    LEFT JOIN auction_prices ap
        ON ap.auction_id = oe.auction_id
        AND ap.token = CASE o.kind WHEN 'sell' THEN o.buy_token ELSE o.sell_token END
    WHERE opo.sender = $1

    UNION ALL

    -- JIT orders
    SELECT
        j.uid,
        CASE j.kind
            WHEN 'sell' THEN t.buy_amount
            WHEN 'buy' THEN t.sell_amount - t.fee_amount
        END AS trade_amount,
        CASE j.kind
            WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * j.buy_amount / j.sell_amount
            WHEN 'buy' THEN t.buy_amount * j.sell_amount / j.buy_amount
        END AS limit_amount,
        j.kind,
        ap.price AS surplus_token_native_price
    FROM jit_orders j
    JOIN trades t ON t.order_uid = j.uid
    JOIN order_execution oe ON oe.order_uid = t.order_uid
    LEFT JOIN auction_prices ap
        ON ap.auction_id = oe.auction_id
        AND ap.token = CASE j.kind WHEN 'sell' THEN j.buy_token ELSE j.sell_token END
    WHERE j.owner = $1
      AND NOT EXISTS (SELECT 1 FROM orders o WHERE o.uid = j.uid)
)
SELECT
    COALESCE(SUM(surplus_in_wei ORDER BY uid), 0) AS total_surplus_in_wei
FROM (
    SELECT
        uid,
        CASE kind
            WHEN 'sell' THEN (trade_amount - limit_amount) * surplus_token_native_price
            WHEN 'buy' THEN (limit_amount - trade_amount) * surplus_token_native_price
        END / POWER(10, 18) AS surplus_in_wei
    FROM trade_components
) ts;
"""

# Query to get random addresses with varying order counts
SAMPLE_ADDRESSES_QUERY = """
WITH address_order_counts AS (
    SELECT owner AS address, COUNT(*) AS order_count
    FROM orders
    GROUP BY owner
),
bucketed AS (
    SELECT
        address,
        order_count,
        CASE
            WHEN order_count < 10 THEN '0: 1-9'
            WHEN order_count < 100 THEN '1: 10-99'
            WHEN order_count < 1000 THEN '2: 100-999'
            WHEN order_count < 10000 THEN '3: 1000-9999'
            ELSE '4: 10000+'
        END AS bucket
    FROM address_order_counts
)
SELECT address, order_count, bucket
FROM (
    SELECT address, order_count, bucket,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY RANDOM()) AS rn
    FROM bucketed
) ranked
WHERE rn <= %s
ORDER BY bucket, order_count;
"""


def get_connection():
    return psycopg2.connect(**DB_CONFIG)


def fetch_sample_addresses(conn, samples_per_bucket=5):
    """Get random addresses from each order count bucket."""
    with conn.cursor() as cur:
        cur.execute(SAMPLE_ADDRESSES_QUERY, (samples_per_bucket,))
        return cur.fetchall()


def run_query(conn, query, address):
    """Run a query with the given address parameter."""
    # Convert $1 placeholder to %s for psycopg2
    pg_query = query.replace("$1", "%s")
    # Count how many parameters we need
    param_count = pg_query.count("%s")
    with conn.cursor() as cur:
        cur.execute(pg_query, tuple([address] * param_count))
        result = cur.fetchone()
        return result[0] if result else None


def compare_results(original, optimized, tolerance=1e-9):
    """Compare two numeric results with floating point tolerance."""
    if original is None and optimized is None:
        return True, "both NULL"
    if original is None or optimized is None:
        return False, f"NULL mismatch: original={original}, optimized={optimized}"

    # Convert to float for comparison
    orig_float = float(original)
    opt_float = float(optimized)

    if orig_float == opt_float:
        return True, "exact match"

    # Check relative difference for non-zero values
    if orig_float != 0:
        rel_diff = abs(orig_float - opt_float) / abs(orig_float)
        if rel_diff < tolerance:
            return True, f"within tolerance (rel_diff={rel_diff:.2e})"

    # Check absolute difference for values near zero
    abs_diff = abs(orig_float - opt_float)
    if abs_diff < tolerance:
        return True, f"within tolerance (abs_diff={abs_diff:.2e})"

    return False, f"MISMATCH: original={orig_float}, optimized={opt_float}, diff={abs_diff}"


def main():
    samples_per_bucket = int(sys.argv[1]) if len(sys.argv) > 1 else 5

    print(f"Comparing original vs optimized query ({samples_per_bucket} samples per bucket)")
    print("=" * 80)

    conn = get_connection()

    # Get sample addresses
    print("\nFetching sample addresses...")
    addresses = fetch_sample_addresses(conn, samples_per_bucket)
    print(f"Found {len(addresses)} addresses across buckets\n")

    passed = 0
    failed = 0
    current_bucket = None

    for address, order_count, bucket in addresses:
        if bucket != current_bucket:
            current_bucket = bucket
            print(f"\n--- Bucket: {bucket} ---")

        # Run both queries
        original_result = run_query(conn, ORIGINAL_QUERY, address)
        optimized_result = run_query(conn, OPTIMIZED_QUERY, address)

        # Compare
        match, reason = compare_results(original_result, optimized_result)

        addr_hex = "0x" + address.hex() if isinstance(address, (bytes, memoryview)) else str(address)
        status = "✓" if match else "✗"

        print(f"  {status} {addr_hex[:18]}... ({order_count:>5} orders): {reason}")

        if match:
            passed += 1
        else:
            failed += 1
            # Print detailed values on failure
            print(f"      Original:  {original_result}")
            print(f"      Optimized: {optimized_result}")

    # Summary
    print("\n" + "=" * 80)
    print(f"SUMMARY: {passed} passed, {failed} failed out of {passed + failed} tests")

    conn.close()

    return 0 if failed == 0 else 1


if __name__ == "__main__":
    sys.exit(main())

To validate the performance, I think it's best we give it a run in prod for anywhere from 30 minutes to 2 hours.
Even though it requires new indexes, the new query should be faster.

@jmg-duarte jmg-duarte marked this pull request as ready for review February 3, 2026 11:50
@jmg-duarte jmg-duarte requested a review from a team as a code owner February 3, 2026 11:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request optimizes the total_surplus query by replacing an inefficient ARRAY_AGG approach with a UNION ALL of separate CTEs, which yields significant performance improvements. The logic appears correct, and the performance gains are well-documented. The earlier comment about significant code duplication in the CTEs, and the suggestion to refactor for maintainability, still stands.

@jmg-duarte jmg-duarte added the hotfix label (PRs that should be applied into production right away) Feb 3, 2026
Contributor

@squadgazzz squadgazzz left a comment


Nice! A few comments.

Also, for such PRs, I'd expect to see EXPLAIN ANALYZE before and after the change.
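
For anyone reproducing these numbers later: the buffer-annotated plans that ended up in the description look like output of Postgres's EXPLAIN (ANALYZE, BUFFERS). A minimal sketch, with a trivial probe query standing in for the full surplus query and an example address taken from the plans in the description:

-- PREPARE keeps the application's $1 placeholder usable as-is.
-- surplus_probe is just an illustrative name.
PREPARE surplus_probe(bytea) AS
    SELECT count(*) FROM orders WHERE owner = $1;

EXPLAIN (ANALYZE, BUFFERS)
    EXECUTE surplus_probe('\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122');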


-- Additional query for jit_orders
-- deduplicate orders
WHERE o.owner != $1
Contributor


If I didn't miss anything, the old query included orders where owner = $1 OR sender = $1 (via ARRAY_CAT). The new onchain_order_trades CTE explicitly excludes orders where owner = $1.
This means if an order has both owner = $1 and sender = $1, it will now be counted only once (in regular_order_trades), whereas before it may have been processed differently through the array concatenation.

While deduplication seems reasonable, I just want to ensure that doesn't break any business logic.
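
A quick ad-hoc probe to check whether that overlap case even exists in the data, using only the tables already involved in this query (add AND o.owner = $1 to scope it to a single user):

-- Orders reachable through both the owner path and the sender path.
SELECT count(*)
FROM orders o
JOIN onchain_placed_orders opo ON opo.uid = o.uid
WHERE o.owner = opo.sender;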

Contributor

@m-sz m-sz left a comment


Thanks for providing the data and script. LGTM.

@squadgazzz
Contributor

I got a benchmark test + result checker script
SQL Query Benchmark

I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.

WHERE o.owner = $1
),
trade_components AS (
onchain_order_trades AS (
Contributor


I believe we could drop the onchain_order_trades altogether, no?
Every onchain placed order also has an entry in the regular orders table, so we would already count those trades there if we do:

WHERE COALESCE(onchain_placed_order.sender, o.owner) = $1

Would have to be tested whether this is faster, though (and leads to the same result).
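
An untested sketch of that variant; whether the planner can still drive this off the owner index through the COALESCE (and whether results match) is exactly what would need measuring:

-- One branch instead of two: the onchain sender wins when present.
SELECT count(*)
FROM orders o
LEFT JOIN onchain_placed_orders opo ON opo.uid = o.uid
JOIN trades t ON t.order_uid = o.uid
WHERE COALESCE(opo.sender, o.owner) = $1;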

@jmg-duarte
Contributor Author

jmg-duarte commented Feb 3, 2026

I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.

For both benchmarks I queried for different addresses, which is closer to what we would experience in prod. The user queries once and the subsequent queries would take advantage of that cache.

I'm open to ideas to improve them!

@fafk
Contributor

fafk commented Feb 4, 2026

I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.

For both benchmarks I queried for different addresses, which is closer to what we would experience in prod. The user queries once and the subsequent queries would take advantage of that cache.

I'm open to ideas to improve them!

There is a way to avoid caching but it's mighty unpleasant: pull the DB data, run a local postgres instance in docker, load the data. Then run query 1. Turn off docker. Turn on docker. Run query 2 & compare results. More details here: #3542 (comment) + https://pastebin.com/ijpEmVDs

To be fair I think as long as you are sure the queries are functionally equivalent (give the same results) I'd just deploy it and observe the speed diff and revert if it doesn't work.

@jmg-duarte
Contributor Author

There is a way to avoid caching but it's mighty unpleasant

Understatement for +1TB of data. Performing the same setup in AWS will also take a bunch of time 😭

@MartinquaXD
Contributor

To be fair I think as long as you are sure the queries are functionally equivalent (give the same results) I'd just deploy it and observe the speed diff and revert if it doesn't work.

Yeah, when no additional indexes are needed and confidence is reasonably high we can just temporarily deploy to prod for a bit.

@jmg-duarte jmg-duarte marked this pull request as draft February 4, 2026 10:23
@fafk
Contributor

fafk commented Feb 4, 2026

There is a way to avoid caching but it's mighty unpleasant

Understatement for +1TB of data. Performing the same setup in AWS will also take a bunch 😭

The trick is to only pull the tables you need. The biggest table (e.g. prices) you wouldn't need. Last time I did this I ended up with 10 gigs.

@github-actions

github-actions bot commented Feb 5, 2026

Reminder: Please update the DB Readme and comment whether migrations are reversible (include rollback scripts if applicable).

  • If creating new tables, update the tables list.
  • When adding a new index, consider using CREATE INDEX CONCURRENTLY for tables involved in the critical execution path.
  • For breaking changes, remember that during rollout k8s starts the new autopilot, runs the Flyway migration, and only then shuts down the old pod. That overlap means the previous version can still be processing requests on the migrated schema, so make it compatible first and ship the breaking DB change in the following release.


@jmg-duarte jmg-duarte marked this pull request as ready for review February 5, 2026 11:39
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request optimizes the total_surplus query by restructuring it to avoid ARRAY_AGG and adding covering indexes, which significantly improves performance as shown by the query plans. However, I've identified a critical pre-existing issue in the query logic where trades and auction executions are joined. This can lead to incorrect surplus calculations due to a Cartesian product. While this PR is an optimization, the underlying logic bug should be addressed.

Contributor

@squadgazzz squadgazzz left a comment


LGTM. The execution plan already shows some improvements. I assume the validation script is meant to only test the business logic, not performance, right? Since it warms up the cache before executing the optimized query.

[CoWSwapEthFlow](https://github.com/cowprotocol/ethflowcontract/blob/main/src/CoWSwapEthFlow.sol) we actually deployed twice so events related to the staging environment should only show up in the staging DB and likewise for production.
It's also important to note that we only index events from blocks that we are certain will not get reorged. That means specifically that events will be indexed with a block delay of at least 64.

> For lessons learned on migrations, refer to the respective [Notion page](https://www.notion.so/cownation/Database-migration-learnings-2fb8da5f04ca8076bf05c433e461a139?utm_content=2fb8da5f-04ca-8076-bf05-c433e461a139&utm_campaign=T035UKY5NUB&pvs=6)
Contributor


This page is private, which doesn't look right for an open-source repo.
