Conversation
Code Review
The pull request optimizes the total_surplus query by replacing an inefficient ARRAY_AGG approach with a UNION ALL of separate CTEs, which yields significant performance improvements. The logic appears correct, and the performance gains are well-documented. The original comment regarding the significant code duplication in the CTEs, and its suggestion to refactor for maintainability, remains valid and aligns with best practices, as none of the provided rules contradict it. Therefore, the comment has been kept as is.
squadgazzz
left a comment
Nice! A few comments.
Also, for such PRs, I'd expect to see EXPLAIN ANALYZE before and after the change.
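For reference, a minimal self-contained sketch of that workflow in Postgres (trades_demo is a toy stand-in, not the real total_surplus query):

```sql
-- Capture a plan before the change, apply the change, capture again,
-- and diff the two outputs.
CREATE TEMP TABLE trades_demo (order_uid bytea, surplus numeric);
INSERT INTO trades_demo VALUES ('\x01', 1.5);
EXPLAIN (ANALYZE, BUFFERS)
SELECT SUM(surplus) FROM trades_demo WHERE order_uid = '\x01';
```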
```sql
-- Additional query for jit_orders
-- deduplicate orders
WHERE o.owner != $1
```
If I didn't miss anything, the old query included orders where owner = $1 OR sender = $1 (via ARRAY_CAT). The new onchain_order_trades CTE explicitly excludes orders where owner = $1.
This means an order with both owner = $1 and sender = $1 is now counted only once (in regular_order_trades), whereas before it may have been processed differently through the array concatenation.
While deduplication seems reasonable, I just want to ensure it doesn't break any business logic.
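A self-contained toy query illustrating the dedup semantics in question (data and predicates are illustrative, not the real schema):

```sql
-- Order 'a' has owner = sender = the queried address: under the new disjoint
-- predicates it is counted exactly once, in the first branch.
WITH orders(uid, owner, sender) AS (
  VALUES ('a', 'alice', 'alice'),
         ('b', 'bob',   'alice')
)
SELECT uid FROM orders WHERE owner = 'alice'                        -- regular_order_trades
UNION ALL
SELECT uid FROM orders WHERE sender = 'alice' AND owner != 'alice'; -- onchain_order_trades
-- Returns 'a' once and 'b' once.
```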
m-sz
left a comment
Thanks for providing the data and script. LGTM.
I am not sure this approach is correct. Postgres heavily uses caching: with the first query you already warmed up a lot of things, so the CTE one runs much faster. For a proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.
```diff
  WHERE o.owner = $1
  ),
- trade_components AS (
+ onchain_order_trades AS (
```
I believe we could drop the onchain_order_trades CTE altogether, no?
Every onchain-placed order also has an entry in the regular orders table, so we would already count those trades there if we do:
`WHERE COALESCE(onchain_placed_order.sender, o.owner) = $1`
Would have to be tested whether this is faster, though (and whether it leads to the same result).
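A hypothetical sketch of that simplification (join structure and table aliases are assumptions, not taken from the PR):

```sql
-- One predicate instead of a separate onchain_order_trades CTE: the sender
-- wins when the order was placed onchain, otherwise the owner is used.
SELECT t.*
FROM trades t
JOIN orders o ON o.uid = t.order_uid
LEFT JOIN onchain_placed_order opo ON opo.uid = o.uid
WHERE COALESCE(opo.sender, o.owner) = $1;
```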
For both benchmarks I queried different addresses, which is closer to what we would experience in prod: the user queries once and subsequent queries take advantage of the cache. I'm open to ideas to improve them!
There is a way to avoid caching, but it's mighty unpleasant: pull the DB data, run a local Postgres instance in Docker, and load the data. Then run query 1, turn Docker off and on again, run query 2, and compare results. More details here: #3542 (comment) + https://pastebin.com/ijpEmVDs
To be fair, as long as you are sure the queries are functionally equivalent (give the same results), I'd just deploy it, observe the speed diff, and revert if it doesn't work.
Understatement for +1TB of data. Performing the same setup in AWS would also take a while 😭
Yeah, when no additional indexes are needed and confidence is reasonably high, we can just temporarily deploy to prod for a bit.
The trick is to only pull the tables you need. The biggest table (e.g. prices) you wouldn't need. Last time I did this I ended up with 10 gigs.
Force-pushed from 0191b65 to e1fcb81 (Compare)
Reminder: Please update the DB Readme and comment whether migrations are reversible (include rollback scripts if applicable).
Code Review
This pull request optimizes the total_surplus query by restructuring it to avoid ARRAY_AGG and adding covering indexes, which significantly improves performance as shown by the query plans. However, I've identified a critical pre-existing issue in the query logic where trades and auction executions are joined. This can lead to incorrect surplus calculations due to a Cartesian product. While this PR is an optimization, the underlying logic bug should be addressed.
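A toy example of the kind of fan-out the review is pointing at (the schema is hypothetical; the point is the non-unique join key):

```sql
-- Two execution rows share the trade's auction_id, so the join duplicates
-- the trade and SUM() returns 20.0 instead of 10.0.
WITH trades(auction_id, surplus) AS (VALUES (1, 10.0)),
     executions(auction_id)      AS (VALUES (1), (1))
SELECT SUM(t.surplus) AS total_surplus
FROM trades t
JOIN executions e ON e.auction_id = t.auction_id;
```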
squadgazzz
left a comment
LGTM. The execution plan already shows some improvements. I assume the validation script is meant to only test the business logic, not performance, right? Since it warms up the cache before executing the optimized query.
> [CoWSwapEthFlow](https://github.com/cowprotocol/ethflowcontract/blob/main/src/CoWSwapEthFlow.sol) we actually deployed twice so events related to the staging environment should only show up in the staging DB and likewise for production.
> It's also important to note that we only index events from blocks that we are certain will not get reorged. That means specifically that events will be indexed with a block delay of at least 64.
>
> > For lessons learned on migrations, refer to the respective [Notion page](https://www.notion.so/cownation/Database-migration-learnings-2fb8da5f04ca8076bf05c433e461a139?utm_content=2fb8da5f-04ca-8076-bf05-c433e461a139&utm_campaign=T035UKY5NUB&pvs=6)
This page is private, which doesn't look right for an open-source repo.
Description
We've been experiencing latency spikes on several endpoints. We've pinned this down to the time it takes to acquire DB connections from the pool, and when checking RDS monitoring, the surplus query always shows up at the top.
The current theory is that the query is slower than it could be. As more users visit the main swap page, their surplus is loaded (even if they never request it, i.e. never open the wallet modal), and users with a large number of orders hold on to connections that other endpoints then can't get.
I looked into the user distribution; here are the results:
Distribution Query
However, if it were just this, it would be too simple. Depending on the user, they might have no regular orders and only onchain orders (note that the max number of onchain orders is around ~10k), and some will have skewed data distributions across tables too, which makes this query tricky to analyze and optimize.
There are two crucial changes to the query: removing ARRAY_AGG and adding indexes. The first means the DB no longer has to materialize a potentially big array in memory, which would otherwise also lead to bad estimations; the second provides better "access paths" to some of the information the query requires.
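A sketch of the restructured shape (the CTE names come from the diff; the joins and columns are assumptions, not the actual query):

```sql
-- Each CTE filters directly via an index instead of first materializing one
-- large ARRAY_AGG/ARRAY_CAT of order uids and probing it with ANY(...).
WITH regular_order_trades AS (
  SELECT t.surplus
  FROM trades t
  JOIN orders o ON o.uid = t.order_uid
  WHERE o.owner = $1
),
onchain_order_trades AS (
  SELECT t.surplus
  FROM trades t
  JOIN orders o ON o.uid = t.order_uid
  JOIN onchain_placed_order opo ON opo.uid = o.uid
  WHERE opo.sender = $1 AND o.owner != $1
)
SELECT SUM(surplus) AS total_surplus
FROM (
  SELECT surplus FROM regular_order_trades
  UNION ALL
  SELECT surplus FROM onchain_order_trades
) all_trades;
```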
RDS Stats (1h)
Before
After
Changes
Query Plans
Before
After
How to test
Because floating-point addition is not associative and the row order in the old query is not deterministic (the ORDER BY uid is merely an approximation that happens to match), the validation script leaves some room for differences: 1e-9, to be precise.
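Conceptually, the check amounts to something like this (old_surplus/new_surplus are hypothetical tables holding each query's per-user results):

```sql
-- Any row returned here is a real divergence beyond the float tolerance.
SELECT o.owner, o.total_surplus, n.total_surplus
FROM old_surplus o
JOIN new_surplus n USING (owner)
WHERE abs(o.total_surplus - n.total_surplus) > 1e-9;
```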
Validation script
To validate the performance, I think it's best we give it a run in prod for anywhere from 30 minutes to 2 hours.
Even though it requires new indexes, the new query should be faster.