Fix flaky TestEventServer_GetJobSetEvents context budget #4925
Greptile Summary
This PR fixes flaky tests in
Confidence Score: 5/5
This change only touches test code; no production logic is affected and the fix is structurally sound.
Each subtest now creates its own independent 5-second context inside its t.Run closure, directly addressing the cumulative-budget exhaustion described in the PR. The table-driven refactor reduces duplication without introducing new test logic. The defer cancel() is correctly scoped to the closure, loop-variable capture is safe (subtests are sequential), and no logic regressions were found across the four _ErrorIfMissing and three _Permissions cases.
No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["TestEventServer_GetJobSetEvents_ErrorIfMissing"] --> B["for name, tc := range tests"]
B --> C["t.Run(name, ...)"]
C --> D["ctx, cancel := WithTimeout(5s) ← NEW per-subtest"]
D --> E["defer cancel()"]
E --> F["withEventServer(ctx, t, ...)"]
F --> G{"tc.publishEvent?"}
G -- yes --> H["reportPulsarEvent(ctx, ...)"]
G -- no --> I["GetJobSetEvents(...)"]
H --> I
I --> J{"tc.expectErrorCode == OK?"}
J -- yes --> K["require.NoError"]
J -- no --> L["assert gRPC error code"]
K --> M["assert len(stream.sendMessages)"]
L --> M
Reviews (8): Last reviewed commit: "Merge branch 'master' into fix-event-tes..."
The two tests shared a single 5s armadacontext.WithTimeout across all their subtests. Each subtest spins up a fresh lookout DB and runs every migration via withEventServer and WithLookoutDb, so the per-subtest cost is not trivial. Under CI load on a slow runner, the cumulative cost blew the budget, surfacing as context deadline exceeded inside CreateQueue.

Two changes here:
- Each subtest now gets its own 5s budget rather than sharing one. Adding new cases no longer tightens the envelope for existing ones.
- Both tests are converted to map-based table-driven tests, with the shared event-publishing boilerplate extracted into a jobRunAssignedEvent helper. Net -108 lines, same coverage.

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
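A rough sketch of the pattern the commit message describes: a map-based case table, a shared event helper, and a fresh 5s context per case. Only the name `jobRunAssignedEvent` comes from the commit. Its signature here, `getJobSetEvents`, `errNotFound`, and the `testCase` fields are all hypothetical stand-ins so the sketch runs without the Armada codebase or gRPC. In the real tests the body of `runCase` would sit inside a `t.Run(name, ...)` closure.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical stand-in for a gRPC NotFound status.
var errNotFound = errors.New("not found")

// testCase is an assumed table-entry shape: what to set up, what to expect.
type testCase struct {
	publishEvent bool  // publish an event before the call?
	wantErr      error // expected error, nil for success
	wantMessages int   // expected number of streamed messages
}

// cases is keyed by subtest name, as in a map-based table-driven test.
var cases = map[string]testCase{
	"event published":   {publishEvent: true, wantErr: nil, wantMessages: 1},
	"nothing published": {publishEvent: false, wantErr: errNotFound, wantMessages: 0},
}

// jobRunAssignedEvent stands in for the extracted helper; the signature is assumed.
func jobRunAssignedEvent(jobSet string) string { return jobSet + "/job-run-assigned" }

// getJobSetEvents is a fake call under test: it fails if nothing was published.
func getJobSetEvents(ctx context.Context, store []string) ([]string, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if len(store) == 0 {
		return nil, errNotFound
	}
	return store, nil
}

// runCase gives every case its own 5s budget, independent of the other cases.
func runCase(tc testCase) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	var store []string
	if tc.publishEvent {
		store = append(store, jobRunAssignedEvent("set-1"))
	}
	msgs, err := getJobSetEvents(ctx, store)
	if !errors.Is(err, tc.wantErr) {
		return fmt.Errorf("got error %v, want %v", err, tc.wantErr)
	}
	if len(msgs) != tc.wantMessages {
		return fmt.Errorf("got %d messages, want %d", len(msgs), tc.wantMessages)
	}
	return nil
}

func main() {
	for name, tc := range cases {
		if err := runCase(tc); err != nil {
			fmt.Printf("FAIL %s: %v\n", name, err)
			continue
		}
		fmt.Printf("ok   %s\n", name)
	}
}
```

Because the context is created inside `runCase`, adding a new entry to `cases` does not reduce the time available to the existing ones.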
`TestEventServer_GetJobSetEvents_ErrorIfMissing` (4 subtests) and `TestEventServer_GetJobSetEvents_Permissions` (3 subtests) share a single 5s `armadacontext.WithTimeout` across all their subtests. Each subtest spins up a fresh lookout DB and runs every migration via `withEventServer` and `WithLookoutDb`, so the per-subtest cost is not trivial.

On a slow CI runner the cumulative cost blew the 5s budget. We hit this on PR #4920 where 4 subtests of `_ErrorIfMissing` took 5.74s combined and the last one died with `context deadline exceeded` inside `PostgresQueueRepository.upsertQueue`.

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>