refactor(profiling): wait for Sampling Thread to exit in Sampler::stop (#16359)
KowalskiThomas wants to merge 1 commit into main from kowalski/wait-for-sampling-thread-exit

Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c53245f474
    // Wait for the sampling thread to actually exit (with timeout to avoid hanging forever)
    std::unique_lock<std::mutex> lock(thread_exit_mutex);
    constexpr auto timeout = std::chrono::seconds(3);
    thread_exit_cv.wait_for(lock, timeout, [this]() { return !thread_running.load(); });
Ensure stop waits if thread hasn't set running flag yet
The new wait in Sampler::stop() relies on thread_running being true when a sampling thread exists. If stop() is called immediately after start() (before the new thread reaches thread_running.store(true)), the predicate is already satisfied and wait_for returns immediately. The sampling thread can then start, run initialization (signal handler setup, loop checks), and touch profiler state while shutdown continues, which defeats the intended guarantee that stop() waits for the thread to exit. This race can occur during rapid start/stop sequences or process shutdown right after enabling profiling.
Performance SLOs

Comparing candidate kowalski/wait-for-sampling-thread-exit (c7f244f) with baseline main (9e014a2).

📈 Performance Regressions (3 suites)

📈 iastaspectsospath — 24/24 ✅

| Benchmark | Time (SLO <700.000µs) | vs baseline | Memory (SLO <46.000MB) | vs baseline |
| --- | --- | --- | --- | --- |
| ospathbasename_aspect | ✅ 504.407µs (📉 -27.9%) | 📈 +19.1% | ✅ 42.959MB (-6.6%) | +4.8% |
| ospathbasename_noaspect | ✅ 432.089µs (📉 -38.3%) | -0.9% | ✅ 43.057MB (-6.4%) | +5.0% |
| ospathjoin_aspect | ✅ 615.691µs (📉 -12.0%) | -0.7% | ✅ 43.037MB (-6.4%) | +5.1% |
| ospathjoin_noaspect | ✅ 625.247µs (📉 -10.7%) | -0.2% | ✅ 43.037MB (-6.4%) | +4.9% |
| ospathnormcase_aspect | ✅ 349.994µs (📉 -50.0%) | -2.5% | ✅ 42.959MB (-6.6%) | +4.7% |
| ospathnormcase_noaspect | ✅ 360.144µs (📉 -48.6%) | -2.6% | ✅ 43.037MB (-6.4%) | +5.0% |
| ospathsplit_aspect | ✅ 489.072µs (📉 -30.1%) | -0.5% | ✅ 42.998MB (-6.5%) | +5.2% |
| ospathsplit_noaspect | ✅ 498.459µs (📉 -28.8%) | -0.5% | ✅ 42.998MB (-6.5%) | +5.0% |
| ospathsplitdrive_aspect | ✅ 371.648µs (📉 -46.9%) | -1.9% | ✅ 42.979MB (-6.6%) | +5.3% |
| ospathsplitdrive_noaspect | ✅ 72.889µs (📉 -89.6%) | -0.5% | ✅ 42.979MB (-6.6%) | +4.7% |
| ospathsplitext_aspect | ✅ 462.144µs (📉 -34.0%) | +0.8% | ✅ 43.037MB (-6.4%) | +5.1% |
| ospathsplitext_noaspect | ✅ 466.964µs (📉 -33.3%) | -0.8% | ✅ 42.920MB (-6.7%) | +5.0% |

📈 iastaspectssplit — 12/12 ✅

| Benchmark | Time (SLO <250.000µs) | vs baseline | Memory (SLO <46.000MB) | vs baseline |
| --- | --- | --- | --- | --- |
| rsplit_aspect | ✅ 169.941µs (📉 -32.0%) | 📈 +16.1% | ✅ 42.880MB (-6.8%) | +4.7% |
| rsplit_noaspect | ✅ 168.570µs (📉 -32.6%) | 📈 +10.6% | ✅ 43.018MB (-6.5%) | +4.8% |
| split_aspect | ✅ 152.309µs (📉 -39.1%) | +5.5% | ✅ 42.979MB (-6.6%) | +5.1% |
| split_noaspect | ✅ 164.404µs (📉 -34.2%) | +8.7% | ✅ 43.018MB (-6.5%) | +4.9% |
| splitlines_aspect | ✅ 152.362µs (📉 -39.1%) | +4.7% | ✅ 42.959MB (-6.6%) | +4.7% |
| splitlines_noaspect | ✅ 161.555µs (📉 -35.4%) | +7.0% | ✅ 42.920MB (-6.7%) | +4.8% |

📈 telemetryaddmetric — 30/30 ✅ (Memory SLO <38.000MB unless noted)

| Benchmark | Time (vs SLO) | vs baseline | Memory | vs baseline |
| --- | --- | --- | --- | --- |
| 1-count-metric-1-times | ✅ 3.540µs (<20.000µs, 📉 -82.3%) | 📈 +23.3% | ✅ 35.606MB (-6.3%) | +5.1% |
| 1-count-metrics-100-times | ✅ 205.349µs (<220.000µs, -6.7%) | +2.3% | ✅ 35.566MB (-6.4%) | +4.8% |
| 1-distribution-metric-1-times | ✅ 3.377µs (<20.000µs, 📉 -83.1%) | +5.7% | ✅ 35.586MB (-6.4%) | +4.6% |
| 1-distribution-metrics-100-times | ✅ 220.140µs (<230.000µs, -4.3%) | +3.2% | ✅ 35.665MB (-6.1%) | +5.0% |
| 1-gauge-metric-1-times | ✅ 2.202µs (<20.000µs, 📉 -89.0%) | +1.5% | ✅ 35.645MB (-6.2%) | +5.2% |
| 1-gauge-metrics-100-times | ✅ 137.360µs (<150.000µs, -8.4%) | -0.2% | ✅ 35.586MB (-6.4%) | +5.0% |
| 1-rate-metric-1-times | ✅ 3.172µs (<20.000µs, 📉 -84.1%) | +4.7% | ✅ 35.507MB (-6.6%) | +4.3% |
| 1-rate-metrics-100-times | ✅ 219.176µs (<250.000µs, 📉 -12.3%) | +2.3% | ✅ 35.684MB (-6.1%) | +4.9% |
| 100-count-metrics-100-times | ✅ 20.589ms (<22.000ms, -6.4%) | +3.0% | ✅ 35.743MB (-5.9%) | +5.4% |
| 100-distribution-metrics-100-times | ✅ 2.261ms (<2.550ms, 📉 -11.3%) | +1.1% | ✅ 35.547MB (-6.5%) | +4.9% |
| 100-gauge-metrics-100-times | ✅ 1.399ms (<1.550ms, -9.7%) | -0.8% | ✅ 35.566MB (-6.4%) | +4.7% |
| 100-rate-metrics-100-times | ✅ 2.225ms (<2.550ms, 📉 -12.7%) | +1.6% | ✅ 35.566MB (-6.4%) | +4.9% |
| flush-1-metric | ✅ 4.520µs (<20.000µs, 📉 -77.4%) | +1.4% | ✅ 35.586MB (-6.4%) | +5.0% |
| flush-100-metrics | ✅ 172.809µs (<250.000µs, 📉 -30.9%) | ~same | ✅ 35.547MB (-6.5%) | +4.6% |
| flush-1000-metrics | ✅ 2.189ms (<2.500ms, 📉 -12.4%) | +0.4% | ✅ 36.412MB (SLO <38.750MB, -6.0%) | +4.9% |
r1viollet
left a comment
LGTM
💭 Would we want to have a warning in case we hit the timeout?
Good point, I added one!
Description
https://datadoghq.atlassian.net/browse/PROF-13681
This PR updates the behaviour of the Sampling Thread in the Profiler after a discussion we had over Slack. Currently, the Sampling Thread is a daemon thread that is never joined or waited on when the Python process exits. Unfortunately, this makes it nearly impossible to clean certain things up, because the Sampling Thread might still be using them, and we have to assume it might be using them forever (example: #16276).
Instead of joining the Sampling Thread, this PR gives the Sampling Thread a 3-second grace period to exit after thread_seq_num has been incremented, then proceeds with shutdown anyway. Under normal circumstances this should be more than enough time to exit (the Sampling Thread only samples, it doesn't upload, so each iteration should take a few milliseconds at the very most). On the other hand, if a logic bug would otherwise deadlock a plain join, the timed wait means we will not be bitten by it.