fix(tracer): resolve issue with not flushing during SIGTERM and SIGINT #16342
This change is marked for backport to 4.3 and it does not conflict with that branch.
…m.sigint.tracer.handler
Performance SLOs
Comparing candidate APMLP-1002/fix.sigterm.sigint.tracer.handler (a6bc9f4) with baseline main (44f85b2)
📈 Performance Regressions (3 suites)

📈 iastaspects - 118/118
✅ add_aspect: Time: ✅ 105.196µs (SLO: <130.000µs 📉 -19.1%) vs baseline: +3.1%; Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.5%
✅ add_inplace_aspect: Time: ✅ 102.263µs (SLO: <130.000µs 📉 -21.3%) vs baseline: -0.7%; Memory: ✅ 43.077MB (SLO: <46.000MB -6.4%) vs baseline: +4.8%
✅ add_inplace_noaspect: Time: ✅ 28.094µs (SLO: <40.000µs 📉 -29.8%) vs baseline: -0.1%; Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +4.8%
✅ add_noaspect: Time: ✅ 48.825µs (SLO: <70.000µs 📉 -30.2%) vs baseline: -0.3%; Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.3%
✅ bytearray_aspect: Time: ✅ 248.422µs (SLO: <400.000µs 📉 -37.9%) vs baseline: ~same; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.5%
✅ bytearray_extend_aspect: Time: ✅ 635.355µs (SLO: <800.000µs 📉 -20.6%) vs baseline: +1.0%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%
✅ bytearray_extend_noaspect: Time: ✅ 263.312µs (SLO: <400.000µs 📉 -34.2%) vs baseline: +0.9%; Memory: ✅ 43.214MB (SLO: <46.000MB -6.1%) vs baseline: +5.2%
✅ bytearray_noaspect: Time: ✅ 139.471µs (SLO: <300.000µs 📉 -53.5%) vs baseline: -0.2%; Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.6%
✅ bytes_aspect: Time: ✅ 216.711µs (SLO: <300.000µs 📉 -27.8%) vs baseline: -0.2%; Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.0%
✅ bytes_noaspect: Time: ✅ 132.899µs (SLO: <200.000µs 📉 -33.6%) vs baseline: -0.4%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.9%
✅ bytesio_aspect: Time: ✅ 3.902ms (SLO: <5.000ms 📉 -22.0%) vs baseline: -0.2%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.8%
✅ bytesio_noaspect: Time: ✅ 312.240µs (SLO: <420.000µs 📉 -25.7%) vs baseline: -1.1%; Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0%
✅ capitalize_aspect: Time: ✅ 89.679µs (SLO: <300.000µs 📉 -70.1%) vs baseline: +0.4%; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +5.3%
✅ capitalize_noaspect: Time: ✅ 268.111µs (SLO: <300.000µs 📉 -10.6%) vs baseline: +3.4%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.9%
✅ casefold_aspect: Time: ✅ 89.453µs (SLO: <500.000µs 📉 -82.1%) vs baseline: +0.7%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0%
✅ casefold_noaspect: Time: ✅ 311.724µs (SLO: <500.000µs 📉 -37.7%) vs baseline: +0.6%; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.0%
✅ decode_aspect: Time: ✅ 88.431µs (SLO: <100.000µs 📉 -11.6%) vs baseline: ~same; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +5.1%
✅ decode_noaspect: Time: ✅ 152.586µs (SLO: <210.000µs 📉 -27.3%) vs baseline: +0.7%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.7%
✅ encode_aspect: Time: ✅ 85.002µs (SLO: <200.000µs 📉 -57.5%) vs baseline: -0.2%; Memory: ✅ 43.175MB (SLO: <46.000MB -6.1%) vs baseline: +5.4%
✅ encode_noaspect: Time: ✅ 140.260µs (SLO: <200.000µs 📉 -29.9%) vs baseline: -0.7%; Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.3%
✅ format_aspect: Time: ✅ 14.592ms (SLO: <19.200ms 📉 -24.0%) vs baseline: ~same; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +4.7%
✅ format_map_aspect: Time: ✅ 16.364ms (SLO: <21.500ms 📉 -23.9%) vs baseline: ~same; Memory: ✅ 43.273MB (SLO: <46.000MB -5.9%) vs baseline: +5.3%
✅ format_map_noaspect: Time: ✅ 370.423µs (SLO: <500.000µs 📉 -25.9%) vs baseline: +1.6%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.7%
✅ format_noaspect: Time: ✅ 307.391µs (SLO: <500.000µs 📉 -38.5%) vs baseline: +0.3%; Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.1%
✅ index_aspect: Time: ✅ 127.178µs (SLO: <300.000µs 📉 -57.6%) vs baseline: +2.4%; Memory: ✅ 43.077MB (SLO: <46.000MB -6.4%) vs baseline: +4.8%
✅ index_noaspect: Time: ✅ 40.488µs (SLO: <300.000µs 📉 -86.5%) vs baseline: +0.3%; Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +4.8%
✅ join_aspect: Time: ✅ 215.522µs (SLO: <300.000µs 📉 -28.2%) vs baseline: +0.1%; Memory: ✅ 43.018MB (SLO: <46.000MB -6.5%) vs baseline: +4.5%
✅ join_noaspect: Time: ✅ 146.930µs (SLO: <300.000µs 📉 -51.0%) vs baseline: +0.3%; Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +4.6%
✅ ljust_aspect: Time: ✅ 562.143µs (SLO: <700.000µs 📉 -19.7%) vs baseline: 📈 +14.7%; Memory: ✅ 43.283MB (SLO: <46.000MB -5.9%) vs baseline: +5.1%
✅ ljust_noaspect: Time: ✅ 251.031µs (SLO: <300.000µs 📉 -16.3%) vs baseline: ~same; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.7%
✅ lower_aspect: Time: ✅ 298.550µs (SLO: <500.000µs 📉 -40.3%) vs baseline: +0.6%; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.0%
✅ lower_noaspect: Time: ✅ 234.803µs (SLO: <300.000µs 📉 -21.7%) vs baseline: -0.6%; Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.3%
✅ lstrip_aspect: Time: ✅ 0.278ms (SLO: <3.000ms 📉 -90.7%) vs baseline: -1.3%; Memory: ✅ 43.077MB (SLO: <46.000MB -6.4%) vs baseline: +4.8%
✅ lstrip_noaspect: Time: ✅ 0.175ms (SLO: <3.000ms 📉 -94.2%) vs baseline: -1.3%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%
✅ modulo_aspect: Time: ✅ 14.345ms (SLO: <18.750ms 📉 -23.5%) vs baseline: ~same; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +5.1%
✅ modulo_aspect_for_bytearray_bytearray: Time: ✅ 14.744ms (SLO: <19.350ms 📉 -23.8%) vs baseline: -0.2%; Memory: ✅ 43.225MB (SLO: <46.000MB -6.0%) vs baseline: +5.1%
✅ modulo_aspect_for_bytes: Time: ✅ 14.406ms (SLO: <18.900ms 📉 -23.8%) vs baseline: +0.4%; Memory: ✅ 43.284MB (SLO: <46.000MB -5.9%) vs baseline: +5.4%
✅ modulo_aspect_for_bytes_bytearray: Time: ✅ 14.572ms (SLO: <19.150ms 📉 -23.9%) vs baseline: +0.1%; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +4.6%
✅ modulo_noaspect: Time: ✅ 0.366ms (SLO: <3.000ms 📉 -87.8%) vs baseline: +2.0%; Memory: ✅ 43.155MB (SLO: <46.000MB -6.2%) vs baseline: +5.2%
✅ replace_aspect: Time: ✅ 18.400ms (SLO: <24.000ms 📉 -23.3%) vs baseline: +0.4%; Memory: ✅ 43.176MB (SLO: <46.000MB -6.1%) vs baseline: +5.0%
✅ replace_noaspect: Time: ✅ 279.178µs (SLO: <300.000µs -6.9%) vs baseline: ~same; Memory: ✅ 43.136MB (SLO: <46.000MB -6.2%) vs baseline: +5.1%
✅ repr_aspect: Time: ✅ 319.947µs (SLO: <420.000µs 📉 -23.8%) vs baseline: +0.4%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.4%
✅ repr_noaspect: Time: ✅ 46.929µs (SLO: <90.000µs 📉 -47.9%) vs baseline: +0.6%; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.0%
✅ rstrip_aspect: Time: ✅ 377.749µs (SLO: <500.000µs 📉 -24.5%) vs baseline: +0.9%; Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.0%
✅ rstrip_noaspect: Time: ✅ 181.017µs (SLO: <300.000µs 📉 -39.7%) vs baseline: -0.7%; Memory: ✅ 43.136MB (SLO: <46.000MB -6.2%) vs baseline: +4.7%
✅ slice_aspect: Time: ✅ 182.434µs (SLO: <300.000µs 📉 -39.2%) vs baseline: ~same; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +4.7%
✅ slice_noaspect: Time: ✅ 54.017µs (SLO: <90.000µs 📉 -40.0%) vs baseline: +0.6%; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +4.4%
✅ stringio_aspect: Time: ✅ 4.470ms (SLO: <5.000ms 📉 -10.6%) vs baseline: 📈 +13.2%; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +4.8%
✅ stringio_noaspect: Time: ✅ 348.164µs (SLO: <500.000µs 📉 -30.4%) vs baseline: +0.6%; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +4.5%
✅ strip_aspect: Time: ✅ 277.687µs (SLO: <350.000µs 📉 -20.7%) vs baseline: -0.6%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.8%
✅ strip_noaspect: Time: ✅ 178.042µs (SLO: <240.000µs 📉 -25.8%) vs baseline: +0.6%; Memory: ✅ 43.077MB (SLO: <46.000MB -6.4%) vs baseline: +4.6%
✅ swapcase_aspect: Time: ✅ 338.972µs (SLO: <500.000µs 📉 -32.2%) vs baseline: ~same; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +4.9%
✅ swapcase_noaspect: Time: ✅ 275.531µs (SLO: <400.000µs 📉 -31.1%) vs baseline: -0.8%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +5.0%
✅ title_aspect: Time: ✅ 329.995µs (SLO: <500.000µs 📉 -34.0%) vs baseline: +0.5%; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +4.9%
✅ title_noaspect: Time: ✅ 262.786µs (SLO: <400.000µs 📉 -34.3%) vs baseline: +0.5%; Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +4.9%
✅ translate_aspect: Time: ✅ 490.898µs (SLO: <700.000µs 📉 -29.9%) vs baseline: -1.2%; Memory: ✅ 43.018MB (SLO: <46.000MB -6.5%) vs baseline: +4.6%
✅ translate_noaspect: Time: ✅ 424.659µs (SLO: <500.000µs 📉 -15.1%) vs baseline: -1.0%; Memory: ✅ 43.136MB (SLO: <46.000MB -6.2%) vs baseline: +5.3%
✅ upper_aspect: Time: ✅ 298.053µs (SLO: <500.000µs 📉 -40.4%) vs baseline: +0.6%; Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.4%
✅ upper_noaspect: Time: ✅ 234.425µs (SLO: <400.000µs 📉 -41.4%) vs baseline: -1.5%; Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.2%

📈 iastaspectsospath - 24/24
✅ ospathbasename_aspect: Time: ✅ 488.176µs (SLO: <700.000µs 📉 -30.3%) vs baseline: 📈 +13.9%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%
✅ ospathbasename_noaspect: Time: ✅ 431.893µs (SLO: <700.000µs 📉 -38.3%) vs baseline: ~same; Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.3%
✅ ospathjoin_aspect: Time: ✅ 613.910µs (SLO: <700.000µs 📉 -12.3%) vs baseline: ~same; Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.1%
✅ ospathjoin_noaspect: Time: ✅ 620.322µs (SLO: <700.000µs 📉 -11.4%) vs baseline: -0.3%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%
✅ ospathnormcase_aspect: Time: ✅ 355.466µs (SLO: <700.000µs 📉 -49.2%) vs baseline: +0.4%; Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9%
✅ ospathnormcase_noaspect: Time: ✅ 359.866µs (SLO: <700.000µs 📉 -48.6%) vs baseline: -0.6%; Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0%
✅ ospathsplit_aspect: Time: ✅ 486.269µs (SLO: <700.000µs 📉 -30.5%) vs baseline: -0.5%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%
✅ ospathsplit_noaspect: Time: ✅ 497.524µs (SLO: <700.000µs 📉 -28.9%) vs baseline: -0.3%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.9%
✅ ospathsplitdrive_aspect: Time: ✅ 376.900µs (SLO: <700.000µs 📉 -46.2%) vs baseline: +0.2%; Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +5.0%
✅ ospathsplitdrive_noaspect: Time: ✅ 72.940µs (SLO: <700.000µs 📉 -89.6%) vs baseline: -1.2%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%
✅ ospathsplitext_aspect: Time: ✅ 468.314µs (SLO: <700.000µs 📉 -33.1%) vs baseline: +1.1%; Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.0%
✅ ospathsplitext_noaspect: Time: ✅ 469.047µs (SLO: <700.000µs 📉 -33.0%) vs baseline: +0.5%; Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.8%

📈 telemetryaddmetric - 30/30
✅ 1-count-metric-1-times: Time: ✅ 3.348µs (SLO: <20.000µs 📉 -83.3%) vs baseline: 📈 +15.2%; Memory: ✅ 35.625MB (SLO: <38.000MB -6.2%) vs baseline: +5.1%
✅ 1-count-metrics-100-times: Time: ✅ 200.521µs (SLO: <220.000µs -8.9%) vs baseline: -0.3%; Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.8%
✅ 1-distribution-metric-1-times: Time: ✅ 3.217µs (SLO: <20.000µs 📉 -83.9%) vs baseline: -0.2%; Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.3%
✅ 1-distribution-metrics-100-times: Time: ✅ 211.482µs (SLO: <230.000µs -8.1%) vs baseline: -1.0%; Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +4.7%
✅ 1-gauge-metric-1-times: Time: ✅ 2.190µs (SLO: <20.000µs 📉 -89.1%) vs baseline: +0.3%; Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.6%
✅ 1-gauge-metrics-100-times: Time: ✅ 136.178µs (SLO: <150.000µs -9.2%) vs baseline: -0.9%; Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.2%
✅ 1-rate-metric-1-times: Time: ✅ 3.019µs (SLO: <20.000µs 📉 -84.9%) vs baseline: -0.7%; Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.6%
✅ 1-rate-metrics-100-times: Time: ✅ 214.757µs (SLO: <250.000µs 📉 -14.1%) vs baseline: +0.4%; Memory: ✅ 35.606MB (SLO: <38.000MB -6.3%) vs baseline: +4.9%
✅ 100-count-metrics-100-times: Time: ✅ 20.426ms (SLO: <22.000ms -7.2%) vs baseline: +0.6%; Memory: ✅ 35.645MB (SLO: <38.000MB -6.2%) vs baseline: +4.6%
✅ 100-distribution-metrics-100-times: Time: ✅ 2.280ms (SLO: <2.550ms 📉 -10.6%) vs baseline: +0.7%; Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.7%
✅ 100-gauge-metrics-100-times: Time: ✅ 1.397ms (SLO: <1.550ms -9.8%) vs baseline: ~same; Memory: ✅ 35.665MB (SLO: <38.000MB -6.1%) vs baseline: +5.0%
✅ 100-rate-metrics-100-times: Time: ✅ 2.240ms (SLO: <2.550ms 📉 -12.2%) vs baseline: +1.4%; Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.7%
✅ flush-1-metric: Time: ✅ 4.479µs (SLO: <20.000µs 📉 -77.6%) vs baseline: +0.3%; Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.6%
✅ flush-100-metrics: Time: ✅ 174.397µs (SLO: <250.000µs 📉 -30.2%) vs baseline: -0.1%; Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.6%
✅ flush-1000-metrics: Time: ✅ 2.184ms (SLO: <2.500ms 📉 -12.6%) vs baseline: -0.2%; Memory: ✅ 36.412MB (SLO: <38.750MB -6.0%) vs baseline: +4.9%
emmettbutler left a comment
Cool. Learned some stuff by reviewing this.
bb8d0a7 to f644eee
This reverts commit dcae8eb.
Some of the cases are failing, but only in CI, and with odd messages such as "400: list index out of range". My best guess so far is that the test agent is not able to properly read/resolve the snapshot data.
It seems the NativeWriter takes much longer than the AgentWriter to start up, especially when stats is enabled, so the NativeWriter might not be fully initialized before we send the SIGTERM/SIGINT signal. I added a small sleep, which I don't love, but it seems to make the results more consistent.
I still need to resolve an issue where the appsec integrations flask tests seem to be timing out/deadlocking...
This reverts commit 5d744b7.
This change is marked for backport to 4.4 and it does not conflict with that branch.
9035145 to 0f2bf18
/merge
aa0794c into main
Description
Currently we do not properly register SIGTERM and SIGINT handlers for Tracer._atexit. This means any buffered traces or stats payloads may be lost during SIGTERM or SIGINT.
For traces, up to 1 second's worth of traces could be lost. For stats, up to 10 seconds' worth of stats could be lost.
There are three changes here:
- Use atexit.register_on_exit_signal for Tracer._atexit.
- Call whatever custom/user signal handlers were registered before ours, defaulting to the default handlers if none were.
- Change the call order from first -> last to last -> first, so we can ensure the default signal handling behavior happens after all of our signal handlers are called.
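The handler chaining and ordering described above can be sketched with only the standard library `signal` module. This is an illustration under assumptions, not the ddtrace implementation: `make_exit_handler` and `install_exit_handler` are hypothetical names, and the callback list stands in for whatever `atexit.register_on_exit_signal` tracks internally.

```python
import os
import signal


def make_exit_handler(callbacks, previous):
    """Build a signal handler (hypothetical helper, not ddtrace's API).

    It runs `callbacks` from last -> first, then defers to `previous`,
    the handler that was installed before ours.
    """

    def handler(signum, frame):
        # Run our own exit callbacks first, most recently registered first.
        for callback in reversed(callbacks):
            callback()

        if callable(previous):
            # A custom/user handler (or Python's default SIGINT handler)
            # was registered before ours: chain to it.
            previous(signum, frame)
        elif previous == signal.SIG_DFL:
            # No earlier handler: restore the OS default and re-send the
            # signal so default behavior happens after our callbacks.
            signal.signal(signum, signal.SIG_DFL)
            os.kill(os.getpid(), signum)
        # SIG_IGN (or an unknown handler) means there is nothing to chain to.

    return handler


def install_exit_handler(callbacks, signums=(signal.SIGTERM, signal.SIGINT)):
    """Install the chained handler for each signal (main thread only)."""
    for signum in signums:
        previous = signal.getsignal(signum)
        signal.signal(signum, make_exit_handler(callbacks, previous))
```

In this sketch a flush routine would simply be one of the callbacks passed to `install_exit_handler`. Because the previous handler is invoked only after every callback has run, a default handler that terminates the process cannot cut the flush short.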
Testing
Regression test added to ensure that all trace/stats writer configurations properly flush on exit.
We set a really high writer flush interval for traces to ensure we maintain the traces in the buffer while we trigger a SIGTERM/SIGINT.
Risks
Additional Notes