Conversation
Codeowners resolved as |
Performance SLOsComparing candidate kylev/async-dne (7dcbfda) with baseline main (bae5178) 📈 Performance Regressions (3 suites)📈 iastaspects - 116/116✅ add_aspectTime: ✅ 111.419µs (SLO: <130.000µs 📉 -14.3%) vs baseline: +5.7% Memory: ✅ 42.861MB (SLO: <46.000MB -6.8%) vs baseline: +4.7% ✅ add_inplace_aspectTime: ✅ 105.720µs (SLO: <130.000µs 📉 -18.7%) vs baseline: +0.8% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.7% ✅ add_inplace_noaspectTime: ✅ 28.224µs (SLO: <40.000µs 📉 -29.4%) vs baseline: ~same Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ add_noaspectTime: ✅ 49.098µs (SLO: <70.000µs 📉 -29.9%) vs baseline: -0.9% ✅ bytearray_aspectTime: ✅ 253.963µs (SLO: <400.000µs 📉 -36.5%) vs baseline: ~same Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ bytearray_extend_aspectTime: ✅ 639.748µs (SLO: <800.000µs 📉 -20.0%) vs baseline: -0.3% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.0% ✅ bytearray_extend_noaspectTime: ✅ 269.197µs (SLO: <400.000µs 📉 -32.7%) vs baseline: -0.7% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ bytearray_noaspectTime: ✅ 139.776µs (SLO: <300.000µs 📉 -53.4%) vs baseline: -0.9% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ bytes_aspectTime: ✅ 217.823µs (SLO: <300.000µs 📉 -27.4%) vs baseline: -0.1% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.7% ✅ bytes_noaspectTime: ✅ 132.920µs (SLO: <200.000µs 📉 -33.5%) vs baseline: -0.6% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.1% ✅ bytesio_aspectTime: ✅ 3.884ms (SLO: <5.000ms 📉 -22.3%) vs baseline: -0.5% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8% ✅ bytesio_noaspectTime: ✅ 317.266µs (SLO: <420.000µs 📉 -24.5%) vs baseline: -0.3% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ capitalize_aspectTime: ✅ 89.135µs (SLO: <300.000µs 📉 -70.3%) vs baseline: +0.7% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ capitalize_noaspectTime: ✅ 254.089µs (SLO: <300.000µs 📉 -15.3%) vs baseline: +0.8% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ casefold_aspectTime: ✅ 88.542µs (SLO: <500.000µs 📉 -82.3%) vs baseline: +0.1% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ casefold_noaspectTime: ✅ 308.719µs (SLO: <500.000µs 📉 -38.3%) vs baseline: -1.0% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ decode_aspectTime: ✅ 90.986µs (SLO: <100.000µs -9.0%) vs baseline: +3.5% Memory: ✅ 43.018MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ decode_noaspectTime: ✅ 153.328µs (SLO: <210.000µs 📉 -27.0%) vs baseline: +0.3% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8% ✅ encode_aspectTime: ✅ 85.577µs (SLO: <200.000µs 📉 -57.2%) vs baseline: +0.2% ✅ encode_noaspectTime: ✅ 138.207µs (SLO: <200.000µs 📉 -30.9%) vs baseline: +1.2% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ format_aspectTime: ✅ 14.687ms (SLO: <19.200ms 📉 -23.5%) vs baseline: ~same Memory: ✅ 43.214MB (SLO: <46.000MB -6.1%) vs baseline: +5.2% ✅ format_map_aspectTime: ✅ 16.551ms (SLO: <21.500ms 📉 -23.0%) vs baseline: +0.1% Memory: ✅ 43.155MB (SLO: <46.000MB -6.2%) vs baseline: +5.2% ✅ format_map_noaspectTime: ✅ 374.548µs (SLO: <500.000µs 📉 -25.1%) vs baseline: +0.3% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ format_noaspectTime: ✅ 310.065µs (SLO: <500.000µs 📉 -38.0%) vs baseline: -1.8% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ index_aspectTime: ✅ 126.423µs (SLO: <300.000µs 📉 -57.9%) vs baseline: +1.6% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.6% ✅ index_noaspectTime: ✅ 40.540µs (SLO: <300.000µs 📉 -86.5%) vs baseline: +0.3% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ join_aspectTime: ✅ 215.434µs (SLO: <300.000µs 📉 -28.2%) vs baseline: +0.6% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ join_noaspectTime: ✅ 142.559µs (SLO: <300.000µs 📉 -52.5%) vs baseline: -0.3% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ ljust_aspectTime: ✅ 510.387µs (SLO: <700.000µs 📉 -27.1%) vs baseline: +0.9% Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.3% ✅ ljust_noaspectTime: ✅ 262.877µs (SLO: <300.000µs 📉 -12.4%) vs baseline: -0.3% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0% ✅ lower_aspectTime: ✅ 339.834µs (SLO: <500.000µs 📉 -32.0%) vs baseline: 📈 +13.4% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ lower_noaspectTime: ✅ 234.865µs (SLO: <300.000µs 📉 -21.7%) vs baseline: -1.6% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7% ✅ lstrip_aspectTime: ✅ 0.269ms (SLO: <3.000ms 📉 -91.0%) vs baseline: -0.1% Memory: ✅ 43.018MB (SLO: <46.000MB -6.5%) vs baseline: +5.1% ✅ lstrip_noaspectTime: ✅ 0.177ms (SLO: <3.000ms 📉 -94.1%) vs baseline: -0.3% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ modulo_aspectTime: ✅ 14.330ms (SLO: <18.750ms 📉 -23.6%) vs baseline: ~same Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +4.9% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 14.864ms (SLO: <19.350ms 📉 -23.2%) vs baseline: -0.2% Memory: ✅ 43.155MB (SLO: <46.000MB -6.2%) vs baseline: +5.1% ✅ modulo_aspect_for_bytesTime: ✅ 14.460ms (SLO: <18.900ms 📉 -23.5%) vs baseline: ~same Memory: ✅ 43.077MB (SLO: <46.000MB -6.4%) vs baseline: +5.0% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 14.684ms (SLO: <19.150ms 📉 -23.3%) vs baseline: +0.2% Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +5.2% ✅ modulo_noaspectTime: ✅ 0.375ms (SLO: <3.000ms 📉 -87.5%) vs baseline: -0.2% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ replace_aspectTime: ✅ 18.514ms (SLO: <24.000ms 📉 -22.9%) vs baseline: +0.4% Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +5.0% ✅ replace_noaspectTime: ✅ 281.199µs (SLO: <300.000µs -6.3%) vs baseline: ~same Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ repr_aspectTime: ✅ 353.095µs (SLO: <420.000µs 📉 -15.9%) vs baseline: 📈 +13.3% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ repr_noaspectTime: ✅ 46.610µs (SLO: <90.000µs 📉 -48.2%) vs baseline: +0.2% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ rstrip_aspectTime: ✅ 380.559µs (SLO: <500.000µs 📉 -23.9%) vs baseline: -0.6% Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.0% ✅ rstrip_noaspectTime: ✅ 186.352µs (SLO: <300.000µs 📉 -37.9%) vs baseline: +1.5% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ slice_aspectTime: ✅ 187.591µs (SLO: <300.000µs 📉 -37.5%) vs baseline: +0.8% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ slice_noaspectTime: ✅ 54.329µs (SLO: <90.000µs 📉 -39.6%) vs baseline: +0.2% Memory: ✅ 43.018MB (SLO: <46.000MB -6.5%) vs baseline: +5.1% ✅ stringio_aspectTime: ✅ 3.942ms (SLO: <5.000ms 📉 -21.2%) vs baseline: +0.3% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ stringio_noaspectTime: ✅ 344.411µs (SLO: <500.000µs 📉 -31.1%) vs baseline: -0.5% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ strip_aspectTime: ✅ 271.694µs (SLO: <350.000µs 📉 -22.4%) vs baseline: +1.2% Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.1% ✅ strip_noaspectTime: ✅ 178.573µs (SLO: <240.000µs 📉 -25.6%) vs baseline: ~same Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.1% ✅ swapcase_aspectTime: ✅ 341.311µs (SLO: <500.000µs 📉 -31.7%) vs baseline: +0.7% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ swapcase_noaspectTime: ✅ 274.804µs (SLO: <400.000µs 📉 -31.3%) vs baseline: -0.7% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ title_aspectTime: ✅ 328.168µs (SLO: <500.000µs 📉 -34.4%) vs baseline: ~same Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ title_noaspectTime: ✅ 263.284µs (SLO: <400.000µs 📉 -34.2%) vs baseline: -0.6% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ translate_aspectTime: ✅ 493.341µs (SLO: <700.000µs 📉 -29.5%) vs baseline: ~same Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ translate_noaspectTime: ✅ 427.184µs (SLO: <500.000µs 📉 -14.6%) vs baseline: -1.1% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ upper_aspectTime: ✅ 339.151µs (SLO: <500.000µs 📉 -32.2%) vs baseline: 📈 +12.6% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ upper_noaspectTime: ✅ 235.431µs (SLO: <400.000µs 📉 -41.1%) vs baseline: +0.9% Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.0% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 500.609µs (SLO: <700.000µs 📉 -28.5%) vs baseline: 📈 +18.2% Memory: ✅ 43.018MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ ospathbasename_noaspectTime: ✅ 430.885µs (SLO: <700.000µs 📉 -38.4%) vs baseline: ~same Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ ospathjoin_aspectTime: ✅ 633.728µs (SLO: <700.000µs -9.5%) vs baseline: +1.0% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ ospathjoin_noaspectTime: ✅ 644.712µs (SLO: <700.000µs -7.9%) vs baseline: +0.6% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ ospathnormcase_aspectTime: ✅ 351.078µs (SLO: <700.000µs 📉 -49.8%) vs baseline: +0.4% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ ospathnormcase_noaspectTime: ✅ 358.242µs (SLO: <700.000µs 📉 -48.8%) vs baseline: ~same Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ ospathsplit_aspectTime: ✅ 496.357µs (SLO: <700.000µs 📉 -29.1%) vs baseline: +0.4% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ ospathsplit_noaspectTime: ✅ 505.835µs (SLO: <700.000µs 📉 -27.7%) vs baseline: +0.1% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ ospathsplitdrive_aspectTime: ✅ 375.367µs (SLO: <700.000µs 📉 -46.4%) vs baseline: -0.5% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.5% ✅ ospathsplitdrive_noaspectTime: ✅ 74.393µs (SLO: <700.000µs 📉 -89.4%) vs baseline: +2.0% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ ospathsplitext_aspectTime: ✅ 457.471µs (SLO: <700.000µs 📉 -34.6%) vs baseline: ~same Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7% ✅ ospathsplitext_noaspectTime: ✅ 469.153µs (SLO: <700.000µs 📉 -33.0%) vs baseline: +1.1% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7% 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.431µs (SLO: <20.000µs 📉 -82.8%) vs baseline: 📈 +16.3% Memory: ✅ 35.448MB (SLO: <38.000MB -6.7%) vs baseline: +4.5% ✅ 1-count-metrics-100-timesTime: ✅ 208.725µs (SLO: <220.000µs -5.1%) vs baseline: +0.6% Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.7% ✅ 1-distribution-metric-1-timesTime: ✅ 3.275µs (SLO: <20.000µs 📉 -83.6%) vs baseline: +0.3% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +5.0% ✅ 1-distribution-metrics-100-timesTime: ✅ 214.838µs (SLO: <230.000µs -6.6%) vs baseline: -1.0% Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +5.1% ✅ 1-gauge-metric-1-timesTime: ✅ 2.154µs (SLO: <20.000µs 📉 -89.2%) vs baseline: +0.3% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +5.1% ✅ 1-gauge-metrics-100-timesTime: ✅ 136.549µs (SLO: <150.000µs -9.0%) vs baseline: -0.6% Memory: ✅ 35.488MB (SLO: <38.000MB -6.6%) vs baseline: +5.0% ✅ 1-rate-metric-1-timesTime: ✅ 3.066µs (SLO: <20.000µs 📉 -84.7%) vs baseline: +0.4% Memory: ✅ 35.448MB (SLO: <38.000MB -6.7%) vs baseline: +4.7% ✅ 1-rate-metrics-100-timesTime: ✅ 221.239µs (SLO: <250.000µs 📉 -11.5%) vs baseline: +0.2% Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.9% ✅ 100-count-metrics-100-timesTime: ✅ 20.663ms (SLO: <22.000ms -6.1%) vs baseline: ~same Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.9% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.279ms (SLO: <2.550ms 📉 -10.6%) vs baseline: +1.0% Memory: ✅ 35.586MB (SLO: <38.000MB -6.4%) vs baseline: +4.8% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.403ms (SLO: <1.550ms -9.5%) vs baseline: ~same Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +5.0% ✅ 100-rate-metrics-100-timesTime: ✅ 2.239ms (SLO: <2.550ms 📉 -12.2%) vs baseline: -0.2% Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.7% ✅ flush-1-metricTime: ✅ 4.589µs (SLO: <20.000µs 📉 -77.1%) vs baseline: +0.5% Memory: ✅ 35.507MB (SLO: <38.000MB -6.6%) vs baseline: +4.7% ✅ flush-100-metricsTime: ✅ 176.625µs (SLO: <250.000µs 📉 -29.3%) vs baseline: -0.5% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.5% ✅ flush-1000-metricsTime: ✅ 2.188ms (SLO: <2.500ms 📉 -12.5%) vs baseline: -0.4% Memory: ✅ 36.412MB (SLO: <38.750MB -6.0%) vs baseline: +5.2% 🟡 Near SLO Breach (1 suite)🟡 tracer - 6/6✅ largeTime: ✅ 31.698ms (SLO: <32.950ms -3.8%) vs baseline: +0.3% Memory: ✅ 36.766MB (SLO: <39.250MB -6.3%) vs baseline: +5.0% ✅ mediumTime: ✅ 3.124ms (SLO: <3.200ms -2.4%) vs baseline: -0.3% Memory: ✅ 35.311MB (SLO: <38.750MB -8.9%) vs baseline: +5.1% ✅ smallTime: ✅ 366.054µs (SLO: <370.000µs 🟡 -1.1%) vs baseline: +3.9% Memory: ✅ 35.193MB (SLO: <38.750MB -9.2%) vs baseline: +4.9%
|
8f04d5d to
271c9c9
Compare
ddtrace/llmobs/_experiment.py
Outdated
| subset_dataset = self._get_subset_dataset(sample_size) | ||
| task_results: List[TaskResult] = [] | ||
| for idx, record in enumerate(subset_dataset): | ||
| result = await self._process_record_async((idx, record), run) |
There was a problem hiding this comment.
With the way this is written, the records are processed sequentially. Is this what we want? I would have expected that the records are processed concurrently, with the number of actively running coroutines limited to jobs. I would expect one reason people want an async version of run is they have high concurrency in mind.
This comment has been minimized.
This comment has been minimized.
ddtrace/llmobs/_experiment.py
Outdated
| def __init__(self, run_iteration: int, base_tags: Dict[str, str]): | ||
| self._id = uuid.uuid4() | ||
| # always increment the representation of iteration by 1 for readability | ||
| self._run_iteration = run_interation + 1 | ||
| self._run_iteration = run_iteration + 1 | ||
| self.tags = { | ||
| **base_tags, | ||
| "run_id": str(self._id), | ||
| "run_iteration": str(self._run_iteration), | ||
| } |
There was a problem hiding this comment.
If you're copying BaseTags, you may want to make it a Mapping instead of a dict.
There was a problem hiding this comment.
I actually just changed this to no longer have base tags in here... I solved it a different way. Global things stay in _tags on Experiment obj, run info has the run/iteration specific data only (run_id and run_iteration). A helper method forms the complete set of tags in places where we need to access them.
| :return: ExperimentResult containing evaluation results and metadata | ||
| """ | ||
| self._setup_experiment(jobs) | ||
| run_results = [] |
There was a problem hiding this comment.
| run_results = [] | |
| run_results: List[???] = [] |
Can you add the type hint maybe?
| err_stack = err_dict.get("stack") | ||
| err_type = err_dict.get("type") | ||
| if raise_errors and err_msg: | ||
| raise RuntimeError("Error on record {}: {}\n{}\n{}".format(result["idx"], err_msg, err_type, err_stack)) |
There was a problem hiding this comment.
Any reason not to use an f-string here?
|
|
||
| def _check_task_error(self, result: TaskResult, raise_errors: bool) -> None: | ||
| """Check for task errors and raise if configured.""" | ||
| err_dict = result.get("error") or {} |
There was a problem hiding this comment.
I'd recommend a type hint here
| tasks = [process_with_limit(idx, record) for idx, record in enumerate(subset_dataset)] | ||
| results = await asyncio.gather(*tasks) |
There was a problem hiding this comment.
| tasks = [process_with_limit(idx, record) for idx, record in enumerate(subset_dataset)] | |
| results = await asyncio.gather(*tasks) | |
| coros = [process_with_limit(idx, record) for idx, record in enumerate(subset_dataset)] | |
| results = await asyncio.gather(*coros) |
Are those really Tasks? I think they're just coroutines.
There was a problem hiding this comment.
good call, Claude used the name task but I would usually in the past have used the name coros. I think I've seen that in a lot of examples of async code in the past.
| } | ||
| output_data = None | ||
| try: | ||
| output_data = await self._task(input_data, self._config) # type: ignore[misc] |
There was a problem hiding this comment.
This doesn't look correct to me – self._task is a Callable, not an Awaitable (Task / Coroutine / something else) meaning it is not correct to await it (and that's why you had to add this type: ignore I'm guessing?)
I was wondering how somehow this wasn't showing in tests's type checking but the fact they receive llmobs as an untyped fixture hides that problem. Trying to run the same code with the "real" LLMObs thing, you get an error if you try to pass a coroutine.
I think this absolutely needs to be changed.
There was a problem hiding this comment.
yes I very much agree, I just realized this last night. The type hint for task does not fit the async case, and we have a directive for mypy to ignore that line in our code, but I think this should be handled differently. I was trying to get mypy to show errors in a test script I have where I try using local editable install of ddtrace -- it seems mypy is actually surprising permissive on allowing async function with similar signature where the type is Callable, but I'd rather have the correct type
There was a problem hiding this comment.
I was thinking of either having separate async_task argument to Experiment init, so the idea is the user will either only one of task and async_task.
Another thought is that task arg could accept either sync or async task, use inspect.iscoroutinefunction() to determine if it's async.
In either case, the user gets a friendly error if they call run() / arun() methods when it doesn't match the task type.
There was a problem hiding this comment.
Do you know what other APIs (e.g. Braintrust) do? Maybe there's a good pattern to re-use there. To be honest, I don't have a strong preference and it's an issue that I've encountered in the past (without a clear-cut solution).
inspect.iscoroutinefunction()
I remember I think at some point seeing someone use that and although it works, it gets somewhat hard to properly type the code when relying on this pattern.
having separate async_task argument
I think this is OK but again, typing this properly is somewhat complicated; I guess you'd have to use @override and make sure this stays maintained over time because it's easy to get wrong (also, relying on @override for typing makes everything more complicated because you have to maintain separate function declarations and definitions...)
Description
Add async task support to the Experiments SDK. Users can now provide an async task for an experiment and
arun()it.Testing
Follow a similar pattern as the synchronous case.