feat(spider-task-executor): Add executor binary with bincode wire protocol and integration tests.#325
feat(spider-task-executor): Add executor binary with bincode wire protocol and integration tests.#325LinZhihao-723 wants to merge 3 commits into
Conversation
WalkthroughThis PR introduces a new spider-task-executor subprocess that executes TDL tasks via framed bincode IPC, with serializable error types, dynamic package loading, test task definitions, and a complete integration test suite with latency benchmarking. ChangesTask Executor Subprocess
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/huntsman/integration-test-tasks/src/lib.rs (1)
17-17: 💤 Low valueConsider the reliability of a 50-microsecond sleep for benchmarking.
The
INSTRUMENT_SLEEP_USconstant sets a 50-microsecond sleep, which is quite short. On Linux,sleep()with sub-millisecond durations may be subject to scheduler granularity and could have higher variance. Since this is used for overhead measurement in benchmark tests, ensure the results account for potential timing imprecision.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/huntsman/integration-test-tasks/src/lib.rs` at line 17, The 50µs constant INSTRUMENT_SLEEP_US is too short and may suffer scheduler jitter; to fix, increase it to a more reliable value (e.g., 1000 or 5000 µs) or make it configurable so tests can select a stable duration (via an environment variable or test flag) and update any places using INSTRUMENT_SLEEP_US to read the configurable value; ensure the constant name and usages (INSTRUMENT_SLEEP_US) are adjusted and document the change in the test notes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/huntsman/integration-test-tasks/src/lib.rs`:
- Line 17: The 50µs constant INSTRUMENT_SLEEP_US is too short and may suffer
scheduler jitter; to fix, increase it to a more reliable value (e.g., 1000 or
5000 µs) or make it configurable so tests can select a stable duration (via an
environment variable or test flag) and update any places using
INSTRUMENT_SLEEP_US to read the configurable value; ensure the constant name and
usages (INSTRUMENT_SLEEP_US) are adjusted and document the change in the test
notes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8d75d100-9dd1-4402-8d2a-b879ab41f55d
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (15)
Cargo.tomlcomponents/spider-task-executor/Cargo.tomlcomponents/spider-task-executor/src/bin/spider_task_executor.rscomponents/spider-task-executor/src/error.rscomponents/spider-task-executor/src/lib.rscomponents/spider-task-executor/src/manager.rscomponents/spider-task-executor/src/protocol.rstaskfiles/test.yamltests/huntsman/integration-test-tasks/Cargo.tomltests/huntsman/integration-test-tasks/src/lib.rstests/huntsman/task-executor/Cargo.tomltests/huntsman/task-executor/src/lib.rstests/huntsman/task-executor/tests/executor.rstests/huntsman/task-executor/tests/overhead_instrument.rstests/huntsman/tdl-integration/tests/complex.rs
Description
Wire protocol (
spider-task-executor/src/protocol.rs)A new
protocolmodule on thespider-task-executorlibrary defines the three wire types the execution manager will use to drive the child:Request::Execute { tdl_context, raw_ctx, raw_inputs }andRequest::Shutdown. The parent owns the hard timeout in its entirety; the executor has no notion of timeouts and the request carries no deadline.Response::Result { outcome, elapsed_us }— exactly one perExecute.elapsed_usis the in-FFI wall-clock measured by the executor and is what the overhead instrument uses to separate executor-side cost from parent-side IPC.ExecutorOutcome::Success { outputs } | Failure { error }—outputsis the wire-formatTaskOutputsSerializerbuffer ready to forward to storage;erroris the msgpack-encodedExecutorError.Stderr is not carried over the protocol; how the spawner disposes of the executor's stderr (inherit / pipe / log file) is a parent-side decision.
Executor binary (
spider-task-executor/src/bin/spider_task_executor.rs)Single-threaded tokio runtime; requests are processed strictly sequentially with exactly one task running for the lifetime of the process. Tokio is here only to match the async I/O surface the execution manager uses (
tokio_util::codec::LengthDelimitedCodec); the executor itself has no concurrency requirements.Key shape decisions:
oneshot, notokio::select!. The previous design dispatched the FFI on astd::threadso the runtime couldselect!an in-process timer against the FFI; with the timer responsibility consolidated on the parent, none of that scaffolding earns its keep.SPIDER_TDL_PACKAGE_DIRis validated once at startup. If unset the binary exits non-zero before processing any request, which surfaces a deployment misconfiguration immediately rather than per-request.${SPIDER_TDL_PACKAGE_DIR}/<package>/lib<package>.so. The first request for a package dlopens the library; subsequent requests reuse the cachedTdlPackage.tracingandtracing-subscriberare pulled in withdefault-features = falseand only the features actually used (stdfor the macros;fmt,env-filter,jsonfor the subscriber).ExecutorErroris now wire-friendlyExecutorErrorderivesserde::Serialize/Deserializeso the binary can ship it across the protocol asFailure { error: rmp_serde::to_vec(&err) }and the EM can decode it back to a typed value. Three variants used to wrap external types that don't implementSerialize(libloading::Error,std::str::Utf8Error,rmp_serde::decode::Error); they now carry theDisplayrendering of the source error as aString. ExplicitFromimpls preserve the lossless?propagation inmanager.rs. The wildcardmatches!(err, ExecutorError::InvalidLibrary(_))pattern in the existinghuntsman/tdl-integrationtests still compiles unchanged.Integration test crates
tests/huntsman/integration-test-tasks(TDL package)A cdylib + rlib package (
crate-type = ["cdylib", "rlib"]) registered under TDL package nameintegration_test_tasks. The dual crate-type lets the bench reference compile-time constants (notablyINSTRUMENT_SLEEP_US) while keeping the cdylib fordlopenfrom the executor.Tasks:
fibonacci(index: u64) -> u64— naive recursion; correctness check.always_fail() -> Result— returnsTdlError::ExecutionError.always_panic() -> !— panics; the panic crosses theextern "C"FFI boundary and aborts the process, which is exactly the crash signal the parent test asserts on.instrument(items: Vec<String>) -> Vec<String>— sleeps for a fixedINSTRUMENT_SLEEP_US(50µs) and echoes the payload back. Used by the overhead bench.tests/huntsman/task-executor(executor integration tests)A library crate that provides an
ExecutorHandleharness (spawn the binary, frame requests on stdin, decode responses from stdout) and two[[test]]binaries.The harness panics with descriptive
.expect(...)messages on protocol / I/O / decode failures rather than threading errors through every helper — these are infrastructure, not production code, and a panic with backtrace points at the failure site immediately. (This pattern surfaced one subtle bug during development: a stale.soleft over from a task rename producedTaskNotFound("instrument")in the panic message verbatim, which was the entire diagnosis.)tests/executor.rscovers:fibonacci_returns_correct_value— encodes a singleu64input; assertsSuccessand that the decodedu64equals 55.always_fail_reports_task_error— assertsFailurewhose msgpack-decodedExecutorErrorisTaskError(TdlError::ExecutionError(_))and whose message contains the task name.always_panic_crashes_the_process— sendsExecute, expects stdout EOF before any frame arrives, then waits for the child to exit non-zero.tests/overhead_instrument.rsruns theinstrumenttask ten times against a long-lived executor (so dlopen happens once during a discarded warm-up) and writes a markdown table at${SPIDER_TEST_INSTRUMENT_OUTPUT_DIR}/task_executor_overhead.md. With the work portion held constant at 50µs the table separates four metrics:E2E (parent)Instant-to-Instantaroundsend(Execute)→recv(Response::Result)Executor FFIelapsed_usreported by the executor (sleep + in-FFI input/output serde)Executor internal (FFI - sleep)IPC overhead (E2E - FFI)Sample output from a local run:
Taskfile changes (
taskfiles/test.yaml)spider-huntsman-unit-tests-executornow:cargo buildinvocations (combining--package <cdylib>with--bin <name>would silently exclude the cdylibs from the target selection):huntsman-complex,integration-test-tasks, andspider-task-executor --bin spider-task-executor.build/tdl_packages/<package>/lib<package>.so— the standard layout the executor binary reads via${SPIDER_TDL_PACKAGE_DIR}/<package>/lib<package>.so.SPIDER_TDL_PACKAGE_DIR,SPIDER_TASK_EXECUTOR_BIN, and (relocates)SPIDER_TDL_PACKAGE_COMPLEXto point at the staged paths. Existinghuntsman-complexconsumers see no behavior change because their tests load the.soby absolute path.cargo nextest run --all --all-features --run-ignored all --release. The new tests are gated#[ignore]so plaincargo test(which doesn't go through the taskfile) can't accidentally try to run them without env vars set.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
Tests