Add developer doc for job lifecycle events #4935
Open
dejanzele wants to merge 3 commits into
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
…nversion note in job-lifecycle doc Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
Adds a developer reference for the events and state transitions across a job run's lifecycle.
The doc covers the multi-cluster topology and the two transports (Pulsar and the gRPC lease stream) that carry events between the control plane and executors, the job-level and run-level state machines, and the internal proto event vocabulary alongside its mapping to the external API event vocabulary. It then walks through step-by-step flows for the four terminal cases: succeeded, failed (both organic terminal-phase and executor-issue-handler paths), preempted, and cancelled.
The preempt section documents the current double-emission behavior. Each preempted run produces two `JobPreemptedEvent` messages on the external stream, and the `job_run_errors` row is overwritten so that the scheduler's preemption description is replaced by the executor's generic "Run preempted" text. Operators integrating with the event stream need to account for both effects.
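
For consumers of the external stream, one way to tolerate the duplicate preemption message is to drop repeats on the client side. The sketch below is illustrative only. The `Event` type and its `kind`, `job_id`, and `run_id` fields are hypothetical stand-ins rather than the real API message shape, and it assumes each preemption message can be tied to a single run.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified view of an external stream event."""
    kind: str    # e.g. "JobPreemptedEvent"
    job_id: str  # assumed field name
    run_id: str  # assumed field name


def dedupe_preempted(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events in order, dropping repeated preemption events per run.

    Only events whose kind is "JobPreemptedEvent" are deduplicated;
    every other event passes through unchanged.
    """
    seen: Set[Tuple[str, str]] = set()
    for ev in events:
        if ev.kind == "JobPreemptedEvent":
            key = (ev.job_id, ev.run_id)
            if key in seen:
                continue  # second emission for the same run, skip it
            seen.add(key)
        yield ev


# Example: one run preempted (emitted twice), then a different run.
stream = [
    Event("JobPreemptedEvent", "job-1", "run-a"),
    Event("JobPreemptedEvent", "job-1", "run-a"),
    Event("JobPreemptedEvent", "job-1", "run-b"),
]
kept = list(dedupe_preempted(stream))
```

Keying on the run rather than the job keeps a later preemption of a different run of the same job visible. The `seen` set grows without bound in this sketch, so a long-lived consumer would likely want to evict entries once a job reaches a terminal state.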