
feat(spider-storage): Add ServerRuntime to gracefully shutdown storage service. #324

Open
sitaowang1998 wants to merge 8 commits into y-scope:storage-service-dev from sitaowang1998:graceful-shutdown


sitaowang1998 (Collaborator) commented May 14, 2026

Description

This PR:

  • Adds a new ServerRuntime type that holds a CancellationToken and a JoinHandle so the storage service can shut down gracefully.
  • Merges ExecutionManagerLivenessStore from the task instance pool into ExecutionManagerLivenessManagement in db.
  • Adds graceful shutdown to the task instance pool, which drains all pending messages before exiting.
  • Refactors task instance pool handle creation into a factory method.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added new tests for graceful shutdown and for the task instance pool factory method.
  • GitHub workflows pass.

Summary by CodeRabbit

Release Notes

  • Chores

    • Updated Tokio dependencies to the latest compatible versions for improved stability
  • Refactor

    • Restructured task instance pool API with new configuration options
    • Introduced server runtime abstraction for better background task coordination
    • Enhanced error handling for database operations in cache layer


sitaowang1998 requested a review from a team as a code owner May 14, 2026 19:58
coderabbitai Bot commented May 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f03345bd-d79a-4a2c-acd7-974a910d20ea


coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/spider-storage/src/state/server.rs`:
- Around lines 102-109: The timeout currently moves
self.task_instance_pool_join_handle into tokio::time::timeout, so if the timeout
fires, the handle is dropped and the background task keeps running (dropping a
Tokio JoinHandle detaches the task rather than cancelling it). Change the logic
to detect the timeout without consuming the handle (e.g., use tokio::select!
between the join handle future and tokio::time::sleep) so you can call abort()
on the JoinHandle (self.task_instance_pool_join_handle.abort()) when the timeout
branch wins, then return the StorageServerError::Stopping error. Reference the
existing stop_timeout_sec and task_instance_pool_join_handle symbols, and ensure
you await or ignore the aborted handle appropriately after aborting.

In `@components/spider-storage/src/task_instance_pool.rs`:
- Around line 169-170: Clamp the public config fields channel_size and
gc_interval to at least 1 before creating Tokio primitives to avoid panics: when
calling mpsc::channel(...) use config.channel_size.max(1) and when creating the
interval with tokio::time::interval(...) use config.gc_interval.max(1) (or
assign clamped values to locals first). Update the TaskInstancePool
construction/initialization sites (references: TaskInstancePoolConfig,
mpsc::channel, tokio::time::interval, variables sender/receiver and the GC
interval setup) to pass the clamped values so zero cannot be forwarded into
Tokio APIs.
ℹ️ Review info

Run ID: 71fa0b19-7bcb-4d56-96c9-4681b1e1b273

📥 Commits

Reviewing files that changed from the base of the PR and between dfb2170 and 5b8eccf.

📒 Files selected for processing (5)
  • components/spider-storage/Cargo.toml
  • components/spider-storage/src/cache/error.rs
  • components/spider-storage/src/state.rs
  • components/spider-storage/src/state/server.rs
  • components/spider-storage/src/task_instance_pool.rs

Comment thread components/spider-storage/src/state/server.rs Outdated
Comment thread components/spider-storage/src/task_instance_pool.rs