fix(ai): serialize sentence-transformer encoding to prevent GPU races #1182

Open
ksaurabhAparavi wants to merge 1 commit into rocketride-org:develop from ksaurabhAparavi:fix/RR-1169-sentence-transformer-concurrency

Conversation

ksaurabhAparavi (Contributor) commented Jun 8, 2026

Summary

  • Serialize both the wrapper encode() and raw shared-model access so that concurrent inference on the shared NomicBert model neither races nor triggers intermittent tensor size mismatches.
  • Add a CUDA reproducer and focused regression coverage.

⚠️ Reviewer note

Conflict-resolved by keeping upstream's packages/ai/tests/conftest.py (the downstream commit's conftest additions were dropped). The core fix in sentence_transformers.py applied cleanly. Please confirm the added tests run under upstream's conftest in CI.

Testing

  • CI (./builder test) — relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1169

coderabbitai Bot (Contributor) commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 61db1aaf-2a01-4c86-a0f7-5069281f9746

📥 Commits

Reviewing files that changed from the base of the PR and between efecb7e and bd4c627.

📒 Files selected for processing (4)
  • packages/ai/src/ai/common/models/transformers/sentence_transformers.py
  • packages/ai/tests/ai/common/models/transformers/__init__.py
  • packages/ai/tests/ai/common/models/transformers/reproduce_sentence_transformer_origin.py
  • packages/ai/tests/ai/common/models/transformers/test_sentence_transformers.py

📝 Walkthrough

The PR adds thread serialization to the local inference path of SentenceTransformer.encode() via a mutex, so concurrent calls cannot interleave their preprocess, inference, and postprocess steps. It also includes a unit test that verifies the serialization and a standalone GPU reproducer script for manual testing.
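As a rough illustration of the locking pattern the walkthrough describes, here is a minimal sketch. It is not the PR's code: `SerializedEncoder` and `_FakeModel` are hypothetical stand-ins, and only the names `_encode_lock` and `_encode_local` are borrowed from the summary. The assumption is that one lock is held across the whole batched preprocess, inference, and postprocess sequence.

```python
import threading


class SerializedEncoder:
    """Hypothetical wrapper illustrating the pattern; not the PR's class."""

    def __init__(self, model):
        self._model = model  # assumed: any object exposing encode(list_of_str)
        self._encode_lock = threading.Lock()  # attribute name from the walkthrough

    def _encode_local(self, texts, batch_size=2):
        outputs = []
        # Hold the lock for the entire pipeline so that no other thread can
        # interleave its own preprocess/inference/postprocess steps.
        with self._encode_lock:
            for start in range(0, len(texts), batch_size):
                batch = [t.strip() for t in texts[start:start + batch_size]]  # preprocess
                vectors = self._model.encode(batch)  # inference
                outputs.extend(list(v) for v in vectors)  # postprocess
        return outputs


class _FakeModel:
    """Stand-in model: maps each text to a one-element vector of its length."""

    def encode(self, batch):
        return [[float(len(t))] for t in batch]


encoder = SerializedEncoder(_FakeModel())
vectors = encoder._encode_local([" a ", "bb", "ccc"])
```

Holding a single lock around all three steps trades throughput for safety: callers queue up rather than run in parallel, which is the intended behaviour when the underlying model cannot tolerate concurrent use.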

Changes

Concurrent encode serialization

| Layer / File(s) | Summary |
| --- | --- |
| Serialization lock implementation: packages/ai/src/ai/common/models/transformers/sentence_transformers.py | threading module imported; self._encode_lock created in __init__; _encode_local() acquires lock around batched preprocess → inference → postprocess, serializing concurrent local encodes. |
| Unit test verification: packages/ai/tests/ai/common/models/transformers/test_sentence_transformers.py | Monkeypatches SentenceTransformer loader and pipeline; spawns concurrent encode() calls via ThreadPoolExecutor; asserts inference executes serially (max 1 simultaneous active call); validates NumPy array output shape (4, 1). |
| Manual reproducer and test infrastructure: packages/ai/tests/ai/common/models/transformers/__init__.py, reproduce_sentence_transformer_origin.py | Standalone GPU reproducer generates variable-length synthetic batches, logs encode events with thread/worker identity, and supports sequential/concurrent execution modes via CLI flags; test package __init__.py header added. |
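The serialization check described above (at most one simultaneous active inference call) could be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's test: `ActivityProbe`, `locked_encode`, and `run_concurrent` are hypothetical names, the real test monkeypatches the SentenceTransformer loader instead, and plain nested lists stand in for the (4, 1) NumPy array.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ActivityProbe:
    """Hypothetical fake inference step that records peak concurrency."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the counters only
        self.active = 0
        self.max_active = 0

    def infer(self, texts):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # widen the window in which overlap could occur
        with self._guard:
            self.active -= 1
        # One single-element row per input, mirroring a (4, 1) output shape.
        return [[float(len(t))] for t in texts]


probe = ActivityProbe()
encode_lock = threading.Lock()  # plays the role of the encoder's lock


def locked_encode(texts):
    # Serialized entry point: only one thread may run inference at a time.
    with encode_lock:
        return probe.infer(texts)


def run_concurrent(encode, n_calls=8, n_workers=4):
    # Submit several encode calls from a thread pool and collect the results.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(encode, ["a", "bb", "ccc", "dddd"]) for _ in range(n_calls)]
        return [f.result() for f in futures]


results = run_concurrent(locked_encode)
assert probe.max_active == 1  # inference never overlapped
assert all(len(r) == 4 and len(r[0]) == 1 for r in results)
```

Calling `probe.infer` directly from the pool, without `encode_lock`, would typically let `max_active` climb above 1, which is the kind of overlap the lock is meant to rule out.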

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐰 A lock on the encoder so fair,
No more shall the threads interfere,
Sequential encode, now pristine,
GPU tensors stay serene—
One thread at a time, we declare!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding serialization to sentence-transformer encoding to prevent GPU race conditions. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions Bot added the module:ai (AI/ML modules) label Jun 8, 2026

github-actions Bot commented Jun 8, 2026

🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

Serialize both the wrapper encode() and raw shared-model access so concurrent
inference on the shared NomicBert model does not race or trigger intermittent
tensor size mismatches during instance processing. Adds a CUDA reproducer and
focused regression coverage.

Fixes rocketride-org#1169
ksaurabhAparavi force-pushed the fix/RR-1169-sentence-transformer-concurrency branch from bdedea6 to bd4c627 on June 8, 2026 11:51
Labels

module:ai AI/ML modules

Development

Successfully merging this pull request may close these issues.

Concurrent shared sentence-transformer inference races (tensor size mismatches)

1 participant