fix(ai): fall back to CPU when CUDA kernels are incompatible with the device by ksaurabhAparavi · Pull Request #1181 · rocketride-org/rocketride-server

ksaurabhAparavi · 2026-06-08T10:21:19Z

Summary

Add probe_cuda() to ai.common.torch (tiny GEMM + synchronize) to detect cudaErrorNoKernelImageForDevice at model-load time instead of asynchronously on first inference.
Local-mode loaders (transformers, sentence_transformers, vision, gliner, easyocr, trocr, surya, doctr) probe before committing to GPU and fall back to CPU; whisper retries on CPU in its except block.
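For readers who want to see the idea in the summary concretely, here is a minimal sketch of a probe of this kind. It is not the PR's actual code: only the name `probe_cuda`, its `device_index` parameter, and the "tiny GEMM + synchronize" approach come from this page, while the tensor size, the logging, and the lazy `torch` import are assumptions made so the sketch also runs on machines without torch or a GPU.

```python
import logging

logger = logging.getLogger(__name__)


def probe_cuda(device_index: int = 0) -> bool:
    """Return True only if a tiny matmul actually runs on the given CUDA device."""
    try:
        import torch  # assumption: imported lazily so the sketch runs without torch
    except ImportError:
        return False

    if not torch.cuda.is_available():
        return False

    try:
        device = torch.device(f'cuda:{device_index}')
        a = torch.ones((8, 8), device=device)
        b = a @ a  # tiny GEMM forces a real kernel launch
        torch.cuda.synchronize(device)  # surface asynchronous launch errors here
        return bool(b[0, 0].item() == 8.0)
    except Exception as e:  # e.g. "no kernel image is available for execution on the device"
        logger.warning('CUDA probe failed on cuda:%d: %s', device_index, e)
        return False
```

Synchronizing right after the matmul is what moves the failure to load time: CUDA kernel launches are asynchronous, so without it the error could surface at a later, unrelated call.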

⚠️ Reviewer note

Conflict-resolved against upstream in surya.py: upstream's # contract-check: ignore import comments were kept and probe_cuda was added to the ai.common.torch import. Please verify the surya import block.

Testing

CI (./builder test) — relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1168

coderabbitai · 2026-06-08T10:21:30Z

📝 Walkthrough

This PR adds CUDA kernel probing to detect GPU architecture incompatibility at model-load time, allowing automatic fallback to CPU. A new probe_cuda() utility validates CUDA with a minimal computation and is integrated into 8 AI model loaders with CPU fallback logic; two loaders (Whisper and EasyOCR) additionally retry the load on CPU when the GPU load fails.

Changes

CUDA kernel probing and GPU robustness

Layer / File(s)	Summary
CUDA probe utility foundation `packages/ai/src/ai/common/torch/__init__.py`	New `probe_cuda(device_index: int = 0) -> bool` function validates CUDA kernel availability by allocating a small tensor and performing GEMM; module exports updated to include both `torch` and `probe_cuda`.
OCR, transformer, and vision loaders with CUDA probing `packages/ai/src/ai/common/models/ocr/doctr.py`, `packages/ai/src/ai/common/models/ocr/surya.py`, `packages/ai/src/ai/common/models/ocr/trocr.py`, `packages/ai/src/ai/common/models/gliner/gliner.py`, `packages/ai/src/ai/common/models/transformers/sentence_transformers.py`, `packages/ai/src/ai/common/models/vision/vision.py`	Six loaders now import `probe_cuda` and apply it during CUDA device selection; each parses GPU index from device string and falls back to CPU with warning when probe fails.
Transformer `_load_model` and `_load_pipeline` with probing `packages/ai/src/ai/common/models/transformers/transformers.py`	Two utility functions add `probe_cuda` imports and apply probing logic in local-mode device selection; on probe failure, device index is set to `-1` (CPU) with warning logged.
Whisper GPU-to-CPU load retry with compute downgrade `packages/ai/src/ai/common/models/audio/whisper.py`	WhisperLoader catches GPU model-load failures, downgrades `float16` compute type to `int8`, retries on CPU, and wraps any CPU failure with original exception context for better error reporting.
EasyOCR probe validation and reader initialization retry `packages/ai/src/ai/common/models/ocr/easyocr.py`	EasyOCRLoader probes GPU availability and disables on failure; `easyocr.Reader` initialization wraps with GPU-to-CPU retry logic, logging GPU errors before retrying on CPU and re-raising only if CPU init also fails.
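The device-selection pattern described in the table (parse the GPU index from the device string, probe, fall back to CPU with a warning) can be sketched roughly as follows. The helper names `parse_cuda_index` and `select_device` are hypothetical and do not appear in the PR; the probe is passed in as a callable so the sketch runs without torch.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def parse_cuda_index(device: str) -> int:
    """Extract the GPU index from 'cuda' or 'cuda:N'; a bare 'cuda' maps to 0."""
    _, _, index = device.partition(':')
    return int(index) if index else 0


def select_device(requested: str, probe: Callable[[int], bool]) -> str:
    """Return the requested device, or 'cpu' if the CUDA probe fails."""
    if not requested.startswith('cuda'):
        return requested  # CPU (or any non-CUDA device) needs no probe
    index = parse_cuda_index(requested)
    if probe(index):
        return requested
    logger.warning('CUDA probe failed for %s; falling back to CPU', requested)
    return 'cpu'


# Usage with stand-in probes (a real caller would pass probe_cuda):
print(select_device('cuda:1', probe=lambda i: False))  # cpu
print(select_device('cuda:1', probe=lambda i: True))   # cuda:1
```

For the transformers pipeline case the table mentions a device index of `-1` for CPU rather than the string `'cpu'`; the same decision logic would apply with a different return value.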

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

rocketride-org/rocketride-server#1052: Both PRs modify packages/ai/src/ai/common/models/audio/whisper.py to harden local-mode GPU handling by running CUDA compatibility checks and falling back to CPU when the requested CUDA device fails.
rocketride-org/rocketride-server#1043: Both PRs adjust Whisper GPU-to-CPU fallback behavior in packages/ai/src/ai/common/models/audio/whisper.py; main adds load-time retry with compute-type downgrade while the retrieved PR tightens GPU compatibility subprocess probing for ctranslate2>=4.7.

Suggested reviewers

jmaionchi
stepmikhaylov
Rod-Christensen
asclearuc
dsapandora

Poem

🐰 A kernel probe hops through GPU lanes,
Checking if CUDA's ready for the chains,
If not, a whisper soft says "CPU's way,"
We EasyOCR through clouds today! 🌤️

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR fully implements the objectives from issue `#1168`: adds probe_cuda() helper, integrates CUDA probing across all local-mode loaders, and implements CPU fallback when incompatibility is detected.
Out of Scope Changes check	✅ Passed	All changes are directly related to detecting CUDA kernel incompatibility and implementing CPU fallback; no unrelated modifications are present.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the primary change: adding CUDA kernel compatibility checking with automatic CPU fallback across multiple AI model loaders.


github-actions · 2026-06-08T10:41:07Z

🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

… device Add probe_cuda() to ai.common.torch (tiny GEMM + synchronize) to detect cudaErrorNoKernelImageForDevice at model-load time rather than asynchronously on first inference. Local-mode loaders (transformers, sentence_transformers, vision, gliner, easyocr, trocr, surya, doctr) probe before committing to GPU and fall back to CPU; whisper retries on CPU in its except block. Fixes rocketride-org#1168

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 273-278: The except blocks in the whisper model loader lose the
original traceback by re-raising new Exceptions; update the two raise statements
in the Whisper loading logic (the except Exception as cpu_e branch and the outer
except as e branch) to use exception chaining (raise Exception(f'Failed to load
whisper model {model_name}: {cpu_e}') from cpu_e and raise Exception(f'Failed to
load whisper model {model_name}: {e}') from e respectively), keeping the
existing logger.error calls (logger.error(...)) and message content intact so
the original exceptions cpu_e and e are preserved in the chain.

In `@packages/ai/src/ai/common/models/ocr/easyocr.py`:
- Around line 153-155: The error message when EasyOCR fails to load on CPU
should indicate that a GPU probe was attempted and a fallback to CPU occurred;
update the exception handling around easyocr.Reader creation (the block that
catches exceptions after probe_cuda and sets use_gpu = False) to log and raise a
message that includes the fallback state (e.g., reference use_gpu and that
probe_cuda was invoked) so the logger.error and raised Exception include that
CPU fallback was attempted after a GPU probe failure; locate the probe_cuda
invocation and the easyocr.Reader construction to adjust the log text
accordingly.
- Around line 150-155: The except blocks that currently re-raise new Exception
objects lose the original traceback; update the two re-raises in the EasyOCR
loading logic to use "raise Exception(... ) from <original_exception>" so the
chain is preserved (use "from cpu_e" for the cpu_e handler and "from e" for the
outer handler), leaving the logger.error calls intact and referring to the same
variables (cpu_e and e) so full exception context is retained for debugging.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bf488c65-5eee-47ad-9f05-c6d22cf46b60

📥 Commits

Reviewing files that changed from the base of the PR and between efecb7e and e8435bb.

📒 Files selected for processing (10)

packages/ai/src/ai/common/models/audio/whisper.py
packages/ai/src/ai/common/models/gliner/gliner.py
packages/ai/src/ai/common/models/ocr/doctr.py
packages/ai/src/ai/common/models/ocr/easyocr.py
packages/ai/src/ai/common/models/ocr/surya.py
packages/ai/src/ai/common/models/ocr/trocr.py
packages/ai/src/ai/common/models/transformers/sentence_transformers.py
packages/ai/src/ai/common/models/transformers/transformers.py
packages/ai/src/ai/common/models/vision/vision.py
packages/ai/src/ai/common/torch/__init__.py

coderabbitai · 2026-06-08T11:58:43Z

+                except Exception as cpu_e:
+                    logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
+                    raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
+            else:
+                logger.error(f'Failed to load whisper model: {e}')
+                raise Exception(f'Failed to load whisper model {model_name}: {e}')


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Preserve exception chain with raise ... from for better debugging.

The exception handling loses the original traceback by creating a new Exception without chaining. Python best practice is to use raise ... from to preserve the full exception context.

♻️ Proposed fix

                 except Exception as cpu_e:
                     logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
-                    raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
+                    raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}') from cpu_e
             else:
                 logger.error(f'Failed to load whisper model: {e}')
-                raise Exception(f'Failed to load whisper model {model_name}: {e}')
+                raise Exception(f'Failed to load whisper model {model_name}: {e}') from e
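To show how the GPU-to-CPU retry and the suggested `raise ... from` chaining fit together, here is a small self-contained sketch. The function `load_with_cpu_retry` and the `load_fn` callback signature are hypothetical stand-ins for the real Whisper loading call; only the error messages, the `float16` to `int8` downgrade, and the chaining come from this page.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Hypothetical loader signature: (model_name, device, compute_type) -> model
LoadFn = Callable[[str, str, str], Any]


def load_with_cpu_retry(
    load_fn: LoadFn,
    model_name: str,
    device: str = 'cuda',
    compute_type: str = 'float16',
) -> Any:
    """Try the requested device first; on a GPU failure, retry once on CPU."""
    try:
        return load_fn(model_name, device, compute_type)
    except Exception as e:
        if device == 'cpu':
            logger.error('Failed to load whisper model: %s', e)
            raise Exception(f'Failed to load whisper model {model_name}: {e}') from e
        logger.warning('GPU load failed (%s); retrying on CPU', e)
        # float16 is downgraded to int8 for the CPU retry, as the walkthrough describes.
        cpu_compute = 'int8' if compute_type == 'float16' else compute_type
        try:
            return load_fn(model_name, 'cpu', cpu_compute)
        except Exception as cpu_e:
            logger.error('Failed to load whisper model on CPU: %s', cpu_e)
            raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}') from cpu_e
```

Note that Python already attaches the original error as `__context__` when a new exception is raised inside an `except` block; `from` additionally sets `__cause__`, which marks the original as the direct cause in the printed traceback.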


coderabbitai · 2026-06-08T11:58:43Z

+                except Exception as cpu_e:
+                    logger.error(f'Failed to load EasyOCR: {cpu_e}')
+                    raise Exception(f'Failed to load EasyOCR: {cpu_e}')
+            else:
+                logger.error(f'Failed to load EasyOCR: {e}')
+                raise Exception(f'Failed to load EasyOCR: {e}')


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Preserve exception chain with raise ... from for better debugging.

Both exception handlers create new Exception instances without chaining the original exception, losing valuable traceback information. Use raise ... from to preserve the full exception context.

♻️ Proposed fix

                 except Exception as cpu_e:
                     logger.error(f'Failed to load EasyOCR: {cpu_e}')
-                    raise Exception(f'Failed to load EasyOCR: {cpu_e}')
+                    raise Exception(f'Failed to load EasyOCR: {cpu_e}') from cpu_e
             else:
                 logger.error(f'Failed to load EasyOCR: {e}')
-                raise Exception(f'Failed to load EasyOCR: {e}')
+                raise Exception(f'Failed to load EasyOCR: {e}') from e


coderabbitai · 2026-06-08T11:58:43Z

+            else:
+                logger.error(f'Failed to load EasyOCR: {e}')
+                raise Exception(f'Failed to load EasyOCR: {e}')


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Clarify error message when probe-triggered fallback fails.

When probe_cuda fails (line 129) and triggers fallback to CPU, use_gpu is set to False. If the subsequent easyocr.Reader creation on CPU also fails, the error log at line 154 reads "Failed to load EasyOCR: {e}" without indicating that GPU was attempted first. This could confuse debugging—reviewers might think GPU was never tried.

Consider logging a more specific message when CPU loading fails after a probe-triggered fallback, or tracking the fallback state to improve the diagnostic output.
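One way to track the fallback state, offered only as an illustration of the suggestion above: keep a boolean recording whether the loader fell back from GPU and fold it into the message. The helper `describe_cpu_failure` and the flag name `fell_back_from_gpu` are hypothetical and not part of the PR.

```python
def describe_cpu_failure(error: Exception, fell_back_from_gpu: bool) -> str:
    """Build the EasyOCR failure message, noting a prior GPU fallback if there was one."""
    if fell_back_from_gpu:
        return f'Failed to load EasyOCR on CPU after GPU fallback: {error}'
    return f'Failed to load EasyOCR on CPU: {error}'


# Usage: the loader would set the flag when the probe or the GPU init fails.
print(describe_cpu_failure(RuntimeError('out of memory'), fell_back_from_gpu=True))
```

The same string can be passed to both `logger.error` and the raised exception so the two stay consistent.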


ksaurabhAparavi requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners June 8, 2026 10:21

github-actions Bot added the module:ai AI/ML modules label Jun 8, 2026

ksaurabhAparavi force-pushed the fix/RR-1168-cuda-cpu-fallback branch from df5b270 to e8435bb Compare June 8, 2026 11:51

ksaurabhAparavi changed the title ~~fix(ai): fall back to CPU when CUDA kernels are incompatible with device~~ fix(ai): fall back to CPU when CUDA kernels are incompatible with the device Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

ksaurabhAparavi wants to merge 1 commit into rocketride-org:develop from ksaurabhAparavi:fix/RR-1168-cuda-cpu-fallback
