
fix(ai): fall back to CPU when CUDA kernels are incompatible with the device #1181

Open
ksaurabhAparavi wants to merge 1 commit into rocketride-org:develop from ksaurabhAparavi:fix/RR-1168-cuda-cpu-fallback


@ksaurabhAparavi ksaurabhAparavi commented Jun 8, 2026

Contributor

Summary

  • Add probe_cuda() to ai.common.torch (tiny GEMM + synchronize) to detect cudaErrorNoKernelImageForDevice at model-load time instead of asynchronously on first inference.
  • Local-mode loaders (transformers, sentence_transformers, vision, gliner, easyocr, trocr, surya, doctr) probe before committing to GPU and fall back to CPU; whisper retries on CPU in its except block.
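As a rough illustration of the probe described in the summary, the following is a minimal sketch, not the code from this PR. It assumes the probe allocates a small tensor on the device, runs one matrix multiply, and synchronizes so that an asynchronous kernel error surfaces immediately. The tensor size, the log message, and the lazy `torch` import are assumptions made so the sketch also runs on machines without torch.

```python
import logging

logger = logging.getLogger(__name__)


def probe_cuda(device_index: int = 0) -> bool:
    """Return True only if a tiny matmul actually runs on the given CUDA device."""
    try:
        import torch  # lazy import (assumption): keeps the sketch usable without torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    try:
        device = torch.device('cuda', device_index)
        a = torch.ones((8, 8), device=device)
        b = a @ a  # small GEMM; needs a kernel image built for this GPU architecture
        torch.cuda.synchronize(device)  # force pending kernel errors to surface here
        return bool(b[0, 0].item() == 8.0)
    except Exception as exc:  # e.g. "no kernel image is available for execution on the device"
        logger.warning('CUDA probe failed on device %d: %s', device_index, exc)
        return False
```

A loader would call `probe_cuda(index)` once before moving a model to the GPU and treat `False` as "use CPU".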

⚠️ Reviewer note

Conflict-resolved against upstream in surya.py: upstream's # contract-check: ignore import comments were kept and probe_cuda was added to the ai.common.torch import. Please verify the surya import block.

Testing

  • CI (./builder test) — relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1168


coderabbitai Bot commented Jun 8, 2026

Contributor

Walkthrough

This PR adds CUDA kernel probing to detect GPU architecture incompatibility at model-load time, allowing automatic fallback to CPU. A new probe_cuda() utility validates CUDA by running a minimal computation. The probe is integrated into 8 local-mode model loaders, each with CPU fallback logic, and two loaders (Whisper and EasyOCR) additionally retry on CPU when GPU loading fails.

Changes

CUDA kernel probing and GPU robustness

| Layer / File(s) | Summary |
|---|---|
| CUDA probe utility foundation: packages/ai/src/ai/common/torch/__init__.py | New probe_cuda(device_index: int = 0) -> bool function validates CUDA kernel availability by allocating a small tensor and performing GEMM; module exports updated to include both torch and probe_cuda. |
| OCR, transformer, and vision loaders with CUDA probing: packages/ai/src/ai/common/models/ocr/doctr.py, packages/ai/src/ai/common/models/ocr/surya.py, packages/ai/src/ai/common/models/ocr/trocr.py, packages/ai/src/ai/common/models/gliner/gliner.py, packages/ai/src/ai/common/models/transformers/sentence_transformers.py, packages/ai/src/ai/common/models/vision/vision.py | Six loaders now import probe_cuda and apply it during CUDA device selection; each parses GPU index from device string and falls back to CPU with warning when probe fails. |
| Transformer _load_model and _load_pipeline with probing: packages/ai/src/ai/common/models/transformers/transformers.py | Two utility functions add probe_cuda imports and apply probing logic in local-mode device selection; on probe failure, device index is set to -1 (CPU) with warning logged. |
| Whisper GPU-to-CPU load retry with compute downgrade: packages/ai/src/ai/common/models/audio/whisper.py | WhisperLoader catches GPU model-load failures, downgrades float16 compute type to int8, retries on CPU, and wraps any CPU failure with original exception context for better error reporting. |
| EasyOCR probe validation and reader initialization retry: packages/ai/src/ai/common/models/ocr/easyocr.py | EasyOCRLoader probes GPU availability and disables on failure; easyocr.Reader initialization wraps with GPU-to-CPU retry logic, logging GPU errors before retrying on CPU and re-raising only if CPU init also fails. |
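The device-selection pattern summarized above (parse the GPU index from the device string, probe, fall back to CPU) might look roughly like the sketch below. The helper names `parse_cuda_index`, `resolve_device`, and `pipeline_device_index` are hypothetical and do not come from the PR. The probe is passed in as a callable so the logic can be exercised without a GPU.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def parse_cuda_index(device: str) -> Optional[int]:
    """Return the GPU index for 'cuda' or 'cuda:N'; None for anything else."""
    if device == 'cuda':
        return 0
    if device.startswith('cuda:'):
        suffix = device.split(':', 1)[1]
        return int(suffix) if suffix.isdigit() else None
    return None


def resolve_device(device: str, probe: Callable[[int], bool]) -> str:
    """Keep a CUDA device string only if the probe passes; otherwise use 'cpu'."""
    if not device.startswith('cuda'):
        return device
    index = parse_cuda_index(device)
    if index is not None and probe(index):
        return device
    logger.warning('CUDA probe failed for %s; falling back to CPU', device)
    return 'cpu'


def pipeline_device_index(device: str, probe: Callable[[int], bool]) -> int:
    """Integer form used by pipeline-style APIs, where -1 means CPU."""
    resolved = resolve_device(device, probe)
    index = parse_cuda_index(resolved)
    return -1 if index is None else index
```

With a probe that always fails, `resolve_device('cuda:1', probe)` yields `'cpu'` and `pipeline_device_index('cuda', probe)` yields `-1`.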

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rocketride-org/rocketride-server#1052: Both PRs modify packages/ai/src/ai/common/models/audio/whisper.py to harden local-mode GPU handling by running CUDA compatibility checks and falling back to CPU when the requested CUDA device fails.
  • rocketride-org/rocketride-server#1043: Both PRs adjust Whisper GPU-to-CPU fallback behavior in packages/ai/src/ai/common/models/audio/whisper.py; main adds load-time retry with compute-type downgrade while the retrieved PR tightens GPU compatibility subprocess probing for ctranslate2>=4.7.

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen
  • asclearuc
  • dsapandora

Poem

🐰 A kernel probe hops through GPU lanes,
Checking if CUDA's ready for the chains,
If not, a whisper soft says "CPU's way,"
We EasyOCR through clouds today! 🌤️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | The PR fully implements the objectives from issue #1168: adds probe_cuda() helper, integrates CUDA probing across all local-mode loaders, and implements CPU fallback when incompatibility is detected. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to detecting CUDA kernel incompatibility and implementing CPU fallback; no unrelated modifications are present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary change: adding CUDA kernel compatibility checking with automatic CPU fallback across multiple AI model loaders. |


@github-actions github-actions Bot added the module:ai AI/ML modules label Jun 8, 2026

github-actions Bot commented Jun 8, 2026

🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

… device

Add probe_cuda() to ai.common.torch (tiny GEMM + synchronize) to detect
cudaErrorNoKernelImageForDevice at model-load time rather than asynchronously
on first inference. Local-mode loaders (transformers, sentence_transformers,
vision, gliner, easyocr, trocr, surya, doctr) probe before committing to GPU
and fall back to CPU; whisper retries on CPU in its except block.

Fixes rocketride-org#1168
@ksaurabhAparavi ksaurabhAparavi force-pushed the fix/RR-1168-cuda-cpu-fallback branch from df5b270 to e8435bb Compare June 8, 2026 11:51
@ksaurabhAparavi ksaurabhAparavi changed the title fix(ai): fall back to CPU when CUDA kernels are incompatible with device fix(ai): fall back to CPU when CUDA kernels are incompatible with the device Jun 8, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 273-278: The except blocks in the whisper model loader lose the
original traceback by re-raising new Exceptions; update the two raise statements
in the Whisper loading logic (the except Exception as cpu_e branch and the outer
except as e branch) to use exception chaining (raise Exception(f'Failed to load
whisper model {model_name}: {cpu_e}') from cpu_e and raise Exception(f'Failed to
load whisper model {model_name}: {e}') from e respectively), keeping the
existing logger.error calls (logger.error(...)) and message content intact so
the original exceptions cpu_e and e are preserved in the chain.

In `@packages/ai/src/ai/common/models/ocr/easyocr.py`:
- Around line 153-155: The error message when EasyOCR fails to load on CPU
should indicate that a GPU probe was attempted and a fallback to CPU occurred;
update the exception handling around easyocr.Reader creation (the block that
catches exceptions after probe_cuda and sets use_gpu = False) to log and raise a
message that includes the fallback state (e.g., reference use_gpu and that
probe_cuda was invoked) so the logger.error and raised Exception include that
CPU fallback was attempted after a GPU probe failure; locate the probe_cuda
invocation and the easyocr.Reader construction to adjust the log text
accordingly.
- Around line 150-155: The except blocks that currently re-raise new Exception
objects lose the original traceback; update the two re-raises in the EasyOCR
loading logic to use "raise Exception(... ) from <original_exception>" so the
chain is preserved (use "from cpu_e" for the cpu_e handler and "from e" for the
outer handler), leaving the logger.error calls intact and referring to the same
variables (cpu_e and e) so full exception context is retained for debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bf488c65-5eee-47ad-9f05-c6d22cf46b60

📥 Commits

Reviewing files that changed from the base of the PR and between efecb7e and e8435bb.

📒 Files selected for processing (10)
  • packages/ai/src/ai/common/models/audio/whisper.py
  • packages/ai/src/ai/common/models/gliner/gliner.py
  • packages/ai/src/ai/common/models/ocr/doctr.py
  • packages/ai/src/ai/common/models/ocr/easyocr.py
  • packages/ai/src/ai/common/models/ocr/surya.py
  • packages/ai/src/ai/common/models/ocr/trocr.py
  • packages/ai/src/ai/common/models/transformers/sentence_transformers.py
  • packages/ai/src/ai/common/models/transformers/transformers.py
  • packages/ai/src/ai/common/models/vision/vision.py
  • packages/ai/src/ai/common/torch/__init__.py

Comment on lines +273 to +278
                except Exception as cpu_e:
                    logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
                    raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
            else:
                logger.error(f'Failed to load whisper model: {e}')
                raise Exception(f'Failed to load whisper model {model_name}: {e}')

Contributor


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Preserve exception chain with raise ... from for better debugging.

The exception handling loses the original traceback by creating a new Exception without chaining. Python best practice is to use raise ... from to preserve the full exception context.

♻️ Proposed fix
                 except Exception as cpu_e:
                     logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
-                    raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
+                    raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}') from cpu_e
             else:
                 logger.error(f'Failed to load whisper model: {e}')
-                raise Exception(f'Failed to load whisper model {model_name}: {e}')
+                raise Exception(f'Failed to load whisper model {model_name}: {e}') from e
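To show how the chained re-raise fits into the GPU-to-CPU retry, here is a self-contained sketch under stated assumptions. `load_with_cpu_retry` and its `load` callable are hypothetical stand-ins for the Whisper loading logic, which is not shown in full on this page. Only the float16 to int8 downgrade and the error messages are taken from the discussion above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_with_cpu_retry(
    load: Callable[[str, str, str], Any],  # hypothetical: load(model_name, device, compute_type)
    model_name: str,
    device: str = 'cuda',
    compute_type: str = 'float16',
) -> Any:
    """Try the requested device first; on failure retry once on CPU."""
    try:
        return load(model_name, device, compute_type)
    except Exception as e:
        if device != 'cpu':
            logger.warning('GPU load failed (%s); retrying on CPU', e)
            # float16 is downgraded to int8 for the CPU retry, as described in the walkthrough
            cpu_compute = 'int8' if compute_type == 'float16' else compute_type
            try:
                return load(model_name, 'cpu', cpu_compute)
            except Exception as cpu_e:
                logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
                # "from cpu_e" keeps the CPU failure as __cause__ of the new exception
                raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}') from cpu_e
        else:
            logger.error(f'Failed to load whisper model: {e}')
            raise Exception(f'Failed to load whisper model {model_name}: {e}') from e
```

With chaining in place, the traceback of the wrapped exception includes the original error as its direct cause instead of only embedding its message in a string.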

Comment on lines +150 to +155
                except Exception as cpu_e:
                    logger.error(f'Failed to load EasyOCR: {cpu_e}')
                    raise Exception(f'Failed to load EasyOCR: {cpu_e}')
            else:
                logger.error(f'Failed to load EasyOCR: {e}')
                raise Exception(f'Failed to load EasyOCR: {e}')

Contributor


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Preserve exception chain with raise ... from for better debugging.

Both exception handlers create new Exception instances without chaining the original exception, losing valuable traceback information. Use raise ... from to preserve the full exception context.

♻️ Proposed fix
                 except Exception as cpu_e:
                     logger.error(f'Failed to load EasyOCR: {cpu_e}')
-                    raise Exception(f'Failed to load EasyOCR: {cpu_e}')
+                    raise Exception(f'Failed to load EasyOCR: {cpu_e}') from cpu_e
             else:
                 logger.error(f'Failed to load EasyOCR: {e}')
-                raise Exception(f'Failed to load EasyOCR: {e}')
+                raise Exception(f'Failed to load EasyOCR: {e}') from e

Comment on lines +153 to +155
            else:
                logger.error(f'Failed to load EasyOCR: {e}')
                raise Exception(f'Failed to load EasyOCR: {e}')

Contributor


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Clarify error message when probe-triggered fallback fails.

When probe_cuda fails (line 129) and triggers fallback to CPU, use_gpu is set to False. If the subsequent easyocr.Reader creation on CPU also fails, the error log at line 154 reads "Failed to load EasyOCR: {e}" without indicating that a GPU probe was attempted first. This can mislead debugging: anyone reading the log might think the GPU was never tried.

Consider logging a more specific message when CPU loading fails after a probe-triggered fallback, or tracking the fallback state to improve the diagnostic output.
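One way to track the fallback state is sketched below. This is an illustration only: `load_reader`, `make_reader`, and the exact message wording are hypothetical, and the real loader calls easyocr.Reader directly. The idea is to remember whether the probe forced the CPU path and include that in the error text.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_reader(
    make_reader: Callable[..., Any],  # hypothetical factory, e.g. wraps easyocr.Reader(langs, gpu=...)
    want_gpu: bool,
    probe: Callable[[int], bool],
) -> Any:
    """Create a reader, recording whether a failed CUDA probe forced CPU mode."""
    use_gpu = want_gpu
    probe_fallback = False
    if use_gpu and not probe(0):
        logger.warning('CUDA probe failed; loading EasyOCR on CPU')
        use_gpu = False
        probe_fallback = True
    try:
        return make_reader(gpu=use_gpu)
    except Exception as e:
        if use_gpu:
            logger.warning('EasyOCR GPU init failed (%s); retrying on CPU', e)
            try:
                return make_reader(gpu=False)
            except Exception as cpu_e:
                msg = f'Failed to load EasyOCR on CPU after GPU init failed: {cpu_e}'
                logger.error(msg)
                raise Exception(msg) from cpu_e
        # CPU was the only attempt; say so if the probe is why the GPU was skipped
        note = ' (CPU fallback after failed CUDA probe)' if probe_fallback else ''
        msg = f'Failed to load EasyOCR{note}: {e}'
        logger.error(msg)
        raise Exception(msg) from e
```

The extra flag costs one local variable and makes the three failure paths (GPU requested but probe failed, GPU init failed then CPU failed, CPU only) distinguishable in the logs.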



Labels

module:ai AI/ML modules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fall back to CPU when CUDA kernels are incompatible with the device

1 participant