fix(ai): fall back to CPU when CUDA kernels are incompatible with the device #1181
📝 Walkthrough
This PR adds CUDA kernel probing to detect GPU architecture incompatibility at model-load time, allowing automatic fallback to CPU. A new …
Changes
CUDA kernel probing and GPU robustness
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
… device
Add probe_cuda() to ai.common.torch (tiny GEMM + synchronize) to detect cudaErrorNoKernelImageForDevice at model-load time rather than asynchronously on first inference.
Local-mode loaders (transformers, sentence_transformers, vision, gliner, easyocr, trocr, surya, doctr) probe before committing to GPU and fall back to CPU; whisper retries on CPU in its except block.
Fixes rocketride-org#1168
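The probe described in the commit message could look roughly like the sketch below. This is not the PR's actual code: the signature, the boolean return value, the 8x8 matrix size, and the select_device() helper are all assumptions made for illustration. The point is only that a small matrix multiply followed by a synchronize forces any kernel-image error to surface immediately instead of on first inference.

```python
try:
    import torch  # kept optional so the sketch also runs where torch is not installed
except ImportError:
    torch = None


def probe_cuda(device_index: int = 0) -> bool:
    """Return True only if a tiny matmul really executes on the CUDA device.

    Hypothetical signature; the real helper in ai.common.torch may differ.
    """
    if torch is None or not torch.cuda.is_available():
        return False
    try:
        device = torch.device('cuda', device_index)
        a = torch.ones((8, 8), device=device)
        b = torch.ones((8, 8), device=device)
        _ = a @ b  # tiny GEMM: launches a real compute kernel
        # CUDA launches are asynchronous; synchronize so that errors such as
        # "no kernel image is available" are raised here, not later.
        torch.cuda.synchronize(device)
        return True
    except Exception:
        # Any failure is treated as "this GPU is not usable with this build".
        return False


def select_device() -> str:
    """Pick 'cuda' only when the probe succeeds, otherwise 'cpu' (hypothetical helper)."""
    return 'cuda' if probe_cuda() else 'cpu'
```

A loader would then call something like select_device() before moving model weights, so an incompatible GPU is never committed to.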
df5b270 to e8435bb
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 273-278: The except blocks in the whisper model loader lose the
original traceback by re-raising new Exceptions; update the two raise statements
in the Whisper loading logic (the except Exception as cpu_e branch and the outer
except as e branch) to use exception chaining (raise Exception(f'Failed to load
whisper model {model_name}: {cpu_e}') from cpu_e and raise Exception(f'Failed to
load whisper model {model_name}: {e}') from e respectively), keeping the
existing logger.error calls (logger.error(...)) and message content intact so
the original exceptions cpu_e and e are preserved in the chain.
In `@packages/ai/src/ai/common/models/ocr/easyocr.py`:
- Around line 153-155: The error message when EasyOCR fails to load on CPU
should indicate that a GPU probe was attempted and a fallback to CPU occurred;
update the exception handling around easyocr.Reader creation (the block that
catches exceptions after probe_cuda and sets use_gpu = False) to log and raise a
message that includes the fallback state (e.g., reference use_gpu and that
probe_cuda was invoked) so the logger.error and raised Exception include that
CPU fallback was attempted after a GPU probe failure; locate the probe_cuda
invocation and the easyocr.Reader construction to adjust the log text
accordingly.
- Around line 150-155: The except blocks that currently re-raise new Exception
objects lose the original traceback; update the two re-raises in the EasyOCR
loading logic to use "raise Exception(... ) from <original_exception>" so the
chain is preserved (use "from cpu_e" for the cpu_e handler and "from e" for the
outer handler), leaving the logger.error calls intact and referring to the same
variables (cpu_e and e) so full exception context is retained for debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bf488c65-5eee-47ad-9f05-c6d22cf46b60
📒 Files selected for processing (10)
packages/ai/src/ai/common/models/audio/whisper.py
packages/ai/src/ai/common/models/gliner/gliner.py
packages/ai/src/ai/common/models/ocr/doctr.py
packages/ai/src/ai/common/models/ocr/easyocr.py
packages/ai/src/ai/common/models/ocr/surya.py
packages/ai/src/ai/common/models/ocr/trocr.py
packages/ai/src/ai/common/models/transformers/sentence_transformers.py
packages/ai/src/ai/common/models/transformers/transformers.py
packages/ai/src/ai/common/models/vision/vision.py
packages/ai/src/ai/common/torch/__init__.py
    except Exception as cpu_e:
        logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
        raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
else:
    logger.error(f'Failed to load whisper model: {e}')
    raise Exception(f'Failed to load whisper model {model_name}: {e}')
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Preserve exception chain with raise ... from for better debugging.
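The exception handling raises a new Exception without explicit chaining, so the original error is attached only as implicit context (__context__) rather than as the direct cause. Python best practice is to use raise ... from, which records the original exception as __cause__ and makes the relationship explicit in the traceback.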
♻️ Proposed fix
      except Exception as cpu_e:
          logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
-         raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
+         raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}') from cpu_e
  else:
      logger.error(f'Failed to load whisper model: {e}')
-     raise Exception(f'Failed to load whisper model {model_name}: {e}')
+     raise Exception(f'Failed to load whisper model {model_name}: {e}') from e
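To make the suggestion concrete, here is a minimal, self-contained sketch of a retry-on-CPU loader that uses raise ... from in both branches. The function load_with_cpu_retry, its load_fn parameter, and the device check in the if are hypothetical stand-ins; the real whisper loader's condition for retrying is not shown in this review and may differ.

```python
import logging

logger = logging.getLogger(__name__)


def load_with_cpu_retry(load_fn, model_name: str, device: str = 'cuda'):
    """Try load_fn(device); if a non-CPU load fails, retry once on CPU.

    Hypothetical helper: load_fn stands in for the real model constructor.
    """
    try:
        return load_fn(device)
    except Exception as e:
        if device != 'cpu':
            logger.warning(f'Loading {model_name} on {device} failed, retrying on CPU: {e}')
            try:
                return load_fn('cpu')
            except Exception as cpu_e:
                logger.error(f'Failed to load {model_name} on CPU: {cpu_e}')
                # "from cpu_e" records the CPU failure as the explicit cause.
                raise Exception(f'Failed to load {model_name}: {cpu_e}') from cpu_e
        else:
            logger.error(f'Failed to load {model_name}: {e}')
            raise Exception(f'Failed to load {model_name}: {e}') from e


def always_fails(device):
    # Simulated loader that fails on every device.
    raise RuntimeError(f'no kernel image on {device}')


try:
    load_with_cpu_retry(always_fails, 'demo-model')
except Exception as err:
    wrapped = err  # keep a reference; "err" is unbound after the except block
```

After this runs, wrapped.__cause__ is the RuntimeError raised by the CPU attempt, so the wrapper and the underlying failure both remain available to whoever handles or logs the error.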
    except Exception as cpu_e:
        logger.error(f'Failed to load EasyOCR: {cpu_e}')
        raise Exception(f'Failed to load EasyOCR: {cpu_e}')
else:
    logger.error(f'Failed to load EasyOCR: {e}')
    raise Exception(f'Failed to load EasyOCR: {e}')
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Preserve exception chain with raise ... from for better debugging.
Both exception handlers create new Exception instances without explicitly chaining the original exception, which leaves it attached only as implicit context. Use raise ... from to record the original as the direct cause and keep the full exception chain explicit.
♻️ Proposed fix
      except Exception as cpu_e:
          logger.error(f'Failed to load EasyOCR: {cpu_e}')
-         raise Exception(f'Failed to load EasyOCR: {cpu_e}')
+         raise Exception(f'Failed to load EasyOCR: {cpu_e}') from cpu_e
  else:
      logger.error(f'Failed to load EasyOCR: {e}')
-     raise Exception(f'Failed to load EasyOCR: {e}')
+     raise Exception(f'Failed to load EasyOCR: {e}') from e
else:
    logger.error(f'Failed to load EasyOCR: {e}')
    raise Exception(f'Failed to load EasyOCR: {e}')
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Clarify error message when probe-triggered fallback fails.
When probe_cuda fails (line 129) and triggers fallback to CPU, use_gpu is set to False. If the subsequent easyocr.Reader creation on CPU also fails, the error log at line 154 reads "Failed to load EasyOCR: {e}" without indicating that GPU was attempted first. This could confuse debugging, since readers of the log might think GPU was never tried.
Consider logging a more specific message when CPU loading fails after a probe-triggered fallback, or tracking the fallback state to improve the diagnostic output.
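One way to track the fallback state is sketched below. The names load_reader, make_reader, probe, and fell_back are hypothetical and do not come from easyocr.py; the sketch only illustrates carrying a flag from the probe step into the error message.

```python
import logging

logger = logging.getLogger(__name__)


def load_reader(make_reader, probe, want_gpu: bool = True):
    """Build a reader, remembering whether a failed GPU probe forced CPU mode.

    Hypothetical helper: make_reader(use_gpu) stands in for the real reader
    constructor and probe() for a CUDA probe returning True or False.
    """
    use_gpu = want_gpu
    fell_back = False
    if use_gpu and not probe():
        use_gpu = False
        fell_back = True
        logger.warning('CUDA probe failed; falling back to CPU')
    try:
        return make_reader(use_gpu)
    except Exception as e:
        # Mention the fallback so the log shows that GPU was attempted first.
        suffix = ' (on CPU, after a failed CUDA probe)' if fell_back else ''
        logger.error(f'Failed to load reader{suffix}: {e}')
        raise Exception(f'Failed to load reader{suffix}: {e}') from e


def broken_reader(use_gpu):
    # Simulated constructor that fails regardless of device.
    raise RuntimeError('model file missing')


try:
    load_reader(broken_reader, probe=lambda: False)
except Exception as err:
    message = str(err)
```

With a failing probe and a failing constructor, the message names the CPU fallback explicitly, which is the diagnostic detail the comment asks for.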
Summary
Add probe_cuda() to ai.common.torch (tiny GEMM + synchronize) to detect cudaErrorNoKernelImageForDevice at model-load time instead of asynchronously on first inference.
Conflict-resolved against upstream in surya.py: upstream's # contract-check: ignore import comments were kept and probe_cuda was added to the ai.common.torch import. Please verify the surya import block.
Testing
./builder test): relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.
Linked Issue
Fixes #1168