fix(ai): fall back to CPU when CUDA kernels are incompatible with the device #1181
base: develop
```diff
@@ -97,7 +97,7 @@ def load(
     from ai.common.opencv import cv2 # noqa: F401

     import easyocr
-    from ai.common.torch import torch
+    from ai.common.torch import torch, probe_cuda

     languages = languages or ['en']
     exclude_gpus = exclude_gpus or []
```
```diff
@@ -126,6 +126,12 @@ def load(
         torch_device = 'cuda:0'
         use_gpu = True

+    if use_gpu and not probe_cuda(gpu_index):
+        logger.warning(f'CUDA device {gpu_index} kernel probe failed, falling back to CPU for EasyOCR')
+        use_gpu = False
+        gpu_index = -1
+        torch_device = 'cpu'
+
     logger.info(f'Loading EasyOCR with languages {languages} on {torch_device}')

     try:
```
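The diff imports `probe_cuda` from `ai.common.torch`, but its body is not part of this view. As a rough sketch of what a kernel probe of this kind can look like (an assumption, not the project's implementation), the hypothetical `probe_cuda_sketch` below runs a tiny elementwise op and a reduction on `cuda:<gpu_index>` and reports failure instead of raising. A PyTorch build that ships no kernels for the device's compute capability typically fails at this point with `CUDA error: no kernel image is available for execution on the device`.

```python
def probe_cuda_sketch(gpu_index: int) -> bool:
    """Hypothetical kernel probe: True if a trivial CUDA op succeeds on cuda:<gpu_index>."""
    try:
        import torch  # imported lazily so the probe degrades to False where torch is absent
    except ImportError:
        return False

    # Cheap checks first: no usable CUDA runtime, or an index outside the visible devices.
    if gpu_index < 0 or not torch.cuda.is_available():
        return False
    if gpu_index >= torch.cuda.device_count():
        return False

    try:
        device = torch.device(f'cuda:{gpu_index}')
        x = torch.ones(8, device=device)
        # Launch real kernels (fill, multiply, sum); an incompatible build fails here.
        total = (x * 2).sum()
        torch.cuda.synchronize(device)
        return total.item() == 16.0
    except Exception:
        # CUDA launch failures surface as RuntimeError (or a subclass of it);
        # any failure means the device is treated as unusable for this process.
        return False
```

The call site would stay as in the diff: only a `False` result moves `use_gpu`, `gpu_index`, and `torch_device` over to the CPU settings.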
```diff
@@ -135,8 +141,18 @@ def load(
             verbose=False,
         )
     except Exception as e:
-        logger.error(f'Failed to load EasyOCR: {e}')
-        raise Exception(f'Failed to load EasyOCR: {e}')
+        if use_gpu:
+            logger.warning(f'EasyOCR GPU load failed ({e}), falling back to CPU')
+            gpu_index = -1
+            torch_device = 'cpu'
+            try:
+                reader = easyocr.Reader(languages, gpu=False, verbose=False)
+            except Exception as cpu_e:
+                logger.error(f'Failed to load EasyOCR: {cpu_e}')
+                raise Exception(f'Failed to load EasyOCR: {cpu_e}')
+        else:
+            logger.error(f'Failed to load EasyOCR: {e}')
+            raise Exception(f'Failed to load EasyOCR: {e}')
```
Comment on lines +150 to +155 (Contributor)

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Preserve exception chain with `raise ... from`. Both exception handlers create a new `Exception` without chaining the caught error (`cpu_e` or `e`), so the original exception is not recorded as the direct cause.

♻️ Proposed fix

```diff
             except Exception as cpu_e:
                 logger.error(f'Failed to load EasyOCR: {cpu_e}')
-                raise Exception(f'Failed to load EasyOCR: {cpu_e}')
+                raise Exception(f'Failed to load EasyOCR: {cpu_e}') from cpu_e
         else:
             logger.error(f'Failed to load EasyOCR: {e}')
-            raise Exception(f'Failed to load EasyOCR: {e}')
+            raise Exception(f'Failed to load EasyOCR: {e}') from e
```
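To see what the suggested `from` changes, the plain-Python sketch below (function names are hypothetical stand-ins for the EasyOCR calls) compares the two forms. Even without `from`, raising inside an `except` block records the caught error implicitly as `__context__`, and the traceback reads "During handling of the above exception, another exception occurred". `raise ... from cpu_e` additionally sets `__cause__` and `__suppress_context__`, so the traceback instead reads "The above exception was the direct cause of the following exception".

```python
def load_cpu_model() -> None:
    # Hypothetical stand-in for easyocr.Reader(languages, gpu=False, verbose=False) failing.
    raise RuntimeError('weights not found')


def load_without_from() -> None:
    try:
        load_cpu_model()
    except Exception as cpu_e:
        raise Exception(f'Failed to load EasyOCR: {cpu_e}')  # implicit chaining only


def load_with_from() -> None:
    try:
        load_cpu_model()
    except Exception as cpu_e:
        raise Exception(f'Failed to load EasyOCR: {cpu_e}') from cpu_e  # explicit cause


def chain_of(fn):
    """Call fn and return (__cause__, __context__, __suppress_context__) of what it raises."""
    try:
        fn()
    except Exception as err:
        return err.__cause__, err.__context__, err.__suppress_context__
    raise AssertionError('expected fn to raise')


if __name__ == '__main__':
    print(chain_of(load_without_from))
    print(chain_of(load_with_from))
```

In both cases the original `RuntimeError` remains reachable; `from` makes the relationship explicit, both in the traceback and for code that inspects `__cause__`.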
Comment on lines +153 to +155 (Contributor)

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Clarify error message when probe-triggered fallback fails. When `probe_cuda` fails, `use_gpu` is already `False` before EasyOCR is loaded, so a CPU load failure reaches the `else` branch and logs the generic `Failed to load EasyOCR` message, with no indication that the GPU was skipped. Consider logging a more specific message when CPU loading fails after a probe-triggered fallback, or tracking the fallback state to improve the diagnostic output.
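One way to act on this suggestion is sketched below in plain Python. Everything in it is hypothetical: `load_reader`, the `make_reader(gpu)` callable standing in for `easyocr.Reader(...)`, and the `fallback_reason` variable. It keeps the PR's GPU-then-CPU control flow, records why the loader ended up on the CPU, puts that reason in the final error, and chains the original exception.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def load_reader(make_reader: Callable[[bool], object], use_gpu: bool, probe_ok: bool) -> object:
    """Hypothetical loader: the PR's GPU-then-CPU flow plus a recorded fallback reason.

    `make_reader(gpu)` stands in for `easyocr.Reader(languages, gpu=gpu, verbose=False)`.
    """
    fallback_reason: Optional[str] = None
    if use_gpu and not probe_ok:
        # Mirrors the probe branch in the diff, but remembers why the GPU was dropped.
        fallback_reason = 'CUDA kernel probe failed'
        use_gpu = False

    try:
        return make_reader(use_gpu)
    except Exception as e:
        if not use_gpu:
            # The first attempt was already on CPU; mention a probe fallback if there was one.
            suffix = f' (CPU fallback after {fallback_reason})' if fallback_reason else ''
            msg = f'Failed to load EasyOCR{suffix}: {e}'
            logger.error(msg)
            raise Exception(msg) from e
        logger.warning('EasyOCR GPU load failed (%s), falling back to CPU', e)
        fallback_reason = f'GPU load error: {e}'

    # Reached only when the GPU attempt failed.
    try:
        return make_reader(False)
    except Exception as cpu_e:
        msg = f'Failed to load EasyOCR (CPU fallback after {fallback_reason}): {cpu_e}'
        logger.error(msg)
        raise Exception(msg) from cpu_e
```

With this shape, a CPU failure after a failed probe reports `Failed to load EasyOCR (CPU fallback after CUDA kernel probe failed): ...`, while a plain CPU-only failure keeps the original message.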
```diff
     # EasyOCR wraps its detector and recognizer in DataParallel, which
     # scatters every batch across ALL visible GPUs via parallel_apply().
```
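The diff context ends before the code that follows this comment, so what the PR does about it is not shown here. For illustration only, one common way to keep a wrapped network on a single device is to unwrap `torch.nn.DataParallel` and call its `.module` directly. The hypothetical helper below does that and returns any other object unchanged.

```python
def unwrap_data_parallel(model):
    """Hypothetical helper: return the inner module of a torch.nn.DataParallel, else `model`."""
    try:
        import torch
    except ImportError:
        return model  # without torch there is no DataParallel wrapper to remove
    if isinstance(model, torch.nn.DataParallel):
        # DataParallel keeps the original network in `.module`; calling it directly
        # runs on the device its parameters live on, with no scatter/gather step.
        return model.module
    return model
```

Assuming the wrapped networks are reachable as `reader.detector` and `reader.recognizer` (EasyOCR internals, not a documented API), usage would look like `reader.detector = unwrap_data_parallel(reader.detector)`, and likewise for the recognizer.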
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Preserve exception chain with `raise ... from` for better debugging. The exception handling loses the original traceback by creating a new `Exception` without chaining. Python best practice is to use `raise ... from` to preserve the full exception context.

♻️ Proposed fix

```diff
     except Exception as cpu_e:
         logger.error(f'Failed to load whisper model on CPU: {cpu_e}')
-        raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}')
+        raise Exception(f'Failed to load whisper model {model_name}: {cpu_e}') from cpu_e
 else:
     logger.error(f'Failed to load whisper model: {e}')
-    raise Exception(f'Failed to load whisper model {model_name}: {e}')
+    raise Exception(f'Failed to load whisper model {model_name}: {e}') from e
```