PR Summary
This PR adds OpenVINO backend support to optimize model inference on Intel hardware, with changes spanning Docker configuration, embedder implementation, and utility functions.
- Added OpenVINO execution provider in /libs/infinity_emb/infinity_emb/transformer/utils_optimum.py with bf16 precision support
- Implemented OpenVINO model file handling with new get_openvino_files() function
- Set default INFINITY_ENGINE="optimum" in CPU Docker configuration with OpenVINO extras
- Added CHECK_OPTIMUM_INTEL optional import for Intel optimization capabilities
- Duplicate optimization config code in optimize_model() needs to be cleaned up
5 file(s) reviewed, 9 comment(s)
| extra_env_variables: | | ||
| # Sets default to onnx | ||
| ENV INFINITY_ENGINE="optimum" | ||
| RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]" |
style: Installing optimum[openvino] in extra_env_variables section is unconventional. Should be moved to main_install or extra_installs_main.
| RUN ./requirements_install_from_poetry.sh --no-root --without lint,test "https://download.pytorch.org/whl/cpu" | ||
| RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]" |
logic: Redundant installation of optimum[openvino] - this package is already included via EXTRAS='all openvino' in the environment variables
| RUN ./requirements_install_from_poetry.sh --without lint,test "https://download.pytorch.org/whl/cpu" | ||
| RUN poetry run python -m pip install --upgrade --upgrade-strategy eager "optimum[openvino]" |
style: Installing optimum[openvino] multiple times in different build stages may cause version conflicts or increase build time unnecessarily
| except Exception as e: # show error then let the optimum intel compress on the fly | ||
| print(str(e)) |
logic: Printing the error and continuing is dangerous, because the failure never reaches the logs. Consider logging the error, raising a more specific exception if OpenVINO file loading fails, or both.
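One way to address this, sketched below, is to log the failure through the standard logging module and return an explicit sentinel so the caller knows to fall back to on-the-fly export. The helper name resolve_openvino_file and the choice of FileNotFoundError are assumptions for illustration; the lookup function in the PR may raise something else.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_openvino_file(lookup: Callable[[], str]) -> Optional[str]:
    """Return the file name found by `lookup`, or None if nothing usable exists.

    `lookup` stands in for the PR's file discovery call (hypothetical signature).
    """
    try:
        return lookup()
    except FileNotFoundError as exc:
        # Only the expected failure is caught; unexpected errors still propagate.
        logger.warning(
            "No precompiled OpenVINO file found, falling back to on-the-fly export: %s",
            exc,
        )
        return None
```

Catching a narrow exception type keeps genuine bugs visible, while the warning records why the fallback path was taken.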
| ) | ||
| if provider == "OpenVINOExecutionProvider": | ||
| CHECK_OPTIMUM_INTEL.mark_required() | ||
| filename = "" |
style: An empty filename could cause issues if an exception occurs. Initialize it with None instead to make the failure case more explicit.
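A short illustration of why None is the safer sentinel, assuming the file name is eventually joined onto a model directory (an assumption; the PR may pass it elsewhere). With pathlib, an empty string is silently absorbed, while None fails at the point of use.

```python
from pathlib import Path

model_dir = Path("model_dir")  # hypothetical directory, for illustration only

# An empty string is silently absorbed: the result is the directory itself.
joined_empty = model_dir / ""

# None is rejected immediately, which surfaces the bug where it happens.
try:
    model_dir / None
    none_rejected = False
except TypeError:
    none_rejected = True
```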
| else: # Optimum onnx cpu path | ||
| optimizer = ORTOptimizer.from_pretrained(unoptimized_model) | ||
| is_gpu = "cpu" not in execution_provider.lower() |
logic: Duplicate optimization config block. Remove lines 231-239 as they are identical to 222-230.
| openvino_files = [p for p in repo_files if p.match(pattern)] | ||
| if len(openvino_files) > 1: | ||
| logger.info(f"Found {len(openvino_files)} onnx files: {openvino_files}") |
syntax: Log message incorrectly refers to 'onnx files' when listing OpenVINO files
Suggested change:
- logger.info(f"Found {len(openvino_files)} onnx files: {openvino_files}")
+ logger.info(f"Found {len(openvino_files)} OpenVINO files: {openvino_files}")
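For context, the filtering step in this snippet can be sketched as a standalone function. The function name, the input file list, and the pattern *openvino_model.xml are assumptions for illustration, not the values used in the PR; the log line uses the corrected wording.

```python
import logging
from pathlib import PurePosixPath
from typing import List

logger = logging.getLogger(__name__)


def filter_openvino_files(
    repo_files: List[str], pattern: str = "*openvino_model.xml"
) -> List[PurePosixPath]:
    """Keep only the repo files whose path matches `pattern`."""
    paths = [PurePosixPath(f) for f in repo_files]
    # A relative pattern is matched against the right-hand end of each path.
    openvino_files = [p for p in paths if p.match(pattern)]
    if len(openvino_files) > 1:
        logger.info(f"Found {len(openvino_files)} OpenVINO files: {openvino_files}")
    return openvino_files
```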
| if files_optimized: | ||
| file_optimized = files_optimized[-1] | ||
| if file_name: | ||
| file_optimized = file_name |
logic: Overwriting file_optimized with file_name could bypass optimization caching if file_name is set
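One possible way to make the precedence explicit, sketched under the assumption that files_optimized is a list of cached optimized file names and file_name is a caller-supplied override. The helper name pick_optimized_file is hypothetical. It keeps the override behavior from the snippet but records the cache bypass in the logs instead of leaving it implicit.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def pick_optimized_file(
    files_optimized: List[str], file_name: Optional[str] = None
) -> Optional[str]:
    """Choose which model file to load, preferring an explicit override."""
    if not files_optimized:
        # Nothing cached: the caller is expected to run optimization itself.
        return None
    cached = files_optimized[-1]
    if file_name and file_name != cached:
        # The override wins, but the bypass of the cached file is made visible.
        logger.warning(
            "file_name=%r overrides cached optimized file %r", file_name, cached
        )
        return file_name
    return cached
```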
| ov_config={ | ||
| "INFERENCE_PRECISION_HINT": "bf16" | ||
| }, # fp16 for now as it has better precision than bf16 |
style: Using bf16 precision by default may reduce accuracy compared to fp16/fp32 on some hardware
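A minimal sketch of making the precision hint configurable rather than hard-coded. INFERENCE_PRECISION_HINT is the key used in the snippet above; the helper name build_ov_config, the f32 default, and the accepted set of values are assumptions that should be checked against the OpenVINO documentation for the target device.

```python
from typing import Dict

# Assumed set of hint values; verify against the OpenVINO docs for your device.
_ALLOWED_PRECISIONS = {"f32", "f16", "bf16"}


def build_ov_config(precision: str = "f32") -> Dict[str, str]:
    """Build an ov_config dict with a validated inference precision hint."""
    normalized = precision.lower()
    if normalized not in _ALLOWED_PRECISIONS:
        raise ValueError(
            f"Unsupported precision {precision!r}; expected one of {sorted(_ALLOWED_PRECISIONS)}"
        )
    return {"INFERENCE_PRECISION_HINT": normalized}
```

With a helper like this, bf16 becomes an opt-in (for example build_ov_config("bf16")) instead of the default for every deployment.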
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main     #467      +/-   ##
==========================================
- Coverage   79.51%   78.74%   -0.77%
==========================================
  Files          41       41
  Lines        3417     3468      +51
==========================================
+ Hits         2717     2731      +14
- Misses        700      737      +37
@tjtanaa FYI, continued by merging your branch into this and main.