Conversation
PR Summary
This PR refactors the model selection and batch handling system to improve multiprocessing capabilities and support for new models like nomic-embed-text-v1.5.
- Replaced direct model instantiation with factory functions in `/libs/infinity_emb/infinity_emb/inference/batch_handler.py` for better multiprocessing support
- Added `CallableReturningBaseTypeHint` Protocol in `/libs/infinity_emb/infinity_emb/transformer/abstract.py` to improve type safety
- Simplified `select_model()` in `/libs/infinity_emb/infinity_emb/inference/select_model.py` to return callable engine functions instead of tuples with timing info
- Added `tiktoken` as a required dependency in `/libs/infinity_emb/pyproject.toml` for nomic-embed-text-v1.5 support
- Modified telemetry in `/libs/infinity_emb/infinity_emb/infinity_server.py` to use empty dicts instead of engine capabilities
9 file(s) reviewed, 8 comment(s)
```python
max_inference_t = 4e-3
# TODO: Can be parallelized
for device_map in engine_args._loading_strategy.device_mapping:  # type: ignore
```
style: type: ignore on device_mapping access should be replaced with proper type annotation
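One way to act on this suggestion is to give the loading-strategy attribute a real annotation so the checker can see `device_mapping`. The sketch below uses hypothetical `LoadingStrategy`/`EngineArgs` stand-ins rather than the actual infinity_emb classes:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for the real loading strategy object.
@dataclass
class LoadingStrategy:
    device_mapping: list[str] = field(default_factory=lambda: ["cpu"])

# Hypothetical stand-in for EngineArgs: annotating the private attribute
# as Optional[LoadingStrategy] removes the need for `# type: ignore`.
@dataclass
class EngineArgs:
    _loading_strategy: Optional[LoadingStrategy] = None

args = EngineArgs(_loading_strategy=LoadingStrategy(["cuda:0", "cuda:1"]))
assert args._loading_strategy is not None  # narrows the Optional for mypy
for device_map in args._loading_strategy.device_mapping:
    print(device_map)
```

With the attribute typed, the loop body type-checks without any suppression comment.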
```diff
  assert len(engine_replicas) > 0, "No engine replicas were loaded"
- return engine_replicas, min_inference_t, max_inference_t
+ return engine_replicas  # type: ignore
```
style: type: ignore on return is unnecessary since return type matches annotation
```python
return EmbedderEngine.from_inference_engine(engine_args.engine)


def _get_engine_replica(unloaded_engine, engine_args, device_map) -> "BaseTypeHint":
```
style: function lacks type hints for unloaded_engine and device_map parameters
```diff
  th = threading.Thread(
      target=send_telemetry_start,
-     args=(engine_args_list, [e.capabilities for e in app.engine_array]),  # type: ignore
+     args=(engine_args_list, [{} for e in app.engine_array]),  # type: ignore
```
logic: Passing empty dictionaries instead of actual engine capabilities will result in loss of telemetry data about model capabilities
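One way to keep the telemetry data without breaking the new lazy-loading path would be to fall back to an empty dict only when capabilities are not yet available. This is a hypothetical sketch, not the PR's actual code; `safe_capabilities` and `LoadedEngine` are invented names for illustration:

```python
# Hypothetical helper: report real capabilities when the engine exposes
# them, and fall back to {} only for engines that are not loaded yet
# (e.g. factory callables in the multiprocessing path).
def safe_capabilities(engine):
    caps = getattr(engine, "capabilities", None)
    return dict(caps) if caps else {}

class LoadedEngine:  # toy stand-in for a loaded engine
    capabilities = {"embed": True}

engines = [LoadedEngine(), object()]  # one loaded, one opaque
print([safe_capabilities(e) for e in engines])
```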
```diff
  onnxruntime-gpu = {version = "1.19.*", optional=true}
  tensorrt = {version = "^10.6.0", optional=true}
  soundfile = {version="^0.12.1", optional=true}
+ tiktoken = "^0.8.0"
```
logic: tiktoken should be marked as optional since it's only needed for specific models. Add optional=true to the dependency.
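The suggested change would look roughly like the fragment below. The extras group name `tokenizer` is hypothetical; in practice the dependency would be added to whichever existing extra covers the nomic models:

```toml
tiktoken = {version = "^0.8.0", optional = true}

[tool.poetry.extras]
# hypothetical extra name for illustration
tokenizer = ["tiktoken"]
```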
```python
        model_warmup=False,
    )
)
[model_func() for model_func in model_funcs]
```
style: Consider catching potential exceptions when calling model functions - initialization could fail for various reasons
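The reviewer's suggestion could be sketched as follows, with the failing factory named in the error instead of letting the bare list comprehension raise opaquely. `load_models` is a hypothetical helper, not code from the PR:

```python
# Hypothetical helper: call each model factory and report which one
# failed, rather than `[model_func() for model_func in model_funcs]`.
def load_models(model_funcs):
    models = []
    for model_func in model_funcs:
        try:
            models.append(model_func())
        except Exception as exc:
            raise RuntimeError(
                f"Failed to initialize model via {model_func!r}"
            ) from exc
    return models

print(len(load_models([lambda: "model-a", lambda: "model-b"])))
```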
Codecov Report

Attention: Patch coverage is

```diff
@@           Coverage Diff            @@
##             main     #468    +/-   ##
========================================
+ Coverage   79.51%   79.56%   +0.05%
========================================
  Files          41       41
  Lines        3417     3441      +24
========================================
+ Hits         2717     2738      +21
- Misses        700      703       +3
```

View full report in Codecov by Sentry.
Related Issue
Checklist
Additional Notes
Add any other context about the PR here.