Conversation
PR Summary
This PR introduces support for 'autoquant', a new automatic quantization feature in the Infinity project. The changes span multiple files and include implementation, documentation, and testing updates.
- Added 'autoquant' as a new option in the Dtype enum and CLI documentation, enabling automatic quantization for improved model performance
- Implemented 'autoquant' support in the SentenceTransformerPatched class and quantization interface
- Added 'torchao' dependency to pyproject.toml, likely to support the new autoquant functionality
- Created a new test function to verify the autoquant feature's effectiveness and accuracy
- Updated README with information on new multi-modal support (CLIP, CLAP) and text classification capabilities
9 file(s) reviewed, 4 comment(s)
```python
import numpy as np
import requests  # type: ignore
import torch.ao.quantization
```
style: This import is unused in the current file. Consider removing it if not needed.
```python
model = torch.quantization.quantize_dynamic(
    model.to("cpu"),  # the original model
    {torch.nn.Linear},  # a set of layers to dynamically quantize
    dtype=torch.qint8,
)
model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```
logic: Two quantization methods are applied sequentially. This might lead to unexpected behavior or reduced model performance. Consider using only one method or clarify why both are necessary.
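For reference, a minimal self-contained sketch of applying dynamic quantization once, as the comment suggests. In current PyTorch versions, `torch.quantization.quantize_dynamic` forwards to `torch.ao.quantization.quantize_dynamic`, so calling both runs the same transform twice; the toy model below is an assumption standing in for the real one.

```python
import torch

# Toy model standing in for the real one; only Linear layers are
# dynamically quantized, so the ReLU is left untouched.
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())

quantized = torch.ao.quantization.quantize_dynamic(
    model.to("cpu"),    # dynamic quantization runs on CPU
    {torch.nn.Linear},  # set of layer types to quantize
    dtype=torch.qint8,
)

# The Linear module is replaced by a dynamically quantized version;
# its class lives in a torch.ao.nn.quantized.dynamic module.
print(type(quantized[0]).__module__)
```

A single call is enough: quantizing an already-quantized module at best wastes work and at worst changes behavior, which is why the review flags the duplicate.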
```python
    bettertransformer=False,
    )
)
sentence = "This is a test sentence."
```
style: This line is unused and can be removed.
```python
if __name__ == "__main__":
    test_autoquant_quantization()
```
style: Running a single test function in main might not be ideal. Consider using a test runner or removing this block if not necessary.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@ Coverage Diff @@
##             main     #402      +/-   ##
==========================================
- Coverage   79.01%   73.24%   -5.77%
==========================================
  Files          40       40
  Lines        3173     3184      +11
==========================================
- Hits         2507     2332     -175
- Misses        666      852     +186
```

☔ View full report in Codecov by Sentry.
This pull request introduces several changes to the `infinity_emb` library, focusing on adding support for a new `autoquant` data type, updating documentation, and improving the quantization process. The most important changes include adding the `autoquant` data type, updating the CLI documentation, modifying the quantization logic, and adding unit tests for `autoquant` quantization.

New Features:
- Added the `autoquant` data type to the `Dtype` enum in `libs/infinity_emb/infinity_emb/primitives.py`.
- Implemented quantization support for `autoquant` in `libs/infinity_emb/infinity_emb/transformer/quantization/interface.py` and `libs/infinity_emb/infinity_emb/transformer/quantization/quant.py` [1] [2].

Documentation Updates:
- Documented the new `autoquant` option in `docs/docs/cli_v2.md`.

Codebase Improvements:
- Updated the `Makefile` to use `poetry run` for generating OpenAPI and CLI v2 documentation in `libs/infinity_emb/Makefile` [1] [2].

Dependency Updates:
- Added `torchao` as an optional dependency in `libs/infinity_emb/pyproject.toml` [1] [2].

Testing Enhancements:
- Added unit tests for `autoquant` quantization in `libs/infinity_emb/tests/unit_test/transformer/quantization/test_interface.py`.
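As a rough illustration of the `Dtype` change, a minimal sketch of extending a string-valued enum with an `autoquant` member. Only the `autoquant` name is confirmed by this PR; the other members and the `str` mixin are assumptions, and the real enum in `primitives.py` may differ.

```python
import enum

class Dtype(str, enum.Enum):
    # Illustrative members; only "autoquant" is confirmed by the PR.
    float32 = "float32"
    float16 = "float16"
    int8 = "int8"            # dynamic int8 quantization
    autoquant = "autoquant"  # new: let torchao pick the quantization scheme

# CLI-style lookup by value, as an option parser might do.
print(Dtype("autoquant").name)  # -> autoquant
```

Value-based lookup (`Dtype("autoquant")`) is what lets a CLI flag string map directly onto the enum member.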