refactor(sindi): quantize SparseTermDatacell value to fp16 and sq8 #1454
Conversation
Summary of Changes

This pull request refactors the SINDI algorithm by integrating value quantization for sparse term data. The primary goal is to improve memory efficiency, and potentially performance, by storing sparse vector values in lower-precision formats (FP16 and SQ8). The changes update the core data structures, implement encoding and decoding logic for quantized values, and extend the SINDI parameter system so the quantization type can be configured. This enables a more resource-efficient representation of sparse data within the index.
Code Review
This pull request introduces value quantization (fp16 and sq8) for SparseTermDatacell, which is a great step toward reducing the memory footprint. The quantization and dequantization logic is solid, and the changes are well covered by new unit tests. I have a few suggestions for improvement: the quantization parameters for SQ8 are computed only from the first batch of data, which may hurt accuracy for later data; the memory estimation formula could be more precise about the different sizes of quantized values; and there is a use of a non-standard C++ feature (a variable-length array) that could easily be replaced for better portability.
src/algorithm/sindi/sindi.cpp
Outdated
```cpp
if (cur_element_count_ == 0) {
    quantization_params_->type = value_quantization_type_;
    if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_SQ8) {
        float min_val = std::numeric_limits<float>::max();
        float max_val = std::numeric_limits<float>::lowest();
        for (int64_t i = 0; i < data_num; ++i) {
            const auto& vec = sparse_vectors[i];
            for (int j = 0; j < vec.len_; ++j) {
                float val = vec.vals_[j];
                if (val < min_val)
                    min_val = val;
                if (val > max_val)
                    max_val = val;
            }
        }
        quantization_params_->min_val = min_val;
        quantization_params_->max_val = max_val;
        quantization_params_->diff = max_val - min_val;
        if (quantization_params_->diff < 1e-6)
            quantization_params_->diff = 1.0f;
    }
}
```
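For context, SQ8 maps each float into an 8-bit code using the min/diff parameters computed above. A minimal round-trip sketch (the `sq8_encode`/`sq8_decode` names are illustrative, not the PR's actual API) shows how values outside the calibrated range get clamped:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Encode a float into an 8-bit code relative to [min_val, min_val + diff].
// Values outside the range are clamped, which is the precision-loss risk
// discussed in the review comment below.
uint8_t sq8_encode(float v, float min_val, float diff) {
    float normalized = (v - min_val) / diff;          // in [0, 1] when in range
    normalized = std::clamp(normalized, 0.0f, 1.0f);  // out-of-range values clamp
    return static_cast<uint8_t>(std::lround(normalized * 255.0f));
}

// Reconstruct an approximate float from the 8-bit code.
float sq8_decode(uint8_t code, float min_val, float diff) {
    return min_val + (static_cast<float>(code) / 255.0f) * diff;
}
```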
The quantization parameters for SQ8 (min/max values) are calculated only from the first batch of data added to the index. If subsequent data additions contain values outside of this initial range, they will be clamped, which could lead to significant precision loss and degrade search accuracy.
Consider calculating these statistics on a representative sample of the entire dataset during a training phase, or if the index is built incrementally, you could use a streaming algorithm to update the min/max values as new data is added. If this is an intentional design for performance reasons, it would be beneficial to add a comment explaining this trade-off.
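The incremental option can be sketched as a running min/max fold over each new batch. This is a minimal sketch, assuming a hypothetical `QuantizationParams` struct and `update_range` helper (not the PR's actual code); note that widening the range only helps future insertions unless already-encoded values are rebuilt:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical mirror of the PR's quantization parameters.
struct QuantizationParams {
    float min_val = std::numeric_limits<float>::max();
    float max_val = std::numeric_limits<float>::lowest();
    float diff = 1.0f;
};

// Fold a new batch of values into the running range. Values already
// quantized with the old range are NOT re-encoded here.
void update_range(QuantizationParams& p, const std::vector<float>& batch) {
    for (float v : batch) {
        p.min_val = std::min(p.min_val, v);
        p.max_val = std::max(p.max_val, v);
    }
    p.diff = p.max_val - p.min_val;
    if (p.diff < 1e-6f)
        p.diff = 1.0f;
}
```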
```diff
  // size of term id + term data
- mem += ESTIMATE_DOC_TERM * num_elements * sizeof(float) * 2;
+ mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(float) + sizeof(uint16_t));
```
The memory estimation in EstimateMemory appears to be inaccurate for quantized data. The current formula doesn't correctly reflect the size of a uint16_t for the term ID and a variable-sized quantized value. The size of the value depends on the value_quantization_type_ (fp32, fp16, or sq8).
To make the estimation more accurate, the formula should account for the quantization type.
Suggested change:

```diff
- mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(float) + sizeof(uint16_t));
+ size_t value_size = sizeof(float);
+ if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_FP16) {
+     value_size = sizeof(uint16_t);
+ } else if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_SQ8) {
+     value_size = sizeof(uint8_t);
+ }
+ mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(uint16_t) + value_size);
```
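Under the suggested formula, each posting costs a 2-byte term id plus a value whose width depends on the quantization type. A small sketch of that arithmetic (`posting_bytes` and the enum are hypothetical helpers, not the PR's API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enum mirroring the PR's quantization types.
enum ValueQuantizationType { VQT_FP32, VQT_FP16, VQT_SQ8 };

// Bytes per posting: 2-byte term id + a value whose width depends on type.
size_t posting_bytes(ValueQuantizationType t) {
    size_t value_size = sizeof(float);  // fp32: 4 bytes
    if (t == VQT_FP16) {
        value_size = sizeof(uint16_t);  // fp16: 2 bytes
    } else if (t == VQT_SQ8) {
        value_size = sizeof(uint8_t);   // sq8: 1 byte
    }
    return sizeof(uint16_t) + value_size;
}
```

So fp32, fp16, and sq8 postings cost 6, 4, and 3 bytes respectively, which is what the corrected estimate should reflect.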
```cpp
term_ids_[term]->push_back(base_id);
term_datas_[term]->push_back(val);
```
```cpp
uint8_t buffer[buffer_length];
```
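This line declares a variable-length array, which is a GCC/Clang extension rather than standard C++ (the portability issue flagged in the review summary). A minimal sketch of a portable replacement, assuming a hypothetical `make_buffer` helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable alternative to `uint8_t buffer[buffer_length]`: heap-allocate
// via std::vector. Elements are zero-initialized; for hot paths, a
// fixed-size stack buffer with a heap fallback would also work.
std::vector<uint8_t> make_buffer(std::size_t buffer_length) {
    return std::vector<uint8_t>(buffer_length);
}
```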
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1454      +/-   ##
==========================================
+ Coverage   91.51%   91.57%   +0.06%
==========================================
  Files         326      326
  Lines       18753    18851      +98
==========================================
+ Hits        17161    17263     +102
+ Misses       1592     1588       -4
```
inabao left a comment: LGTM

wxyucs left a comment: lgtm

ShawnShawnYou left a comment: LGTM
@Roxanne0321 this pull request cannot be cherry-picked to 0.17 (CONFLICT); please create a new pull request against branch 0.17.

No description provided.