
Conversation

@Roxanne0321
Collaborator

No description provided.

liruoxuan.lrx added 21 commits November 14, 2025 23:01
Signed-off-by: liruoxuan.lrx <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @Roxanne0321, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the SINDI algorithm by integrating value quantization for sparse term data. The primary goal is to enhance memory efficiency and potentially improve performance by storing sparse vector values in lower precision formats (FP16 and SQ8). The changes involve updating core data structures, implementing encoding and decoding logic for quantized values, and modifying the SINDI parameter system to allow configuration of the quantization type. This enables a more resource-efficient representation of sparse data within the index.

Highlights

  • Value Quantization: Introduced value quantization for SparseTermDataCell values, supporting FP16 (half-precision float) and SQ8 (8-bit integer) formats. This allows for more compact storage of sparse vector values (see the encode/decode sketch after this list).
  • Data Type Optimization: Refactored SparseTermDataCell to store term IDs as uint16_t instead of uint32_t and term data as uint8_t arrays (for quantized values) instead of float arrays, significantly reducing memory footprint.
  • Dynamic Quantization Parameter Calculation: The SINDI algorithm now dynamically calculates quantization parameters (min/max values and difference) for SQ8 quantization during the initial data addition phase, ensuring optimal scaling for the stored data.
  • Parameter Configuration: Added a new value_quantization_type parameter to SINDI configuration, allowing users to specify the desired quantization method (fp32, fp16, or sq8).
  • Window Size Adjustment: The valid range for window_size in SINDI parameters has been adjusted to [10000, 60000], and the default DEFAULT_WINDOW_SIZE has been updated from 100000 to 50000.
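
For a concrete sense of the SQ8 path described above, here is a minimal, self-contained C++ sketch of how values could be encoded to uint8_t codes and decoded back using global min/diff parameters. This is not the PR's implementation: the names Sq8Params, Sq8Encode, and Sq8Decode and the [0, 255] scaling are assumptions for illustration only.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical quantization parameters, mirroring the min/max/diff statistics
// that the PR computes when the first batch of vectors is added.
struct Sq8Params {
    float min_val = 0.0f;
    float diff = 1.0f;  // max_val - min_val, kept away from zero
};

// Map a float value into an 8-bit code; out-of-range values are clamped.
inline uint8_t Sq8Encode(float val, const Sq8Params& p) {
    float normalized = (val - p.min_val) / p.diff;
    normalized = std::min(1.0f, std::max(0.0f, normalized));
    return static_cast<uint8_t>(std::lround(normalized * 255.0f));
}

// Map an 8-bit code back to an approximate float value.
inline float Sq8Decode(uint8_t code, const Sq8Params& p) {
    return p.min_val + (static_cast<float>(code) / 255.0f) * p.diff;
}

int main() {
    Sq8Params params{0.1f, 0.8f};  // e.g. observed min 0.1, max 0.9
    std::vector<float> vals = {0.10f, 0.42f, 0.90f};
    for (float v : vals) {
        uint8_t code = Sq8Encode(v, params);
        float approx = Sq8Decode(code, params);
        (void)approx;  // reconstruction error here is at most ~diff / 255
    }
    return 0;
}

Under this scheme each stored value shrinks from 4 bytes to 1, at the cost of a bounded reconstruction error proportional to the observed value range.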



@gemini-code-assist bot left a comment


Code Review

This pull request introduces value quantization (fp16 and sq8) for SparseTermDatacell, which is a great step for reducing memory footprint. The implementation of quantization and dequantization logic is solid, and the changes are well-tested with new unit tests. I have a few suggestions for improvement. The calculation of quantization parameters for SQ8 is based only on the first batch of data, which might affect accuracy for later data. The memory estimation formula could be more precise to reflect the different sizes of quantized values. Finally, there's a use of a non-standard C++ feature (VLA) that could be easily replaced for better portability.

Comment on lines 63 to 84
if (cur_element_count_ == 0) {
    quantization_params_->type = value_quantization_type_;
    if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_SQ8) {
        float min_val = std::numeric_limits<float>::max();
        float max_val = std::numeric_limits<float>::lowest();
        for (int64_t i = 0; i < data_num; ++i) {
            const auto& vec = sparse_vectors[i];
            for (int j = 0; j < vec.len_; ++j) {
                float val = vec.vals_[j];
                if (val < min_val)
                    min_val = val;
                if (val > max_val)
                    max_val = val;
            }
        }
        quantization_params_->min_val = min_val;
        quantization_params_->max_val = max_val;
        quantization_params_->diff = max_val - min_val;
        if (quantization_params_->diff < 1e-6)
            quantization_params_->diff = 1.0f;
    }
}


high

The quantization parameters for SQ8 (min/max values) are calculated only from the first batch of data added to the index. If subsequent data additions contain values outside of this initial range, they will be clamped, which could lead to significant precision loss and degrade search accuracy.

Consider calculating these statistics on a representative sample of the entire dataset during a training phase, or if the index is built incrementally, you could use a streaming algorithm to update the min/max values as new data is added. If this is an intentional design for performance reasons, it would be beneficial to add a comment explaining this trade-off.
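
A minimal sketch of the streaming alternative mentioned above, using a hypothetical RunningRange helper that is not part of this PR; note that widening the range after codes have already been stored would still require re-encoding (or accepting that older codes use a narrower scale):

#include <cstdint>
#include <limits>

// Keeps running min/max statistics across every batch of added vectors,
// instead of freezing them after the first batch.
struct RunningRange {
    float min_val = std::numeric_limits<float>::max();
    float max_val = std::numeric_limits<float>::lowest();

    void Update(const float* vals, int64_t len) {
        for (int64_t i = 0; i < len; ++i) {
            if (vals[i] < min_val) min_val = vals[i];
            if (vals[i] > max_val) max_val = vals[i];
        }
    }

    float Diff() const {
        float d = max_val - min_val;
        return d < 1e-6f ? 1.0f : d;  // same near-zero guard as the PR's code
    }
};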


// size of term id + term data
mem += ESTIMATE_DOC_TERM * num_elements * sizeof(float) * 2;
mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(float) + sizeof(uint16_t));


medium

The memory estimation in EstimateMemory appears to be inaccurate for quantized data. The current formula doesn't correctly reflect the size of a uint16_t for the term ID and a variable-sized quantized value. The size of the value depends on the value_quantization_type_ (fp32, fp16, or sq8).

To make the estimation more accurate, the formula should account for the quantization type.

Suggested change
mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(float) + sizeof(uint16_t));
size_t value_size = sizeof(float);
if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_FP16) {
    value_size = sizeof(uint16_t);
} else if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_SQ8) {
    value_size = sizeof(uint8_t);
}
mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(uint16_t) + value_size);

term_ids_[term]->push_back(base_id);
term_datas_[term]->push_back(val);

uint8_t buffer[buffer_length];


medium

The use of uint8_t buffer[buffer_length]; is a Variable Length Array (VLA), which is a non-standard C++ extension. This can lead to portability issues. It's better to use a fixed-size array, as the maximum possible size is known and small.

        uint8_t buffer[4];
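
For context (an inference from the types discussed above, not a statement from the PR): the widest per-value encoding among fp32, fp16, and sq8 is the 4-byte float, so a fixed uint8_t buffer[4] covers every quantization type; a std::array<uint8_t, 4> would work equally well and makes the fixed size explicit.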

@Roxanne0321
Collaborator Author

To reduce SINDI's memory cost, this change adds the sq8 and fp16 quantization methods for SparseTermDatacell term_datas_. The memory test comparison results are as follows:
(image: memory test comparison results)
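
A rough back-of-the-envelope view of where the savings come from, assuming one term id and one value per posting as described in the highlights above: the previous layout stores a 4-byte uint32_t id plus a 4-byte float value (8 bytes per posting), while the new layout stores a 2-byte uint16_t id plus 4, 2, or 1 bytes for fp32, fp16, or sq8 values respectively (6, 4, or 3 bytes per posting), i.e. roughly 25%, 50%, and 62.5% less posting storage before any other index overhead.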

@codecov

codecov bot commented Dec 22, 2025

Codecov Report

❌ Patch coverage is 89.89899% with 10 lines in your changes missing coverage. Please review.

@@            Coverage Diff             @@
##             main    #1454      +/-   ##
==========================================
+ Coverage   91.51%   91.57%   +0.06%     
==========================================
  Files         326      326              
  Lines       18753    18851      +98     
==========================================
+ Hits        17161    17263     +102     
+ Misses       1592     1588       -4     
Flag Coverage Δ
cpp 91.57% <89.89%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
common 85.59% <ø> (-0.19%) ⬇️
datacell 92.92% <100.00%> (-0.36%) ⬇️
index 91.10% <84.09%> (+0.02%) ⬆️
simd 100.00% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


Signed-off-by: liruoxuan.lrx <[email protected]>
Collaborator

@inabao left a comment


LGTM

@wxyucs added the kind/improvement (Code improvements: variable/function renaming, refactoring, etc.) label on Jan 6, 2026
Collaborator

@wxyucs left a comment


lgtm

Collaborator

@ShawnShawnYou left a comment


LGTM

@ShawnShawnYou merged commit 6fc473b into antgroup:main on Jan 6, 2026
34 of 35 checks passed
@wxyucs
Collaborator

wxyucs commented Jan 6, 2026

@Roxanne0321 this pull request cannot be cherry-picked to 0.17 (CONFLICT); please create a new pull request targeting the 0.17 branch.


Labels

kind/improvement (Code improvements: variable/function renaming, refactoring, etc.) · size/L · version/0.18
