refactor(sindi): quantize SparseTermDatacell value to fp16 and sq8 #1454
Conversation
Summary of Changes

This pull request refactors the SINDI algorithm by integrating value quantization for sparse term data. The primary goal is to improve memory efficiency, and potentially performance, by storing sparse vector values in lower-precision formats (FP16 and SQ8). The changes update the core data structures, implement encoding and decoding logic for quantized values, and extend the SINDI parameter system so the quantization type can be configured. This enables a more resource-efficient representation of sparse data within the index.
Code Review
This pull request introduces value quantization (fp16 and sq8) for SparseTermDatacell, which is a great step toward reducing the memory footprint. The quantization and dequantization logic is solid, and the changes are well covered by new unit tests. I have a few suggestions for improvement: the quantization parameters for SQ8 are computed only from the first batch of data, which may hurt accuracy for later data; the memory estimation formula could be more precise about the different sizes of quantized values; and there is a use of a non-standard C++ feature (a variable-length array) that could easily be replaced for better portability.
src/algorithm/sindi/sindi.cpp
Outdated
```cpp
if (cur_element_count_ == 0) {
    quantization_params_->type = value_quantization_type_;
    if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_SQ8) {
        float min_val = std::numeric_limits<float>::max();
        float max_val = std::numeric_limits<float>::lowest();
        for (int64_t i = 0; i < data_num; ++i) {
            const auto& vec = sparse_vectors[i];
            for (int j = 0; j < vec.len_; ++j) {
                float val = vec.vals_[j];
                if (val < min_val)
                    min_val = val;
                if (val > max_val)
                    max_val = val;
            }
        }
        quantization_params_->min_val = min_val;
        quantization_params_->max_val = max_val;
        quantization_params_->diff = max_val - min_val;
        if (quantization_params_->diff < 1e-6)
            quantization_params_->diff = 1.0f;
    }
}
```
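For context, SQ8 maps each float into an 8-bit code using the min/diff parameters computed above. A minimal round-trip sketch (the `sq8_encode`/`sq8_decode` names are illustrative, not the PR's actual API) shows how values outside the calibrated range get clamped:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Encode a float into an 8-bit code relative to [min_val, min_val + diff].
// Values outside the range are clamped, which is the precision-loss risk
// discussed in the review comment below.
uint8_t sq8_encode(float v, float min_val, float diff) {
    float normalized = (v - min_val) / diff;          // in [0, 1] when in range
    normalized = std::clamp(normalized, 0.0f, 1.0f);  // out-of-range values clamp
    return static_cast<uint8_t>(std::lround(normalized * 255.0f));
}

// Reconstruct an approximate float from the 8-bit code.
float sq8_decode(uint8_t code, float min_val, float diff) {
    return min_val + (static_cast<float>(code) / 255.0f) * diff;
}
```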
The quantization parameters for SQ8 (min/max values) are calculated only from the first batch of data added to the index. If subsequent data additions contain values outside of this initial range, they will be clamped, which could lead to significant precision loss and degrade search accuracy.
Consider calculating these statistics on a representative sample of the entire dataset during a training phase, or if the index is built incrementally, you could use a streaming algorithm to update the min/max values as new data is added. If this is an intentional design for performance reasons, it would be beneficial to add a comment explaining this trade-off.
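The incremental option can be sketched as a running min/max fold over each new batch. This is a minimal sketch, assuming a hypothetical `QuantizationParams` struct and `update_range` helper (not the PR's actual code); note that widening the range only helps future insertions unless already-encoded values are rebuilt:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical mirror of the PR's quantization parameters.
struct QuantizationParams {
    float min_val = std::numeric_limits<float>::max();
    float max_val = std::numeric_limits<float>::lowest();
    float diff = 1.0f;
};

// Fold a new batch of values into the running range. Values already
// quantized with the old range are NOT re-encoded here.
void update_range(QuantizationParams& p, const std::vector<float>& batch) {
    for (float v : batch) {
        p.min_val = std::min(p.min_val, v);
        p.max_val = std::max(p.max_val, v);
    }
    p.diff = p.max_val - p.min_val;
    if (p.diff < 1e-6f)
        p.diff = 1.0f;
}
```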
```diff
  // size of term id + term data
- mem += ESTIMATE_DOC_TERM * num_elements * sizeof(float) * 2;
+ mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(float) + sizeof(uint16_t));
```
The memory estimation in EstimateMemory appears to be inaccurate for quantized data. The current formula doesn't correctly reflect the size of a uint16_t for the term ID and a variable-sized quantized value. The size of the value depends on the value_quantization_type_ (fp32, fp16, or sq8).
To make the estimation more accurate, the formula should account for the quantization type.
Suggested change:

```diff
- mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(float) + sizeof(uint16_t));
+ size_t value_size = sizeof(float);
+ if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_FP16) {
+     value_size = sizeof(uint16_t);
+ } else if (value_quantization_type_ == QUANTIZATION_TYPE_VALUE_SQ8) {
+     value_size = sizeof(uint8_t);
+ }
+ mem += ESTIMATE_DOC_TERM * num_elements * (sizeof(uint16_t) + value_size);
```
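Under the suggested formula, each posting costs a 2-byte term id plus a value whose width depends on the quantization type. A small sketch of that arithmetic (`posting_bytes` and the enum are hypothetical helpers, not the PR's API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enum mirroring the PR's quantization types.
enum ValueQuantizationType { VQT_FP32, VQT_FP16, VQT_SQ8 };

// Bytes per posting: 2-byte term id + a value whose width depends on type.
size_t posting_bytes(ValueQuantizationType t) {
    size_t value_size = sizeof(float);  // fp32: 4 bytes
    if (t == VQT_FP16) {
        value_size = sizeof(uint16_t);  // fp16: 2 bytes
    } else if (t == VQT_SQ8) {
        value_size = sizeof(uint8_t);   // sq8: 1 byte
    }
    return sizeof(uint16_t) + value_size;
}
```

So fp32, fp16, and sq8 postings cost 6, 4, and 3 bytes respectively, which is what the corrected estimate should reflect.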
```cpp
term_ids_[term]->push_back(base_id);
term_datas_[term]->push_back(val);
```
```cpp
uint8_t buffer[buffer_length];
```
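This line declares a variable-length array, which is a GCC/Clang extension rather than standard C++ (the portability issue flagged in the review summary). A minimal sketch of a portable replacement, assuming a hypothetical `make_buffer` helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable alternative to `uint8_t buffer[buffer_length]`: heap-allocate
// via std::vector. Elements are zero-initialized; for hot paths, a
// fixed-size stack buffer with a heap fallback would also work.
std::vector<uint8_t> make_buffer(std::size_t buffer_length) {
    return std::vector<uint8_t>(buffer_length);
}
```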
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #1454      +/-   ##
==========================================
+ Coverage   91.51%   91.57%   +0.06%
==========================================
  Files         326      326
  Lines       18753    18851      +98
==========================================
+ Hits        17161    17263     +102
+ Misses       1592     1588       -4
```
inabao left a comment: LGTM

wxyucs left a comment: lgtm

ShawnShawnYou left a comment: LGTM
@Roxanne0321 this pull request cannot be cherry-picked to 0.17 (CONFLICT); please create a new pull request against branch 0.17.

No description provided.