feat (calculator): Added a hybrid model tab and support V4 by Potterluo · Pull Request #996 · ModelEngine-Group/unified-cache-management

Potterluo · 2026-06-05T01:23:07Z

feat (calculator): Added a hybrid model tab and optimized the KV caching forex interface

Added two tabs, Standard Model and Hybrid Model, for model selection
Implemented dedicated configuration options for DeepSeek V4 series hybrid models

…ing forex interface - Added two tabs, Standard Model and Hybrid Model, for model selection - Implemented dedicated configuration options for DeepSeek V4 series hybrid models

ygwpz · 2026-06-05T06:09:35Z

+    'MQA': {
+        title: 'MQA (Multi-Query Attention)',
+        icon: '🔹',
+        color: '#10b981',


⚠️ Warning: The MQA formula appears incomplete. It shows head_dim but the calculation logic in performCalculation uses hidden_size * (kvHeads / num_attention_heads) for the else branch. The formula description should clarify that kv_heads = 1 means the head_dim is hidden_size / attn_heads, and the formula should be consistent with the actual calculation.

The MQA formula is correct. When kv_heads = 1, head_dim = hidden_size / attn_heads. The formula 2 × layers × tokens × batch × head_dim / tp is mathematically equivalent to 2 × layers × tokens × batch × hidden_size × (1/attn_heads) / tp. The parameter description already clarifies: head_dim: "Head dimension (hidden_size / attn_heads)". No changes needed.

ygwpz · 2026-06-05T06:09:36Z

+    'DSA': {
+        title: 'DSA (DeepSeek Sparse Attention)',
+        icon: '🔮',
+        color: '#9333ea',


⚠️ Warning: The DSA formula is missing batch_size and / tensor_parallelism. The formula shown is KV = layers × tokens × ... but the actual calculation in performCalculation includes batch_size and divides by tp. Please update the formula display to match the actual calculation logic.

✅ Fixed. Updated DSA formula to: Single-GPU KV Cache = num_hidden_layers × num_tokens × batch_size × (kv_lora_rank + qk_rope_head_dim + index_head_dim) / tensor_parallelism × dtype_bytes

ygwpz · 2026-06-05T06:09:38Z

+
+    const modelName = document.getElementById('hybrid-model-select').value;
+    const deployment = document.getElementById('hybrid-deployment').value;
+    const tokens = parseInt(document.getElementById('hybrid-token-input').value) || 4096;


💡 Suggestion: Add validation for the parsed tokens value. If tokens <= 0 or isNaN(tokens), the calculation will produce invalid results. Consider adding input validation similar to calculateKVCache() function:

if (!tokens || isNaN(tokens) || tokens <= 0) { displayError('Invalid Input', 'Please enter a valid positive number for tokens.'); return; }

✅ Fixed. Added input validation for tokens in calculateHybrid() and calculateHybridMaxTokens().

ygwpz · 2026-06-05T06:09:39Z

+    const modelName = document.getElementById('hybrid-model-select').value;
+    const deployment = document.getElementById('hybrid-deployment').value;
+    const tokens = parseInt(document.getElementById('hybrid-token-input').value) || 4096;
+    const batchSize = parseInt(document.getElementById('hybrid-batch-size').value) || 1;


💡 Suggestion: Add validation for batch_size and tp similar to the standard model calculation. Invalid values (e.g., tp = 0) would cause division by zero or incorrect results:

if (tp <= 0) { displayError('Invalid Input', 'Tensor Parallelism must be at least 1.'); return; }

✅ Fixed. Added input validation for batch_size and tp to prevent division by zero or invalid results.

ygwpz · 2026-06-05T06:09:41Z

+        c4aLayers: 30,
+        c128aLayers: 31,
+        // vllm-ascend (512 tokens/block)
+        vllmAscend: {


💡 Suggestion: The hardcoded byte-per-block values (e.g., c4aCompressor: 131072) are magic numbers. Consider adding comments explaining how these values were derived or calculated, to make the code more maintainable and help future readers understand the data source.

✅ Fixed. Added calculation source comments explaining how values like c4aCompressor: 131072 are derived (e.g., 512 tokens × 256 KV heads × 1 byte = 131072 B).

Potterluo

Thank you for the detailed review! Here are my responses:

Comment 1: MQA formula - The MQA formula is correct. When kv_heads = 1, head_dim = hidden_size / attn_heads. The formula 2 × layers × tokens × batch × head_dim / tp is mathematically equivalent to the actual calculation 2 × layers × tokens × batch × hidden_size × (1/attn_heads) / tp. The parameter description already clarifies: head_dim: 'Head dimension (hidden_size / attn_heads)'. No changes needed.

Comment 2: DSA formula missing batch_size/tp ✅ Fixed. Updated formula to: Single-GPU KV Cache = num_hidden_layers × num_tokens × batch_size × (kv_lora_rank + qk_rope_head_dim + index_head_dim) / tensor_parallelism × dtype_bytes

Comment 3: Input validation for tokens ✅ Fixed. Added validation in calculateHybrid() and calculateHybridMaxTokens().

Comment 4: Input validation for batch_size/tp ✅ Fixed. Added complete validation for all input parameters.

Comment 5: Magic numbers documentation ✅ Fixed. Added calculation source comments explaining how values like c4aCompressor: 131072 are derived.

Please review the latest commit for the fixes.

- Fix DSA formula to include batch_size and tensor_parallelism - Add input validation for tokens, batch_size, tp in calculateHybrid() - Add input validation in calculateHybridMaxTokens() - Add calculation source comments for magic numbers (c4aCompressor etc) - Rename archType to modelArch to fix codespell warning

Potterluo requested review from FangRun2, Infinite666, Tarrei, flesher0813, mag1c-h and ygwpz as code owners June 5, 2026 01:23

Potterluo added 2 commits June 5, 2026 10:03

feat (calculator): Added a hybrid model tab and optimized the KV cach…

fd2100b

…ing forex interface - Added two tabs, Standard Model and Hybrid Model, for model selection - Implemented dedicated configuration options for DeepSeek V4 series hybrid models

bugfix: Hybrid Model warning and cleancode

bd86a2c

Potterluo force-pushed the feature_v4_kvcache branch from f617388 to bd86a2c Compare June 5, 2026 02:03

ygwpz reviewed Jun 5, 2026

View reviewed changes

Potterluo commented Jun 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat (calculator): Added a hybrid model tab and support V4#996

feat (calculator): Added a hybrid model tab and support V4#996
Potterluo wants to merge 3 commits into
ModelEngine-Group:developfrom
Potterluo:feature_v4_kvcache

Potterluo commented Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

Potterluo Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

Potterluo Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

Potterluo Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

Potterluo Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

Potterluo Jun 5, 2026

Uh oh!

Potterluo left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Potterluo commented Jun 5, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Potterluo left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants