feat (calculator): Added a hybrid model tab and support V4#996
feat (calculator): Added a hybrid model tab and support V4#996Potterluo wants to merge 3 commits into
Conversation
…ing forex interface - Added two tabs, Standard Model and Hybrid Model, for model selection - Implemented dedicated configuration options for DeepSeek V4 series hybrid models
f617388 to
bd86a2c
Compare
| 'MQA': { | ||
| title: 'MQA (Multi-Query Attention)', | ||
| icon: '🔹', | ||
| color: '#10b981', |
There was a problem hiding this comment.
head_dim but the calculation logic in performCalculation uses hidden_size * (kvHeads / num_attention_heads) for the else branch. The formula description should clarify that kv_heads = 1 means the head_dim is hidden_size / attn_heads, and the formula should be consistent with the actual calculation.
There was a problem hiding this comment.
The MQA formula is correct. When kv_heads = 1, head_dim = hidden_size / attn_heads. The formula 2 × layers × tokens × batch × head_dim / tp is mathematically equivalent to 2 × layers × tokens × batch × hidden_size × (1/attn_heads) / tp. The parameter description already clarifies: head_dim: "Head dimension (hidden_size / attn_heads)". No changes needed.
| 'DSA': { | ||
| title: 'DSA (DeepSeek Sparse Attention)', | ||
| icon: '🔮', | ||
| color: '#9333ea', |
There was a problem hiding this comment.
batch_size and / tensor_parallelism. The formula shown is KV = layers × tokens × ... but the actual calculation in performCalculation includes batch_size and divides by tp. Please update the formula display to match the actual calculation logic.
There was a problem hiding this comment.
✅ Fixed. Updated DSA formula to: Single-GPU KV Cache = num_hidden_layers × num_tokens × batch_size × (kv_lora_rank + qk_rope_head_dim + index_head_dim) / tensor_parallelism × dtype_bytes
|
|
||
| const modelName = document.getElementById('hybrid-model-select').value; | ||
| const deployment = document.getElementById('hybrid-deployment').value; | ||
| const tokens = parseInt(document.getElementById('hybrid-token-input').value) || 4096; |
There was a problem hiding this comment.
💡 Suggestion: Add validation for the parsed tokens value. If tokens <= 0 or isNaN(tokens), the calculation will produce invalid results. Consider adding input validation similar to calculateKVCache() function:
if (!tokens || isNaN(tokens) || tokens <= 0) {
displayError('Invalid Input', 'Please enter a valid positive number for tokens.');
return;
}There was a problem hiding this comment.
✅ Fixed. Added input validation for tokens in calculateHybrid() and calculateHybridMaxTokens().
| const modelName = document.getElementById('hybrid-model-select').value; | ||
| const deployment = document.getElementById('hybrid-deployment').value; | ||
| const tokens = parseInt(document.getElementById('hybrid-token-input').value) || 4096; | ||
| const batchSize = parseInt(document.getElementById('hybrid-batch-size').value) || 1; |
There was a problem hiding this comment.
💡 Suggestion: Add validation for batch_size and tp similar to the standard model calculation. Invalid values (e.g., tp = 0) would cause division by zero or incorrect results:
if (tp <= 0) {
displayError('Invalid Input', 'Tensor Parallelism must be at least 1.');
return;
}There was a problem hiding this comment.
✅ Fixed. Added input validation for batch_size and tp to prevent division by zero or invalid results.
| c4aLayers: 30, | ||
| c128aLayers: 31, | ||
| // vllm-ascend (512 tokens/block) | ||
| vllmAscend: { |
There was a problem hiding this comment.
💡 Suggestion: The hardcoded byte-per-block values (e.g., c4aCompressor: 131072) are magic numbers. Consider adding comments explaining how these values were derived or calculated, to make the code more maintainable and help future readers understand the data source.
There was a problem hiding this comment.
✅ Fixed. Added calculation source comments explaining how values like c4aCompressor: 131072 are derived (e.g., 512 tokens × 256 KV heads × 1 byte = 131072 B).
Potterluo
left a comment
There was a problem hiding this comment.
Thank you for the detailed review! Here are my responses:
Comment 1: MQA formula - The MQA formula is correct. When kv_heads = 1, head_dim = hidden_size / attn_heads. The formula 2 × layers × tokens × batch × head_dim / tp is mathematically equivalent to the actual calculation 2 × layers × tokens × batch × hidden_size × (1/attn_heads) / tp. The parameter description already clarifies: head_dim: 'Head dimension (hidden_size / attn_heads)'. No changes needed.
Comment 2: DSA formula missing batch_size/tp ✅ Fixed. Updated formula to: Single-GPU KV Cache = num_hidden_layers × num_tokens × batch_size × (kv_lora_rank + qk_rope_head_dim + index_head_dim) / tensor_parallelism × dtype_bytes
Comment 3: Input validation for tokens ✅ Fixed. Added validation in calculateHybrid() and calculateHybridMaxTokens().
Comment 4: Input validation for batch_size/tp ✅ Fixed. Added complete validation for all input parameters.
Comment 5: Magic numbers documentation ✅ Fixed. Added calculation source comments explaining how values like c4aCompressor: 131072 are derived.
Please review the latest commit for the fixes.
- Fix DSA formula to include batch_size and tensor_parallelism - Add input validation for tokens, batch_size, tp in calculateHybrid() - Add input validation in calculateHybridMaxTokens() - Add calculation source comments for magic numbers (c4aCompressor etc) - Rename archType to modelArch to fix codespell warning
feat (calculator): Added a hybrid model tab and optimized the KV caching forex interface