Current Behavior
When specifying filters in the config, I expected only the matched parameters to be merged, and all other parameters to remain identical to the base_model.
However, from experiments and direct state_dict comparisons, it appears that:
Parameters matching filters are merged using the specified method (e.g., linear).
Parameters NOT matching filters are still merged (likely using a default behavior) instead of being copied from the base model.
This results in full-model blending, even when only partial merging is intended.
Evidence
I compared the merged model with a manually merged version where:
Only selected parameters (e.g., A_logs, x_proj_weight, dt_projs_weight) are averaged
All other parameters are directly copied from the base model
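For reference, the manual merge described above can be sketched roughly as follows. This is a minimal sketch, not MergeKit code: the helper name `partial_average` and the glob patterns are hypothetical, and it assumes both state_dicts share key names and hold values that support `+` and `*` (e.g., torch tensors).

```python
from fnmatch import fnmatchcase


def partial_average(base_sd, other_sd, patterns):
    """Average only the entries whose name matches a pattern.

    Every other entry is taken from base_sd unchanged.
    `patterns` are glob-style strings (hypothetical, e.g. "*A_logs*").
    """
    merged = {}
    for name, base_value in base_sd.items():
        matches = any(fnmatchcase(name, p) for p in patterns)
        if matches and name in other_sd:
            # Plain 50/50 average of the two models for selected parameters.
            merged[name] = (base_value + other_sd[name]) * 0.5
        else:
            # Not selected: keep the base model's value as-is.
            merged[name] = base_value
    return merged
```

In my case the patterns would be something like `["*A_logs*", "*x_proj_weight*", "*dt_projs_weight*"]`; the exact key names depend on the model.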
The differences show that many non-filtered parameters (e.g., backbone layers, normalization, head, and even control buffers like scan_direction) are modified in the MergeKit output.
This leads to significant performance degradation in my case.
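The state_dict comparison mentioned above can be reproduced with a check along these lines. Again, this is only a sketch under assumptions: `unexpected_changes` is a hypothetical helper, the patterns are glob-style placeholders for the filtered names, and `equal` should be swapped for `torch.equal` when the values are tensors.

```python
import operator
from fnmatch import fnmatchcase


def unexpected_changes(base_sd, merged_sd, patterns, equal=operator.eq):
    """List entries that should be untouched but differ from the base.

    An entry is expected to be untouched when its name matches none of
    the glob-style `patterns`. Pass equal=torch.equal for tensor values.
    """
    changed = []
    for name, base_value in base_sd.items():
        if any(fnmatchcase(name, p) for p in patterns):
            continue  # filtered parameter: allowed to change
        if name not in merged_sd or not equal(base_value, merged_sd[name]):
            changed.append(name)
    return changed
```

If filters restricted the merge to the matched parameters, I would expect this to return an empty list for the MergeKit output; instead it reports the kinds of entries listed above.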