Add Qwen3.5 architecture support by ZhangYiqun018 · Pull Request #686 · arcee-ai/mergekit

ZhangYiqun018 · 2026-05-07T06:40:06Z

Summary

Add architecture support for Qwen3.5 dense and MoE models.
Cover the Qwen3.5 multimodal wrapper weights, text decoder, vision tower, mixed linear/full attention, shared experts, and MTP weights.
Add Qwen3.5 architecture tests for dense passthrough and MoE linear merges.

Testing

pytest tests/test_qwen35_architecture.py -q
pytest -q
Verified that the architecture definition covers the official full-precision Qwen3.5 dense/MoE model indexes with zero missing tensors.
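For readers who want to repeat a similar index check, here is a minimal sketch. It assumes the checkpoint ships a Hugging Face style `model.safetensors.index.json` with a `weight_map` entry. The helper names `load_weight_map` and `coverage_gaps` are hypothetical and are not part of this PR or of mergekit.

```python
import json
from pathlib import Path


def load_weight_map(index_path):
    """Read the tensor-name -> shard-file mapping from a sharded checkpoint index."""
    return json.loads(Path(index_path).read_text())["weight_map"]


def coverage_gaps(weight_map, expected_names):
    """Return (unlisted, absent) tensor names.

    unlisted: tensors in the checkpoint that the architecture does not enumerate.
    absent:   tensors the architecture enumerates that the checkpoint lacks.
    """
    on_disk = set(weight_map)
    expected = set(expected_names)
    return sorted(on_disk - expected), sorted(expected - on_disk)
```

An empty first list corresponds to "zero missing tensors"; names in the second list are only a problem when the architecture marks them as required rather than optional.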

Note

Medium Risk
Adds a new architecture mapping that drives weight discovery for Qwen3.5 models (dense, MoE, multimodal), so mistakes could cause missing/extra tensors or incorrect merges for these checkpoints, but the change is largely additive and gated by architecture name checks.

Overview
Adds first-class Qwen3.5 architecture support by routing arch_info_for_config to a new qwen35_architecture_for_config resolver when the config architectures[0] matches known Qwen3.5 dense/MoE names.
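The routing described above can be sketched roughly as follows. This is an illustration of the dispatch pattern only: the architecture names in `QWEN35_ARCHITECTURE_NAMES` are placeholders, the resolver body is a stand-in that returns a label, and the real functions in mergekit take and return richer objects.

```python
from types import SimpleNamespace

# Placeholder names (assumption); the real list is defined in the PR.
QWEN35_ARCHITECTURE_NAMES = {"Qwen35DensePlaceholder", "Qwen35MoePlaceholder"}


def qwen35_architecture_for_config(config):
    """Stand-in resolver: returns a label instead of a real architecture object."""
    return f"qwen35:{config.architectures[0]}"


def arch_info_for_config(config):
    """Route to the Qwen3.5 resolver only when architectures[0] is a known name."""
    architectures = getattr(config, "architectures", None) or []
    if architectures and architectures[0] in QWEN35_ARCHITECTURE_NAMES:
        return qwen35_architecture_for_config(config)
    # In the real code the existing architecture lookup would continue here.
    return None


# Usage: a config object only needs an `architectures` list for this sketch.
example = SimpleNamespace(architectures=["Qwen35MoePlaceholder"])
resolved = arch_info_for_config(example)
```

Gating on the architecture name keeps the change additive: configs with other names never reach the new resolver.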

Introduces mergekit/architecture/qwen35.py, defining module weight layouts for the Qwen3.5 text decoder (including mixed linear/full attention and optional attention biases), MoE/shared-expert variants, optional MTP blocks, and the multimodal vision tower; it also sets multimodal tagalong files and correct vocab-size config keys.

Adds tests/test_qwen35_architecture.py to assert full coverage of transformers state_dict keys for dense and MoE configs and to smoke-test passthrough (dense) and linear (MoE) merges end-to-end.

Reviewed by Cursor Bugbot for commit a812d01. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-07T06:40:19Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

ZhangYiqun018 · 2026-05-07T06:42:54Z

I have read the CLA Document and I hereby sign the CLA

ZhangYiqun018 · 2026-05-07T06:43:39Z

recheck

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1296196de9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T06:44:14Z

+            if getattr(_text_config(config), "attention_bias", False):
+                res.extend(
+                    WeightInfo(name=f"{prefix}.self_attn.{name}", optional=True)
+                    for name in ("q_proj.bias", "k_proj.bias", "v_proj.bias")


Include the full-attention output bias

When attention_bias is enabled, Qwen3.5 full-attention layers also instantiate self_attn.o_proj.bias (o_proj is constructed with bias=config.attention_bias in the Transformers Qwen3.5 implementation), but this architecture only enumerates q/k/v biases. For any Qwen3.5 dense or MoE checkpoint with attention_bias=True, mergekit will never plan or write the existing output-projection bias, so merged outputs silently drop that tensor.
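One possible shape for the fix suggested here is sketched below. The `WeightInfo` dataclass is a minimal stand-in for mergekit's class with only the two fields used in the quoted diff, and `full_attention_biases` is a hypothetical helper name, not code from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightInfo:
    """Minimal stand-in for mergekit's WeightInfo (only the fields used here)."""

    name: str
    optional: bool = False


def full_attention_biases(prefix, attention_bias):
    """List the bias tensors of one full-attention layer, including o_proj."""
    if not attention_bias:
        return []
    return [
        WeightInfo(name=f"{prefix}.self_attn.{proj}.bias", optional=True)
        # o_proj is added alongside q/k/v so the output bias is not dropped.
        for proj in ("q_proj", "k_proj", "v_proj", "o_proj")
    ]
```

Keeping the entries `optional=True`, as in the quoted diff, means checkpoints without these biases should still plan cleanly.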


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 69f901f.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a812d01a66

chatgpt-codex-connector · 2026-05-07T11:04:05Z

+            WeightInfo(name=f"{prefix}.self_attn.q_proj.weight", optional=True),
+            WeightInfo(name=f"{prefix}.self_attn.k_proj.weight", optional=True),
+            WeightInfo(name=f"{prefix}.self_attn.v_proj.weight", optional=True),
+            WeightInfo(name=f"{prefix}.self_attn.o_proj.weight", optional=True),


Include attention biases for MTP layers

When attention_bias=True and mtp_num_hidden_layers is enabled, the MTP full-attention block uses the same q/k/v/o projections as Qwen3.5 decoder attention, so checkpoints contain mtp.layers.N.self_attn.{q,k,v,o}_proj.bias tensors. The main decoder path now enumerates those biases, but the MTP architecture still only lists the weights and norms here, so merges will silently omit the MTP attention biases from the output checkpoint.
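To make the affected tensors concrete, the sketch below enumerates the names this comment refers to, following the `mtp.layers.N.self_attn.{q,k,v,o}_proj.bias` pattern quoted above. The function name and its parameters are hypothetical; in the real architecture definition these would presumably be emitted as optional weight entries, not plain strings.

```python
def mtp_attention_bias_names(num_mtp_layers, attention_bias):
    """Enumerate MTP attention bias tensor names for a given config.

    num_mtp_layers mirrors the mtp_num_hidden_layers setting and
    attention_bias mirrors the config flag of the same name.
    """
    if not attention_bias:
        return []
    return [
        f"mtp.layers.{layer}.self_attn.{proj}_proj.bias"
        for layer in range(num_mtp_layers)
        for proj in ("q", "k", "v", "o")
    ]


# Usage: a single MTP layer with attention_bias enabled yields four names.
names = mtp_attention_bias_names(1, True)
```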


chatgpt-codex-connector Bot reviewed May 7, 2026

cursor Bot reviewed May 7, 2026

Comment thread mergekit/architecture/qwen35.py Outdated

Add Qwen3.5 architecture support

69f901f

ZhangYiqun018 force-pushed the qwen35-architecture-support branch from 18c79d1 to 69f901f Compare May 7, 2026 07:14

cursor Bot reviewed May 7, 2026

Comment thread mergekit/architecture/qwen35.py

ZhangYiqun018 added 3 commits May 7, 2026 15:24

Deduplicate Qwen3.5 MTP module registration

7862bb7

Apply pre-commit formatting

90c20da

Support packed Qwen3.5 MTP experts

a812d01

chatgpt-codex-connector Bot reviewed May 7, 2026

Add Qwen3.5 architecture support #686
ZhangYiqun018 wants to merge 4 commits into arcee-ai:main from ZhangYiqun018:qwen35-architecture-support
