Add IBM Granite architecture support #683

Merged: cg123 merged 2 commits into arcee-ai:main from Bhavyashah20:feat/add-granite-architecture on May 6, 2026
@Bhavyashah20 (Contributor) commented May 1, 2026

Adds an architecture definition for GraniteForCausalLM (IBM Granite 3.x dense models). Without this, mergekit logs "No JSON architecture found for GraniteForCausalLM" and falls back to architecture inference for every Granite model.

Granite uses the same transformer weight layout as Llama (q/k/v/o projections, gate/up/down MLP, input and post-attention layernorms) with model_type "granite". Optional bias entries are included for models released with attention_bias=True or mlp_bias=True to prevent silent tensor loss during merges.
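
To make the layout described above concrete, here is a minimal sketch that lists the per-layer tensor names such a definition would need to cover. It assumes the Hugging Face Llama-style naming scheme (`model.layers.N.self_attn.q_proj.weight` and so on); the helper `granite_layer_weight_names` is hypothetical and is not part of mergekit or of this PR's granite.json.

```python
def granite_layer_weight_names(layer_index, attention_bias=False, mlp_bias=False):
    """List expected tensor names for one decoder layer (assumed Llama-style naming)."""
    prefix = f"model.layers.{layer_index}"
    names = [f"{prefix}.input_layernorm.weight"]

    # Attention projections; bias tensors exist only when attention_bias=True.
    for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
        names.append(f"{prefix}.self_attn.{proj}.weight")
        if attention_bias:
            names.append(f"{prefix}.self_attn.{proj}.bias")

    names.append(f"{prefix}.post_attention_layernorm.weight")

    # MLP projections; bias tensors exist only when mlp_bias=True.
    for proj in ("gate_proj", "up_proj", "down_proj"):
        names.append(f"{prefix}.mlp.{proj}.weight")
        if mlp_bias:
            names.append(f"{prefix}.mlp.{proj}.bias")

    return names


plain = granite_layer_weight_names(0)
biased = granite_layer_weight_names(0, attention_bias=True, mlp_bias=True)
# Names present only in the biased variant are the ones a definition
# without optional bias entries would silently drop.
bias_only = sorted(set(biased) - set(plain))
```

Under these assumptions, `bias_only` holds the seven bias tensors per layer that would be lost if the definition listed weights only.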

Adds make_picogranite() to test helpers and TestGraniteMerges covering passthrough copy, linear merge, and SLERP.
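
For readers unfamiliar with the merge methods the tests exercise, the following is a rough pure-Python sketch of linear interpolation and SLERP on flat weight vectors. It is an illustration only, not mergekit's implementation; the function names `lerp` and `slerp` and the `eps` threshold are assumptions made for this example.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation; falls back to lerp for degenerate inputs."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, t)

    # Angle between the two vectors, with the cosine clamped for safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel or anti-parallel vectors: the arc is ill-defined.
        return lerp(a, b, t)

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


midpoint_linear = lerp([1.0, 0.0], [0.0, 1.0], 0.5)
midpoint_slerp = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

For two orthogonal unit vectors, the linear midpoint has a norm below 1, while the SLERP midpoint stays on the unit circle.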


Note

Medium Risk
Adds a new JSON architecture mapping and forces Pydantic model rebuilds for Configured*Architecture, which could affect architecture loading/validation across model types if the forward-reference resolution changes.

Overview
Adds first-class support for IBM Granite dense models by introducing a granite.json architecture definition (including optional bias tensors and tied lm_head handling) so merges no longer fall back to architecture inference.

Updates mergekit/architecture/base.py to import torch for PretrainedConfig forward-ref resolution and explicitly model_rebuild() the configured architecture Pydantic models.

Extends the test suite with a minimal GraniteForCausalLM fixture (make_picogranite) and new passthrough/linear/SLERP merge coverage for Granite models.

Reviewed by Cursor Bugbot for commit e1a7b40. Bugbot is set up for automated code reviews on this repo.

github-actions Bot commented May 1, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@Bhavyashah20 (Contributor, Author) commented

I have read the CLA Document and I hereby sign the CLA

@Bhavyashah20 Bhavyashah20 force-pushed the feat/add-granite-architecture branch from 0ca5233 to fb75d9a Compare May 1, 2026 12:37
Import torch in architecture/base.py so Pydantic can resolve the
torch.dtype forward reference in PretrainedConfig, then call
model_rebuild() on ConfiguredModuleArchitecture and
ConfiguredModelArchitecture to complete type resolution.
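
The underlying issue, a string annotation that cannot be resolved until the referenced name is in scope, can be illustrated with the standard library alone. The sketch below uses `typing.get_type_hints` as a stand-in; it is an analogy, not Pydantic's `model_rebuild()` machinery, and `Config` and `Dtype` are hypothetical names standing in for `PretrainedConfig` and `torch.dtype`.

```python
import typing


class Config:
    # "Dtype" is stored as a plain string; nothing looks it up yet.
    dtype: "Dtype"


def resolve_hints(namespace):
    """Evaluate Config's string annotations against the given namespace."""
    return typing.get_type_hints(Config, globalns=namespace)


# Without the name in scope, resolution fails with NameError.
try:
    resolve_hints({})
    resolved_without_name = True
except NameError:
    resolved_without_name = False


class Dtype:
    """Hypothetical stand-in for torch.dtype."""


# Once the name is available, the same annotation resolves to the real class.
hints = resolve_hints({"Dtype": Dtype})
```

Importing torch in base.py plays the role of putting `Dtype` into the namespace here, and the rebuild step corresponds to running the resolution again once the name exists.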
@cg123 cg123 merged commit 813142d into arcee-ai:main May 6, 2026
6 checks passed
@cg123 (Collaborator) commented May 6, 2026

Thanks for the PR!

@github-actions github-actions Bot locked and limited conversation to collaborators May 6, 2026