Add IBM Granite architecture support by Bhavyashah20 · Pull Request #683 · arcee-ai/mergekit

Bhavyashah20 · 2026-05-01T12:33:13Z

Adds an architecture definition for GraniteForCausalLM (IBM Granite 3.x dense models). Without this, mergekit logs "No JSON architecture found for GraniteForCausalLM" and falls back to architecture inference for every Granite model.

Granite uses the same transformer weight layout as Llama (q/k/v/o projections, gate/up/down MLP, input and post-attention layernorms) with model_type "granite". Optional bias entries are included for models released with attention_bias=True or mlp_bias=True to prevent silent tensor loss during merges.
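To make the layout concrete, here is a minimal sketch of the per-layer tensor names such a definition has to cover. The helper `granite_layer_tensor_names` is hypothetical and not part of mergekit, and the names assume the usual Hugging Face Llama-style parameter naming (`model.layers.N.self_attn.*`, `model.layers.N.mlp.*`); it is not a copy of granite.json.

```python
from typing import List


def granite_layer_tensor_names(
    layer_index: int,
    attention_bias: bool = False,
    mlp_bias: bool = False,
) -> List[str]:
    """List tensor names for one decoder layer (hypothetical helper).

    Assumes Llama-style parameter naming; bias tensors are only listed
    when the corresponding config flag is set.
    """
    prefix = f"model.layers.{layer_index}"
    names = [f"{prefix}.input_layernorm.weight"]

    # Attention projections: weights always, biases only if attention_bias.
    for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
        names.append(f"{prefix}.self_attn.{proj}.weight")
        if attention_bias:
            names.append(f"{prefix}.self_attn.{proj}.bias")

    names.append(f"{prefix}.post_attention_layernorm.weight")

    # MLP projections: weights always, biases only if mlp_bias.
    for proj in ("gate_proj", "up_proj", "down_proj"):
        names.append(f"{prefix}.mlp.{proj}.weight")
        if mlp_bias:
            names.append(f"{prefix}.mlp.{proj}.bias")

    return names


if __name__ == "__main__":
    for name in granite_layer_tensor_names(0, attention_bias=True):
        print(name)
```

If the bias names were missing from the definition, a merge that walks only the declared tensors would skip them for checkpoints that have them, which is the silent tensor loss described above.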

Adds make_picogranite() to test helpers and TestGraniteMerges covering passthrough copy, linear merge, and SLERP.
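For readers unfamiliar with the merge methods the tests exercise, the following is a rough sketch of linear and spherical (SLERP) interpolation on plain Python lists. It is an illustration of the textbook formulas only; the function names `linear_merge` and `slerp` are chosen here for the example and do not reflect mergekit's actual implementation or test code.

```python
import math
from typing import List, Sequence


def linear_merge(a: Sequence[float], b: Sequence[float], t: float = 0.5) -> List[float]:
    """Element-wise linear interpolation: (1 - t) * a + t * b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Sequence[float], b: Sequence[float], t: float, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two vectors.

    Falls back to linear interpolation when either vector is near zero
    or the vectors are nearly (anti)parallel, where the formula is unstable.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return linear_merge(a, b, t)

    # Cosine of the angle between the vectors, clamped for acos.
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        return linear_merge(a, b, t)

    weight_a = math.sin((1.0 - t) * theta) / sin_theta
    weight_b = math.sin(t * theta) / sin_theta
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(linear_merge([1.0, 0.0], [0.0, 1.0]))
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

A passthrough merge, by contrast, copies tensors unchanged, so its test only needs to check that the output matches the input.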

Note

Medium Risk
Adds a new JSON architecture mapping and forces Pydantic model rebuilds for Configured*Architecture, which could affect architecture loading/validation across model types if the forward-reference resolution changes.

Overview
Adds first-class support for IBM Granite dense models by introducing a granite.json architecture definition (including optional bias tensors and tied lm_head handling) so merges no longer fall back to architecture inference.

Updates mergekit/architecture/base.py to import torch for PretrainedConfig forward-ref resolution and explicitly model_rebuild() the configured architecture Pydantic models.

Extends the test suite with a minimal GraniteForCausalLM fixture (make_picogranite) and new passthrough/linear/SLERP merge coverage for Granite models.

Reviewed by Cursor Bugbot for commit e1a7b40. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-01T12:33:24Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Bhavyashah20 · 2026-05-01T12:36:19Z

I have read the CLA Document and I hereby sign the CLA


Import torch in architecture/base.py so Pydantic can resolve the torch.dtype forward reference in PretrainedConfig, then call model_rebuild() on ConfiguredModuleArchitecture and ConfiguredModelArchitecture to complete type resolution.
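The underlying issue is general to string (forward-reference) annotations: they can only be resolved once the referenced name is available in the namespace used for resolution. The sketch below shows this with the standard library alone, as an analogy; `Dtype`, `Config`, and `try_resolve` are placeholder names for this example, and the code involves neither torch nor Pydantic. In Pydantic v2, `model_rebuild()` is the call that retries this kind of resolution for a model class.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, get_type_hints


@dataclass
class Config:
    # Forward reference: stored as the string "Dtype" until resolved.
    dtype: "Dtype"


class Dtype:
    """Placeholder standing in for a type that is defined or imported later."""


def try_resolve(namespace: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Resolve Config's annotations against the given namespace.

    Returns None when a referenced name is missing from the namespace.
    """
    try:
        return get_type_hints(Config, globalns=namespace, localns={})
    except NameError:
        return None


if __name__ == "__main__":
    # Name not available: resolution fails.
    print(try_resolve({}))
    # Name available: the string annotation resolves to the real class.
    print(try_resolve({"Dtype": Dtype}))
```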

cg123 · 2026-05-06T05:15:22Z

Thanks for the PR!

Bhavyashah20 force-pushed the feat/add-granite-architecture branch from 0ca5233 to fb75d9a on May 1, 2026 12:37

cg123 merged commit 813142d into arcee-ai:main on May 6, 2026
6 checks passed

github-actions Bot locked and limited conversation to collaborators May 6, 2026
