fix(linear): separate bias loader and fix row-parallel bias handling #81

Open
HaoYuan-Gao wants to merge 1 commit into Wenyueh:main from HaoYuan-Gao:fix_linear_bias

@HaoYuan-Gao HaoYuan-Gao commented Jun 16, 2026

Separate bias loading logic from weight loading logic for parallel linear layers. Weight and bias may have different sharding semantics under tensor parallelism:

  • ColumnParallelLinear shards both the weight and the bias along the output dimension
  • RowParallelLinear shards the weight along the input dimension but keeps the bias replicated on every rank
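
The two sharding rules above can be sketched in plain Python. This is only an illustration, not the code in this PR: the helper names `shard_column_parallel` and `shard_row_parallel` are hypothetical, and the weight is assumed to be stored as `[out_features][in_features]`, the layout `torch.nn.Linear` uses.

```python
# Illustrative sketch only; helper names are hypothetical, not from the PR.
# Assumes weight is laid out as [out_features][in_features].

def shard_column_parallel(weight, bias, tp_rank, tp_size):
    """Column-parallel: slice the output dimension of weight AND bias."""
    out_features = len(weight)
    assert out_features % tp_size == 0, "output dim must divide evenly"
    shard = out_features // tp_size
    start = tp_rank * shard
    return weight[start:start + shard], bias[start:start + shard]


def shard_row_parallel(weight, bias, tp_rank, tp_size):
    """Row-parallel: slice the input dimension of weight; bias stays whole."""
    in_features = len(weight[0])
    assert in_features % tp_size == 0, "input dim must divide evenly"
    shard = in_features // tp_size
    start = tp_rank * shard
    w_shard = [row[start:start + shard] for row in weight]
    return w_shard, list(bias)  # every rank gets a full copy of the bias


weight = [[1, 2, 3, 4],
          [5, 6, 7, 8]]   # out_features=2, in_features=4
bias = [10, 20]

col_w, col_b = shard_column_parallel(weight, bias, tp_rank=1, tp_size=2)
row_w, row_b = shard_row_parallel(weight, bias, tp_rank=1, tp_size=2)
```

With these inputs, rank 1 of the column-parallel layer holds one output row and one bias element, while rank 1 of the row-parallel layer holds half of each row and the full bias. This difference is why a single shared loader for weight and bias cannot be correct for both layer types.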

Also fix the RowParallelLinear forward pass by adding the bias after all_reduce, so the bias is applied once instead of being accumulated across ranks when tp_size > 1.
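
To see why the order matters, here is a small single-process simulation, assuming a sum all_reduce. It is a sketch under stated assumptions rather than the PR's implementation: `all_reduce_sum` is a stand-in for the collective, and the `bias_before_reduce` flag is hypothetical and exists only to reproduce the old behavior for comparison.

```python
# Single-process simulation of a row-parallel linear layer.
# all_reduce_sum is a stand-in for a sum all_reduce across ranks;
# bias_before_reduce is a hypothetical flag used only to show the bug.

def matvec(weight, x):
    """Compute weight @ x for weight laid out as [out_features][in_features]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def all_reduce_sum(partials):
    """Element-wise sum of the per-rank partial outputs."""
    return [sum(vals) for vals in zip(*partials)]


def row_parallel_forward(weight, bias, x, tp_size, bias_before_reduce=False):
    shard = len(x) // tp_size
    partials = []
    for rank in range(tp_size):
        lo, hi = rank * shard, (rank + 1) * shard
        w_shard = [row[lo:hi] for row in weight]   # input-dim slice of weight
        y = matvec(w_shard, x[lo:hi])              # partial sum on this rank
        if bias_before_reduce:
            y = [yi + bi for yi, bi in zip(y, bias)]  # buggy: added per rank
        partials.append(y)
    out = all_reduce_sum(partials)
    if not bias_before_reduce:
        out = [oi + bi for oi, bi in zip(out, bias)]  # fixed: added once
    return out


weight = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
bias = [10, 20]
x = [1, 1, 1, 1]

reference = [y + b for y, b in zip(matvec(weight, x), bias)]  # unsharded result
fixed = row_parallel_forward(weight, bias, x, tp_size=2)
buggy = row_parallel_forward(weight, bias, x, tp_size=2, bias_before_reduce=True)
```

Here `reference` and `fixed` are both `[20, 46]`, while `buggy` is `[30, 66]`: the reduction sums one copy of the bias per rank, so the output is off by `(tp_size - 1) * bias`.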

Related issue: Fixes #80

Development

Successfully merging this pull request may close these issues.

【bug】RowParallelLinear adds bias multiple times when tp_size > 1
