Distributed Neighborhood Attention S2#161

Draft
azrael417 wants to merge 15 commits into main from tkurth/distributed-neighborhood-attention

@azrael417
Collaborator

This MR adds distributed Neighborhood Attention S2 support and fixes some issues in the existing attention kernel.

  • The existing serial attention kernel preallocated an output tensor that was too large when using attention-based downsampling. This is now fixed.
  • The existing serial attention kernel does not produce the correct v gradient when used for upsampling. We will fix that next.
  • This MR adds distributed neighborhood attention along with some new tests for the feature. The distributed kernel does not support up- or downsampling yet.
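
To illustrate the first point, here is a minimal sketch of why the size of the output buffer matters for downsampling. It assumes that the attention output holds one entry per query point on the output grid, and that an oversized buffer is one sized from the input grid instead. The helper names (`attention_output_shape`, `num_elements`) and the grid sizes are hypothetical and are not taken from the kernel in this MR.

```python
def attention_output_shape(batch, channels_out, grid_shape):
    """Shape of a buffer holding one value per point of the given (nlat, nlon) grid."""
    nlat, nlon = grid_shape
    return (batch, channels_out, nlat, nlon)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


# Hypothetical grids: downsampling by a factor of 2 in each direction.
in_shape = (64, 128)   # assumed grid of keys and values
out_shape = (32, 64)   # assumed grid of queries and of the output

# Buffer sized from the output grid (what downsampling actually needs).
correct = attention_output_shape(2, 16, out_shape)

# Buffer sized from the input grid (the assumed oversized case).
oversized = attention_output_shape(2, 16, in_shape)

# How many times larger the oversized buffer is.
ratio = num_elements(oversized) // num_elements(correct)
```

Under these assumptions, sizing the buffer from the output grid avoids allocating memory that the downsampled result never uses.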

@azrael417 requested review from bonevbs on March 30, 2026 16:10
@azrael417 self-assigned this on Mar 30, 2026
@azrael417 force-pushed the tkurth/distributed-neighborhood-attention branch 2 times, most recently from 7f72f76 to b18f216, on April 1, 2026 05:09
@azrael417 force-pushed the tkurth/distributed-neighborhood-attention branch from b75d634 to 4d434d6 on April 6, 2026 07:45

2 participants