Implement vectorized forward kernel for performance #249

Open
Jatkingmodern wants to merge 1 commit into mapillary:main from Jatkingmodern:patch-1
Conversation

@Jatkingmodern

Added a vectorized forward kernel that uses float4 loads/stores for improved performance. Implemented a launch wrapper that chooses between the vectorized and scalar kernels based on conditions.
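
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the general technique described above, not the code in this PR. It assumes a stand-in elementwise operation (y = x * scale + shift) and assumes the dispatch conditions are 16-byte pointer alignment and an element count divisible by 4. The names forward_scalar, forward_vec4, is_aligned16, and launch_forward are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the forward op: y = x * scale + shift.
// Scalar fallback: one float per thread.
__global__ void forward_scalar(const float* x, float* y,
                               float scale, float shift, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) y[i] = x[i] * scale + shift;
}

// Vectorized variant: each thread loads and stores one float4 (16 bytes).
// n4 is the number of float4 elements, i.e. n / 4.
__global__ void forward_vec4(const float4* x, float4* y,
                             float scale, float shift, size_t n4) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n4) {
        float4 v = x[i];
        v.x = v.x * scale + shift;
        v.y = v.y * scale + shift;
        v.z = v.z * scale + shift;
        v.w = v.w * scale + shift;
        y[i] = v;
    }
}

// float4 accesses require 16-byte aligned addresses.
static bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Launch wrapper (assumed conditions): take the float4 path only when both
// pointers are 16-byte aligned and n is a multiple of 4; otherwise use the
// scalar kernel. Returns true if the vectorized kernel was launched.
bool launch_forward(const float* x, float* y, float scale, float shift,
                    size_t n, cudaStream_t stream = 0) {
    if (n == 0) return false;
    const unsigned threads = 256;
    const bool use_vec = (n % 4 == 0) && is_aligned16(x) && is_aligned16(y);
    if (use_vec) {
        const size_t n4 = n / 4;
        const unsigned blocks = static_cast<unsigned>((n4 + threads - 1) / threads);
        forward_vec4<<<blocks, threads, 0, stream>>>(
            reinterpret_cast<const float4*>(x), reinterpret_cast<float4*>(y),
            scale, shift, n4);
    } else {
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        forward_scalar<<<blocks, threads, 0, stream>>>(x, y, scale, shift, n);
    }
    return use_vec;
}
```

In this sketch the scalar kernel handles every case the float4 path cannot, which keeps the vectorized kernel free of tail handling. A wrapper could instead process the aligned bulk with float4 and the remainder with scalar accesses; whether this PR does so is not stated here.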

meta-cla bot added the "cla signed" label on Nov 3, 2025