support Flash attention by yuekaizhang · Pull Request #74 · k2-fsa/OmniVoice

yuekaizhang · 2026-04-09T12:38:50Z

Flash Attention Support

Tested with 50 randomly distributed audio samples, randomly grouped into batches of 4.

FlashAttention-2 with packed input (varlen) avoids redundant computation on padding tokens, reducing inference time:

GPU	w/o flash_attn	w/ flash_attn	Speedup
L20	29s	26s	~10%
H20	25s	23s	~8%

Usage:

omnivoice-infer-batch --use_flash_attn --batch_size 4 ...
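
To illustrate why packed (varlen) input skips padding work, here is a minimal sketch in plain Python, not the code from this PR. It builds the cumulative sequence-length offsets that varlen attention kernels use to find sequence boundaries (in the flash-attn library, `flash_attn_varlen_func` takes these as `cu_seqlens_q` / `cu_seqlens_k`), and compares the token count of a packed batch with a padded one. The helper name `pack_lengths` and the example lengths are hypothetical.

```python
from itertools import accumulate


def pack_lengths(lengths):
    """Summarize one batch for packed (varlen) attention.

    Returns (cu_seqlens, max_seqlen, packed_tokens, padded_tokens).
    """
    # Offsets into the packed token dimension: sequence i occupies
    # the half-open range [cu_seqlens[i], cu_seqlens[i + 1]).
    cu_seqlens = [0] + list(accumulate(lengths))
    max_seqlen = max(lengths)
    # Packed layout stores only real tokens.
    packed_tokens = cu_seqlens[-1]
    # Padded layout stores every sequence at the longest length.
    padded_tokens = max_seqlen * len(lengths)
    return cu_seqlens, max_seqlen, packed_tokens, padded_tokens


# Hypothetical token lengths for one batch of 4 utterances.
lengths = [120, 300, 75, 210]
cu_seqlens, max_seqlen, packed, padded = pack_lengths(lengths)

print(cu_seqlens)      # [0, 120, 420, 495, 705]
print(packed, padded)  # 705 1200
```

In this made-up batch the padded layout processes 1200 token positions while the packed layout processes 705. The actual saving depends on how much the sequence lengths vary within each batch.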

huangxuegang1129-oss · 2026-04-10T12:28:38Z

Was the demo.py file not modified?

ZovutVanya · 2026-05-22T07:25:37Z

Does flash attention only help with batches, or single audio too?

yuekaizhang added 2 commits April 9, 2026 19:03

support FA2

25fb2b7

change options name

216dcba