Support more models and flash attention 2 #3

Open
yuyijiong wants to merge 3 commits into VITA-Group:main from yuyijiong:main
Conversation

@yuyijiong

  1. Support more models, such as Mistral, Gemma, and Qwen2
  2. Support flash attention 2
  3. Truncate the sequence length to under 4k when calculating outliers, to avoid OOM
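For item 3, a minimal sketch of what such a truncation step could look like. This is an illustration, not the PR's actual code: the function name `truncate_for_outliers` and the 4096-token cap are assumptions based on the description above.

```python
import torch

# Hypothetical cap on sequence length for the outlier pass (item 3 above).
MAX_OUTLIER_SEQ_LEN = 4096


def truncate_for_outliers(input_ids: torch.Tensor,
                          max_len: int = MAX_OUTLIER_SEQ_LEN) -> torch.Tensor:
    """Truncate a (batch, seq_len) token batch to at most max_len tokens
    before computing outlier statistics, so long inputs do not OOM."""
    if input_ids.shape[-1] > max_len:
        return input_ids[..., :max_len]
    return input_ids


# Usage: an 8k-token dummy batch is cut down to 4k; a short batch is untouched.
long_batch = torch.zeros(2, 8192, dtype=torch.long)
short_batch = torch.zeros(1, 100, dtype=torch.long)
truncated = truncate_for_outliers(long_batch)
```

The outlier statistics are per-channel aggregates, so computing them over a 4k prefix rather than the full sequence trades a little estimation accuracy for a bounded activation-memory footprint.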

@ZackZikaiXiao

Good job, the bugs in the causal mask shape are fixed.
