FunAudioLLM

12 repositories

Fun-ASR
Public
Fun-ASR is a large end-to-end speech recognition model launched by Tongyi Lab.
audio pytorch speech-recognition speaker-diarization multimodal-large-language-models audio-understanding audio-language-model fun-asr
Python • Apache License 2.0 • 44 • 629 • 40 • 0 • Updated Jan 5, 2026
CosyVoice
Public
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
python text-to-speech japanese chatbot multi-lingual tts english chinese korean cantonese
Python • Apache License 2.0 • 2.1k • 19k • 853 • 14 • Updated Jan 5, 2026
FunAudioLLM.github.io
Public
HTML • MIT License • 10 • 57 • 0 • 1 • Updated Dec 31, 2025
SenseVoice
Public
Multilingual Voice Understanding Model
multilingual python ai pytorch speech-recognition speech-to-text asr cross-lingual speech-emotion-recognition audio-event-classification
Python • Other • 677 • 7.3k • 163 • 3 • Updated Dec 30, 2025
Fun-Audio-Chat
Public
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
Python • Apache License 2.0 • 56 • 537 • 6 • 1 • Updated Dec 25, 2025
FunResearch
Public
This repository is maintained by the Speech Team at Alibaba’s Tongyi Lab and serves as an open-source platform for our cutting-edge research in speech, audio, and NLP technologies. We believe in accelerating scientific progress through transparent collaboration, and we invite the global research community to explore, reproduce, and build upon our work.
Python • Apache License 2.0 • 1 • 13 • 0 • 0 • Updated Dec 20, 2025
ThinkSound
Public
[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
tta video-to-audio text-to-audio foley-sound-synthesis aigc-audio text-video-to-audio
Python • 65 • 1.1k • 30 • 1 • Updated Nov 25, 2025
CV3-Eval
Public
Python • Apache License 2.0 • 14 • 166 • 6 • 0 • Updated Aug 25, 2025
MME-Emotion
Public
Official repository for the paper “MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models”
Python • MIT License • 2 • 18 • 1 • 0 • Updated Aug 19, 2025
OmniAudio
Public
Python • 3 • 7 • 0 • 0 • Updated May 21, 2025
FunMusic
Public
A foundational toolkit for music, song, and audio generation
pytorch music-generation audio-processing audio-generation
Python • Apache License 2.0 • 131 • 1.3k • 22 • 3 • Updated May 20, 2025
FunAudioLLM-APP
Public
Python • MIT License • 74 • 375 • 8 • 0 • Updated Jul 22, 2024