We’re excited to share RoboBrain-Audio (FLM-Audio), a new 7B spoken dialogue chatbot with native full-duplex capability. FLM-Audio delivers better response quality and a better chat experience while requiring significantly less training data.
Key innovations:
Natural Monologue
Abandons word-level timestamps in favor of a new “Natural Monologue” mechanism.
Preserves the inherent strengths of LLMs in coherent generation and instruction following.
Effectively addresses the context-dependent pronunciation of certain words (especially numbers).
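To make the contrast with word-level timestamps concrete, here is a toy sketch of two ways a text stream could be laid out against audio frames. It is only an illustration under stated assumptions: the `<pad>` and `<wait>` tokens, the frame indices, and both function names are hypothetical and are not taken from FLM-Audio's actual tokenization or code.

```python
PAD = "<pad>"    # hypothetical filler for frames that carry no text
WAIT = "<wait>"  # hypothetical token emitted while the audio catches up


def word_aligned_stream(words, start_frames, total_frames):
    """Word-level alignment: each word sits at the frame where it is spoken.

    This layout needs a timestamp for every word.
    """
    stream = [PAD] * total_frames
    for word, frame in zip(words, start_frames):
        stream[frame] = word
    return stream


def natural_monologue_stream(words, total_frames):
    """Monologue-style layout: the sentence is emitted as one contiguous run,
    followed by wait tokens until the audio for that sentence has finished.

    No per-word timestamps are required.
    """
    if len(words) > total_frames:
        raise ValueError("sentence is longer than the available frames")
    return list(words) + [WAIT] * (total_frames - len(words))


# Illustrative inputs only; the frame positions are made up.
words = ["It", "costs", "25", "dollars"]
aligned = word_aligned_stream(words, start_frames=[0, 3, 5, 9], total_frames=12)
monologue = natural_monologue_stream(words, total_frames=12)
```

In the first layout the sentence is broken up by padding, so the language model never sees it as ordinary contiguous text. In the second, the text side looks like normal LLM output, which is one plausible reading of why coherence and instruction following are preserved and why a token such as "25" can be read with its full sentence context.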
Dual Training Paradigm
Training spans two major stages and four sub-stages, simulating ASR, TTS, and interactive dialogue tasks.
The post-training stage equips the model with the basic abilities of “listening” and “speaking”.
The supervised fine-tuning (SFT) stage then shapes its dialogue and full-duplex interaction capabilities.
Resource Links:
https://arxiv.org/abs/2509.02521
https://huggingface.co/CofeAI/FLM-Audio
GitHub – cofe-ai/flm-audio: FLM-Audio is a audio-language subversion of RoboEgo/FLM-Ego — an omnimo
The model is now open-sourced, and we look forward to your use and feedback.
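For anyone who wants to pull the released files locally, the following is a minimal sketch that assumes the `huggingface_hub` package is installed. The helper names `repo_id_from_url` and `download_checkpoint` are made up for this example. It only downloads the repository contents; how to load and run the model is covered by the project's own repository and is not shown here.

```python
from urllib.parse import urlparse

# Model page from the resource links in this post.
MODEL_URL = "https://huggingface.co/CofeAI/FLM-Audio"


def repo_id_from_url(url: str) -> str:
    """Extract the "owner/name" repo id from a Hugging Face model URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"not a model URL: {url!r}")
    return "/".join(parts[:2])


def download_checkpoint(url: str = MODEL_URL, local_dir=None) -> str:
    """Download every file in the model repo and return the local path.

    The import is kept inside the function so the URL helper above works
    even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)
```

Calling `download_checkpoint()` needs network access and enough disk space for a 7B checkpoint.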
Comments URL: https://news.ycombinator.com/item?id=45299218
Source: news.ycombinator.com