tl;dr: It’s a CoreML/MLX port of SimulStreaming (the 2025 SOTA in simultaneous speech transcription), which is itself a combination of Simul-Whisper and WhisperStreaming.
I’m currently building an application, and I thought I would open up the backend model code for everyone to use.
I get a ~15x speedup on my M2 MacBook Pro compared to the original PyTorch implementation. I’m going to use the medium model, which has a nice balance between memory usage and accuracy.
The CoreML part is from whisper.cpp and contains only the encoder; the MLX part is from mlx-whisper.
Comments URL: https://news.ycombinator.com/item?id=45620534
Source: github.com