Using Kyutai's low latency audio models on macOS in one command / Jul 2025

I've just taken Kyutai's speech-to-text model for a spin on my Mac laptop, and it's stunningly good. As background, this is what the prolific Laurent Mazare has been hacking on; he has also made a ton of contributions to the OCaml community, such as ocaml-torch, and starred in a very fun Signals to Threads episode on machine learning at Jane Street back in 2020.

You can get live speech-to-text from your microphone running on your Mac in a few commands, assuming you have uv installed (which you should!).

git clone https://github.com/kyutai-labs/delayed-streams-modeling
cd delayed-streams-modeling
uvx --with moshi-mlx python scripts/stt_from_mic_mlx.py
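
If you're not sure whether the prerequisites are in place, a quick check along these lines may help. This is only a minimal sketch and not part of the Kyutai repository: `missing_tools` is a hypothetical helper, and it assumes that git and uv are the only tools the commands above need on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not from the Kyutai repo): print the name of each
# requested tool that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: the commands above only need git and uv to be installed.
missing=$(missing_tools git uv)
if [ -n "$missing" ]; then
    echo "Install these first:" $missing >&2
else
    echo "git and uv found; ready to run the commands above"
fi
```

If it reports anything missing, install that tool and then rerun the three commands.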

The model understands my accent near perfectly; if that isn't a machine learning miracle, I don't know what is! I'm looking forward to trying this out more over the summer with Josh Millar and Dan Kvit, as part of our "Low power audio transcription with Whisper" project.

16th Jul 2025

notes ai audio llm

Anil Madhavapeddy, Professor of Planetary Computing
