Posted by u/jacek2023
mistralai/Voxtral-Mini-4B-Realtime-2602 · Hugging Face
Voxtral Mini 4B Realtime 2602 is a **multilingual, realtime speech-transcription model** and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of **<500ms**. It supports **13 languages** and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling. Built with a **natively streaming architecture** and a custom causal audio encoder - it allows configurable transcription delays (240ms to 2.4s), enabling users to balance **latency and accuracy** based on their needs. At a **480ms delay**, it matches the performance of leading offline open-source transcription models, as well as realtime APIs. As a **4B-parameter model**, is optimized for **on-device deployment**, requiring minimal hardware resources. It runs in realtime with on devices minimal hardware with throughput exceeding 12.5 tokens/second.
More from r/LocalLLaMA
Anthropic: "We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax." 🚨
I feel personally attacked
Distillation when you do it. Training when we do it.