Voxtral is an open-source family of AI audio models from the French startup Mistral AI, officially announced on July 15, 2025. It is designed to give businesses and developers advanced, affordable, production-ready speech intelligence, and it competes with closed-source systems from larger vendors by offering more control at lower cost.
Key Features of Voxtral:
- Open-source and open-weight: Released under the Apache 2.0 license, allowing for wide adoption, customization, and deployment flexibility in cloud, on-premises, or edge environments.
- Multilingual automatic speech recognition (ASR) and understanding: Supports transcription and comprehension in languages including English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, and more.
- Long context processing: Handles up to 30 minutes of audio transcription and up to 40 minutes of speech understanding or reasoning, thanks to a 32,000-token context window. This enables accurate meeting analysis, multimedia documentation, and complex voice workflows without splitting files.
- Two model variants, plus a transcription-only API version:
  - Voxtral Small: A 24-billion-parameter model optimized for production-scale deployments, competitive with ElevenLabs Scribe, GPT-4o mini, and Gemini 2.5 Flash.
  - Voxtral Mini: A smaller 3-billion-parameter model suited to local, edge, or resource-limited deployments.
  - Voxtral Mini Transcribe: An ultra-efficient, transcription-only API version optimized for cost and latency, claimed to outperform OpenAI Whisper at less than half the price.
- Functionality beyond transcription: Because it is built on the Mistral Small 3.1 LLM, Voxtral can answer questions about spoken content, generate summaries, and turn voice commands directly into actions such as API calls or function executions.
- Robust performance: Trained on diverse acoustic profiles, it maintains accuracy in quiet, noisy, broadcast-quality, conference, and field audio settings.
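The long-context limits listed above suggest a simple pre-processing step for recordings that are longer than one request can handle. The following is a minimal sketch, not part of any Mistral SDK: the function name `plan_segments` and the equal-length splitting strategy are assumptions made for illustration, and only the 30-minute and 40-minute figures come from the feature list.

```python
import math

# Per-request limits quoted in the feature list (minutes of audio).
LIMITS_MIN = {"transcription": 30, "understanding": 40}


def plan_segments(duration_min: float, task: str = "transcription") -> list[tuple[float, float]]:
    """Return (start, end) offsets in minutes for the fewest equal-length
    segments that each fit within the limit for the given task.

    Hypothetical helper: it only plans the split, it does not cut audio.
    """
    if task not in LIMITS_MIN:
        raise ValueError(f"unknown task: {task!r}")
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")

    limit = LIMITS_MIN[task]
    count = math.ceil(duration_min / limit)  # fewest segments that fit
    length = duration_min / count            # equal-length segments
    return [(i * length, (i + 1) * length) for i in range(count)]


if __name__ == "__main__":
    # A 25-minute call fits in one transcription request.
    print(plan_segments(25))
    # A 90-minute meeting needs three segments for either task.
    print(plan_segments(90, task="understanding"))
```

Equal-length segments are used here only to keep the sketch short; a real pipeline would more likely cut at silences or speaker turns so that sentences are not split mid-word.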
Pricing and Access:
- Developers and businesses can try Voxtral for free by downloading the open weights from Hugging Face or by using Mistral’s chatbot, Le Chat; the models are also available through Mistral’s API.
- API usage starts at $0.001 per minute, making it an affordable solution for various speech intelligence applications.
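To put the per-minute price in context, the arithmetic can be sketched as below. This is an illustrative estimate only: the helper name `estimate_cost_usd` is hypothetical, and the sketch assumes the quoted starting rate of $0.001 per minute applies uniformly, which may not hold for every model or pricing tier.

```python
# Starting price quoted above, in US dollars per minute of audio.
PRICE_PER_MINUTE_USD = 0.001


def estimate_cost_usd(audio_minutes: float, price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimate API cost as minutes of audio times the per-minute price.

    Hypothetical helper; assumes a flat rate with no minimums or rounding.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * price_per_minute


if __name__ == "__main__":
    hours = 1_000
    cost = estimate_cost_usd(hours * 60)
    print(f"{hours} hours of audio: about ${cost:.2f}")
```

At that rate, 1,000 hours of audio (60,000 minutes) comes to roughly $60.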
Strategic Context:
- Voxtral is Mistral’s first entry into the audio AI space, complementing its existing open-source large language models.
- The release follows closely after Mistral’s announcement of Magistral, its first family of reasoning models, aimed at improving AI reliability.
- Mistral is positioning itself as a key open-source AI innovator competing with closed AI giants by providing high-quality, transparent, and cost-effective models.
Voxtral gives enterprises and developers an open, cost-effective, and capable speech AI option, with more control and flexibility over how they deploy voice intelligence.