Microsoft introduces MAI-Voice-1 speech model and MAI-1 Preview language model, expanding its AI capabilities.
Microsoft unveiled its first in-house AI models, MAI-Voice-1 and MAI-1 Preview, marking a significant shift from its previous reliance on OpenAI’s GPT models. MAI-Voice-1 is a highly efficient natural speech generation model capable of producing a minute of audio in under a second on a single GPU. The model already powers features such as Copilot Daily and Podcasts and is being introduced to Copilot Labs.
MAI-1 Preview is Microsoft’s first in-house foundation language model, now available for public testing on LMArena, a crowdsourced benchmarking platform. While MAI-1 ranks 13th on LMArena’s leaderboard, behind models such as GPT-5 and Gemini 2.5 Pro, it marks Microsoft’s growing presence in the AI space. The model was trained on roughly 15,000 Nvidia H100 GPUs, a cost-effective approach compared with competitors that reportedly use over 100,000 GPUs.
Microsoft AI chief Mustafa Suleyman shared insights into the company’s training strategy, emphasizing data quality over quantity to make efficient use of computational resources. He revealed that Microsoft is already developing next-generation models in some of its largest data centers, equipped with cutting-edge Nvidia GB200 chips.
Looking ahead, Microsoft aims to advance AI by orchestrating specialized models tailored to diverse user needs, which it says will unlock significant value through continued innovation.
