Microsoft Unleashes Three New Foundational AI Models, Intensifying Rivalry

Published 2 hours ago · 3 minute read
Uche Emeka

Microsoft AI, the tech giant’s dedicated research lab, has announced the release of three new foundational AI models that can generate text, voice, and images. The move signals Microsoft’s continued commitment to building its own stack of multimodal AI models and positions it to compete directly with rival AI labs, even as it maintains its partnership with OpenAI.

Among the newly released models is MAI-Transcribe-1, a speech-to-text transcription model. It transcribes speech in 25 languages and, according to a company press release, runs 2.5 times faster than Microsoft’s existing Azure Fast offering. MAI-Voice-1 is an audio-generating model that can produce 60 seconds of audio in one second and supports the creation of custom voices. Lastly, MAI-Image-2 is an image-generating model. It was first made available on March 19 on MAI Playground, a new tool for testing large language models, and now joins the other two models on Microsoft Foundry. MAI-Transcribe-1 and MAI-Voice-1 are also accessible through MAI Playground.

These groundbreaking models were developed by Microsoft’s MAI Superintelligence team, an elite AI research group that was formed and announced in November 2025, and is currently led by Mustafa Suleyman, who serves as the CEO of Microsoft AI. Suleyman articulated the guiding philosophy behind their creations, stating, “At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use.” He also hinted at future developments, adding, “You’ll see more models from us soon in Foundry and directly in Microsoft products and experiences.”

In an increasingly competitive large language model (LLM) market, a key selling point for the new MAI models is their lower price compared with offerings from industry leaders like Google and OpenAI, as highlighted in the company’s blog post. MAI-Transcribe-1 starts at $0.36 per hour. MAI-Voice-1 starts at $22 per 1 million characters. MAI-Image-2 costs $5 per 1 million tokens of text input and $33 per 1 million tokens of image output.
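To make the quoted rates concrete, here is a rough cost-estimate sketch in Python. The function names and the sample workload are hypothetical, and the rates are only the starting prices listed above, so actual billing may differ.

```python
# Starting prices quoted in the article, in USD. These are assumptions for
# illustration only; real invoices may depend on tier, region, or volume.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_TEXT_INPUT_USD_PER_MILLION_TOKENS = 5.0
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 33.0


def transcribe_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a number of input characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost from text-input and image-output tokens."""
    input_part = text_input_tokens / 1_000_000 * IMAGE_TEXT_INPUT_USD_PER_MILLION_TOKENS
    output_part = image_output_tokens / 1_000_000 * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS
    return input_part + output_part


if __name__ == "__main__":
    # Hypothetical monthly workload, chosen only to show the arithmetic.
    print(f"Transcription, 100 hours: ${transcribe_cost(100):.2f}")
    print(f"Voice, 500,000 characters: ${voice_cost(500_000):.2f}")
    print(f"Images, 200k input / 1M output tokens: ${image_cost(200_000, 1_000_000):.2f}")
```

Because the article gives these as starting prices, the figures this produces are best read as lower bounds.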

Despite this push into developing its own models, Mustafa Suleyman reaffirmed Microsoft’s commitment to its long-standing partnership with OpenAI in a recent interview with VentureBeat. He also told The Verge that a recent renegotiation of the partnership was what allowed Microsoft to pursue its own superintelligence research. Microsoft has invested more than $13 billion in OpenAI and integrates OpenAI’s models into its products through a multi-year partnership. The approach mirrors Microsoft’s strategy for chips, where it both develops its own and buys from external suppliers.
