Microsoft Releases Three New Foundational AI Models for Speech and Images

Source: TechCrunch AI
Published: 2 Apr 2026
Added to AI-101: 5 Apr 2026

AI-generated

TLDR

Microsoft has unveiled three new foundational AI models through its Microsoft AI (MAI) division, which was established approximately six months ago. Together, the models cover voice-to-text transcription, audio generation, and image creation.

The MAI division is led by CEO Mustafa Suleyman. Releasing three models at once signals an aggressive stance in the competitive AI market. The spread of capabilities across audio and image modalities suggests Microsoft is aiming for broad coverage of high-demand AI applications, positioning itself to compete more directly with established players such as OpenAI.

Key Takeaways

Microsoft's AI division has released three new foundational models for voice transcription, audio generation, and image creation, challenging competitors about six months after the division was formed.
