Stability AI Unveils Revolutionary Audio Model: 6-Minute Song Generation Now Possible!

Published 20 hours ago | 3 minute read
Uche Emeka

Stability AI, the company best known for its Stable Diffusion technology, has unveiled a new family of audio models called Stability Audio 3.0. According to the company, the top-tier model in the suite can generate professional-grade music more than six minutes long.

The Stability Audio 3.0 family comprises four models: small SFX (459M parameters), small (459M parameters), medium (1.4B parameters), and large (2.7B parameters). The two smaller models, small SFX and small, are designed for on-device sound and music generation and produce content up to two minutes long. The medium and large models can create full musical compositions lasting 6 minutes and 20 seconds. That is more than double the maximum generation length of Stable Audio 2.0, which was released in 2024.

To broaden access, Stability AI is releasing the small SFX, small, and medium models with open weights, allowing users to freely use and modify them. This open-weights approach builds on previous efforts, such as the 2024 release of Stable Audio Open, which could generate up to 47 seconds of music. The new 3.0 family marks a considerable leap from these earlier open versions.

The large model, the most powerful of the four, is available only through Stability AI's API and its paid self-hosting services. Companies generating more than $1 million in revenue must also secure an enterprise license to use this model. This tiered access strategy reflects the company's focus on both community engagement and commercial applications.

AI-driven music generation is increasingly competitive, with companies such as Google and ElevenLabs developing their own models and tooling. However, as the ongoing legal challenges faced by services such as Suno and Udio show, licensed training data and partnerships with music labels are emerging as critical to the long-term viability of these platforms. Stability AI has signed agreements with major industry players, including Warner Music Group and Universal Music Group, to develop models and music-creation tools. The company says its latest collection of audio models is built on fully licensed data.

Beyond its core model releases, the AI startup is developing a new suite of products for professional musicians, though details about their features remain undisclosed. To lead this initiative, Stability AI has hired Ethan Kaplan, formerly the chief digital officer at Universal Audio and Fender, to head its professional music offering. The hire fits a broader industry trend in which AI companies are recruiting experienced music executives to bolster their credibility and market position. For instance, Suno recently appointed former Merlin CEO Jeremy Sirota as its chief commercial officer, and ElevenLabs brought on Derek Cournoyer from indie music publisher Kobalt as a strategy lead for its music business.
