AMD is 'Su' Ready for AI

This year, AMD’s Advancing AI event was on another level. The company made it clear it’s no longer afraid of NVIDIA. It introduced the new Instinct MI350 Series GPUs, built on the CDNA 4 architecture, promising a fourfold generational improvement in AI compute and a 35x leap in inferencing performance. 

It also launched ROCm 7.0, its open software stack for GPU computing, and previewed the upcoming MI400 Series and Helios AI rack infrastructure.

The company said the MI350X and MI355X GPUs feature 288GB of HBM3E memory and offer up to 8TB/s of memory bandwidth. “MI355 delivers 35x higher throughput when running at ultra-low latencies, which is required for some real-time applications like code completion, simultaneous translation, and transcription,” said AMD CEO Lisa Su.

Su said that models like Llama 4 Maverick and DeepSeek R1 have seen triple the tokens per second on the MI355 compared to the previous generation. This leads to faster responses and higher user throughput. “The MI355 offers up to 40% more tokens per dollar compared to NVIDIA B200,” she added. 

Each MI355X platform can deliver up to 161 PFLOPs of FP4 performance using structured sparsity. The series supports both air-cooled (64 GPUs) and direct liquid-cooled (128 GPUs) configurations, offering up to 2.6 exaFLOPs of FP4/FP6 compute.
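
Those rack-level figures follow from the platform numbers. As a rough sanity check (a minimal sketch using only figures from this article, plus the assumption that a “platform” is eight GPUs, in line with AMD’s usual eight-accelerator platform design):

```python
# Sanity check of AMD's quoted figures using only numbers from the article.
# Assumption: an MI355X "platform" is eight GPUs (AMD's usual OAM layout).
gpus_per_platform = 8
platform_fp4_pflops = 161                       # quoted, with sparsity

per_gpu = platform_fp4_pflops / gpus_per_platform
print(f"Implied per-GPU FP4: ~{per_gpu:.1f} PFLOPs")            # ~20.1

rack_gpus = 128                                 # liquid-cooled rack
print(f"Rack FP4: ~{rack_gpus * per_gpu / 1000:.1f} exaFLOPs")  # ~2.6
print(f"Rack HBM3E: ~{rack_gpus * 288 / 1000:.1f} TB")          # ~36.9
```

The roughly 36.9TB of HBM3E per 128-GPU rack also matches the 36TB figure AMD quotes for its rack-scale solution later in the announcement.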

The Instinct MI400 Series, expected in 2026, will feature up to 432GB of HBM4 memory and 19.6TB/s of bandwidth. It is set to deliver 40 PFLOPs of FP4 and 20 PFLOPs of FP8 performance.
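
Set against the MI355X’s 288GB of HBM3E and 8TB/s, those figures work out to 1.5x the memory capacity and nearly 2.5x the memory bandwidth per GPU.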

Speaking about the company’s open-source ROCm software stack, Vamsi Boppana, senior vice president of AMD’s artificial intelligence group, said it now powers some of the largest AI platforms in the world, supporting major models like Llama and DeepSeek from day one and delivering over 3.5x inference gains in the upcoming ROCm 7 release.

He added that frequent updates, support for FP4 data types, and new algorithms like FAv3 are helping ROCm deliver better performance and push open-source frameworks like vLLM and SGLang ahead of closed-source options.
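
For a sense of what that looks like in practice, here is a minimal vLLM inference sketch. It assumes a ROCm build of vLLM is installed and an Instinct GPU is visible; the model name is an illustrative placeholder, not one AMD named on stage.

```python
# Minimal vLLM inference sketch. Assumes a ROCm build of vLLM and a
# visible Instinct GPU; the model is an illustrative placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain structured sparsity in one sentence."], params)
print(outputs[0].outputs[0].text)
```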

“With over 1.8 million Hugging Face models running out of the box, industry benchmarks now in play, ROCm is not just catching up—it’s leading the open AI revolution,” he added.
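
The “out of the box” claim rests on the fact that ROCm builds of PyTorch expose the HIP backend through the familiar torch.cuda API, so stock Hugging Face code runs unchanged. A minimal sketch (the model here is a small stand-in, not one AMD benchmarked):

```python
# ROCm builds of PyTorch report the HIP backend via torch.cuda, so
# standard CUDA-style device checks work unchanged on Instinct GPUs.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1   # GPU 0 if present
pipe = pipeline("text-generation", model="gpt2", device=device)
print(pipe("Open AI ecosystems matter because", max_new_tokens=30)[0]["generated_text"])
```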

AMD is working with leading AI companies, including Meta, OpenAI, xAI, Oracle, Microsoft, Cohere, HUMAIN, Red Hat, Astera Labs and Marvell. Su said the company expects the market for AI processors to exceed $500 billion by 2028.

The event, which took place in San Jose, California, also saw OpenAI CEO Sam Altman sharing the stage with Su. “We are working closely with AMD on infrastructure for research and production. Our GPT models are running on MI300X in Azure, and we’re deeply engaged in design efforts on the MI400 Series,” Altman said. 

Meta, meanwhile, said its Llama 3 and Llama 4 inference workloads are running on MI300X and that it expects further improvements from the MI350 and MI400 Series.

Oracle Cloud Infrastructure is among the first to adopt the new system, with plans to offer zettascale AI clusters comprising up to 131,072 MI355X GPUs. Microsoft confirmed that proprietary and open-source models are now running in production on Azure using the MI300X. 

Cohere said its Command models use the MI300X for enterprise inference. HUMAIN announced a partnership with AMD to build a scalable and cost-efficient AI platform using AMD’s full compute portfolio.

AMD announced its new open-standard rack-scale infrastructure to meet the rising demands of agentic AI workloads, launching solutions that integrate Instinct MI350 GPUs, 5th Gen EPYC CPUs, and Pensando Pollara NICs.

“We have taken the lead on helping the industry develop open standards, allowing everyone in the ecosystem to innovate and work together to drive AI forward. We utterly reject the notion that one company could have a monopoly on AI or AI innovation,” said Forrest Norrod, AMD’s executive vice president.

The company also previewed Helios, its next-generation rack platform built around the upcoming MI400 GPUs and Venice CPUs. Su said Venice is built on TSMC’s 2-nanometer process, features up to 256 high-performance Zen 6 cores, and delivers 70% more compute performance than AMD’s current-generation leadership CPUs.

“Helios functions like a single, massive compute engine. It connects up to 72 GPUs with 260 terabytes per second of scale-up bandwidth, enabling 2.9 exaflops of FP4 performance,” she said, adding that compared to the competition, it supports 50% more HBM4 memory, memory bandwidth, and scale-out bandwidth. 
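
The quoted rack total is consistent with the per-GPU MI400 figure mentioned earlier:

```python
# Cross-check: 72 MI400 GPUs at 40 PFLOPs of FP4 each (per the MI400
# preview above) should land near the quoted Helios total.
print(72 * 40 / 1000)   # 2.88 -> rounds to the quoted ~2.9 exaFLOPs
```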

AMD’s Venice CPUs bring up to 256 cores and higher memory bandwidth, while Vulcano AI NICs support 800G networking and UALink. “Choosing the right CPU gets the most out of your GPU,” said Norrod.

Helios uses UALink to connect 72 GPUs as a unified system, offering open, vendor-neutral scale-up performance.

Describing UALink as a key differentiator, Norrod said one of its most important features is that it’s “an open ecosystem”: a protocol that works across systems regardless of the CPU, accelerator, or switch brand. He added that AMD believes open interoperability accelerates innovation, protects customer choice, and still delivers leadership performance and efficiency.

As AI workloads grow in complexity and scale, AMD says a unified stack is necessary, combining high-performance GPUs, CPUs, and intelligent networking to support multi-agent systems across industries.

The currently available solution supports up to 128 Instinct MI350 GPUs per rack with up to 36TB of HBM3E memory. The infrastructure is built on Open Compute Project (OCP) standards and Ultra Ethernet Consortium (UEC) compliance, allowing interoperability with existing infrastructure. 

OCI will be among the first to adopt the MI355X-based rack-scale platform. “We will be one of the first to provide the MI355X rack-scale infrastructure using the combined power of EPYC, Instinct, and Pensando,” said Mahesh Thiagarajan, EVP at OCI. 

The new Helios rack solution, expected in 2026, will bring tighter integration and higher throughput. It will include next-gen MI400 GPUs, offering up to 432GB of HBM4 memory and 40 PFLOPs of FP4 performance.
