Oracle and AMD expand partnership to launch AI superclusters with advanced GPU technology

Oracle and AMD have announced an expansion of their longstanding partnership, aimed at helping customers scale their artificial intelligence (AI) capabilities. As part of this collaboration, Oracle Cloud Infrastructure (OCI) will be a launch partner for the first publicly available AI supercluster powered by AMD Instinct™ MI450 Series GPUs, with an initial deployment of 50,000 GPUs set to begin in the third quarter of 2026 and to expand thereafter, reports 24brussels.

This announcement builds on previous joint efforts to provide AMD Instinct GPU platforms on OCI, starting with AMD Instinct MI300X-powered shapes in 2024 and continuing with the general availability of OCI Compute featuring AMD Instinct MI355X GPUs in the zettascale OCI Supercluster.

Demand for large-scale AI capacity has surged as next-generation models require more resources than existing clusters can support. To run these workloads, customers need adaptable, open computing solutions that optimize scale and efficiency. OCI’s planned AI superclusters will use the AMD “Helios” rack design, which combines AMD Instinct MI450 Series GPUs, next-generation AMD EPYC™ CPUs codenamed “Venice,” and next-generation AMD Pensando™ networking codenamed “Vulcano.” This vertically optimized architecture aims to deliver superior performance, scalability, and energy efficiency for large-scale AI training and inference.

“Our customers are building some of the world’s most ambitious AI applications, and that requires robust, scalable, and high-performance infrastructure,” stated Mahesh Thiagarajan, executive vice president of Oracle Cloud Infrastructure. “By bringing together the latest AMD processor innovations with OCI’s secure, flexible platform and advanced networking powered by Oracle Acceleron, customers can push the boundaries with confidence. Through our decade-long collaboration with AMD—from EPYC to AMD Instinct accelerators—we’re continuing to deliver the best price-performance, open, secure, and scalable cloud foundation in partnership with AMD to meet customer needs for this next era of AI.”

Forrest Norrod, executive vice president and general manager of AMD’s Data Center Solutions Business Group, emphasized that “AMD and Oracle continue to set the pace for AI innovation in the cloud. With our AMD Instinct GPUs, EPYC CPUs, and advanced AMD Pensando networking, Oracle customers gain powerful new capabilities for training, fine-tuning, and deploying the next generation of AI. Together, AMD and Oracle are accelerating AI with open, optimized, and secure systems built for massive AI data centers.”

AMD Instinct MI450 Series GPUs Coming to OCI

The AMD Instinct MI450 Series GPU-powered configurations are poised to deliver high-performance and flexible cloud deployment options with extensive open-source support, ideal for advanced language models, generative AI applications, and high-performance computing tasks. Key features include:

  • Breakthrough compute and memory: Optimized for faster results and complex workloads, each GPU offers up to 432 GB of HBM4 and 20 TB/s of memory bandwidth, allowing models up to 50% larger than previous generations to be trained entirely in memory.
  • AMD optimized “Helios” rack design: Enables operation at scale while improving performance density and energy efficiency through liquid-cooled, 72-GPU racks, optimizing connectivity and throughput.
  • Powerful head node: Designed to enhance cluster utilization and streamline workflows, leveraging next-generation AMD EPYC CPUs, codenamed “Venice,” which also offer enhanced security features.
  • DPU-accelerated converged networking: Enhances data ingestion performance while bolstering security for large-scale AI infrastructures, built on the programmable AMD Pensando DPU technology.
  • Scale-out networking for AI: Promotes ultra-fast distributed training through high-speed and programmable connectivity via AMD Pensando “Vulcano” AI-NICs.
  • Innovative UALink and UALoE fabric: Streamlines workload expansion and memory management, supporting direct, hardware-coherent networking among GPUs without CPU bottlenecks.
  • Open-source AMD ROCm™ software stack: Simplifies migration of existing AI and HPC workloads, offering an open programming environment with various frameworks and libraries.
  • Advanced partitioning and virtualization: Facilitates secure resource sharing and efficient GPU allocation through granular partitioning and robust multi-tenancy.
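
For a rough sense of scale, the figures quoted above can be combined in a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes every GPU ships with the maximum quoted 432 GB of HBM4, that the whole initial deployment sits in full 72-GPU racks, and that capacities use decimal units. The function names are hypothetical and do not correspond to any Oracle or AMD tool.

```python
import math

# Figures quoted in the article; the memory numbers are "up to" values,
# so the results below are upper bounds, not confirmed specifications.
GPUS_PER_RACK = 72        # liquid-cooled "Helios" rack
HBM_PER_GPU_GB = 432      # HBM4 capacity per GPU
INITIAL_GPUS = 50_000     # initial deployment size


def racks_needed(gpus: int, per_rack: int = GPUS_PER_RACK) -> int:
    """Whole racks needed to host `gpus`, assuming uniform rack size."""
    return math.ceil(gpus / per_rack)


def rack_hbm_tb(per_rack: int = GPUS_PER_RACK,
                hbm_gb: int = HBM_PER_GPU_GB) -> float:
    """Aggregate HBM per rack in decimal terabytes."""
    return per_rack * hbm_gb / 1_000


def fleet_hbm_pb(gpus: int = INITIAL_GPUS,
                 hbm_gb: int = HBM_PER_GPU_GB) -> float:
    """Aggregate HBM across the deployment in decimal petabytes."""
    return gpus * hbm_gb / 1_000_000


if __name__ == "__main__":
    print("Racks for initial deployment:", racks_needed(INITIAL_GPUS))
    print("HBM per rack (TB):", rack_hbm_tb())
    print("HBM across deployment (PB):", fleet_hbm_pb())
```

Under these assumptions, the initial 50,000 GPUs would occupy roughly 695 racks, with about 31 TB of HBM per rack and about 21.6 PB across the deployment. Actual configurations may differ.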

Additionally, OCI has announced the general availability of OCI Compute featuring AMD Instinct MI355X GPUs, designed to promote cloud flexibility and value, available in the extensible OCI Supercluster capable of scaling to 131,072 GPUs.
