AMD Instinct™ MI350 Series Platform

Bringing Exascale-Class Technologies to Mainstream HPC & AI

AMD Instinct™ MI350 Series GPUs: Leadership Performance, Cost Efficiency, Fully Open-Source Software

The AMD Instinct™ MI350 Series GPUs, launched in June 2025, represent a significant leap forward in data center computing, designed to accelerate generative AI and high-performance computing (HPC) workloads. Built on the cutting-edge 4th Gen AMD CDNA™ architecture and fabricated using TSMC's 3nm process, these GPUs deliver exceptional performance and energy efficiency for training massive AI models, high-speed inference, and complex HPC tasks like scientific simulations and data processing. Featuring 288GB of HBM3E memory and up to 8TB/s bandwidth, the MI350X and MI355X GPUs offer up to 4x generational AI compute improvement and a remarkable 35x boost in inference performance, positioning them as formidable competitors in the AI and HPC markets.

Optimize Next Gen Innovation with ROCm Software

The AMD ROCm™ 7.0 software stack is a key differentiator that enables high-performance AI and HPC development with minimal code changes. Instinct MI350 Series GPUs are fully optimized for leading frameworks such as PyTorch, TensorFlow, JAX, ONNX Runtime, Triton, and vLLM, and offer Day 0 support for popular models through automatic kernel generation and continuous validation. The ROCm 7.0 platform's unique combination of DeepEP pipelining, SGL cross-PD scheduling, and PD KV-cache transfers delivers significant advantages. AMD is a founding member of the PyTorch Foundation and actively contributes to OpenXLA and UEC, reinforcing its long-term commitment to open-source AI. With AMD Infinity Hub, users gain access to deployment-ready containers that simplify onboarding and accelerate time to value. MI350 Series GPUs are purpose-built for scalable inference and training, offering elastic scaling, vendor-agnostic optimization, and full Kubernetes integration through the AMD GPU Operator.
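
As a quick illustration of the "minimal code changes" point, here is a minimal sketch, assuming a ROCm build of PyTorch (which exposes AMD GPUs through the standard torch.cuda API), that verifies the GPUs are visible:

```python
# Minimal sketch: confirm a ROCm build of PyTorch can see the Instinct GPUs.
# ROCm builds of PyTorch reuse the torch.cuda namespace for AMD (HIP)
# devices, so existing CUDA-targeted code typically runs with few changes.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        # On an MI350 Series system this should report the Instinct device name.
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    # torch.version.hip is populated on ROCm builds (None on CUDA builds).
    print(f"ROCm/HIP runtime: {torch.version.hip}")
else:
    print("No ROCm-visible GPU found; check driver and ROCm installation.")
```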

| AMD Instinct | MI355X GPU | MI350X GPU | MI325X GPU |
|---|---|---|---|
| Process Technology (XCD/IOD) | TSMC's N3P / TSMC's N6 | TSMC's N3P / TSMC's N6 | TSMC's N5 / TSMC's N6 |
| GPU Architecture | AMD CDNA 4 | AMD CDNA 4 | AMD CDNA 3 |
| GPU Compute Units | 256 | 256 | 304 |
| Stream Processors | 16,384 | 16,384 | 19,456 |
| Transistor Count | 185 Billion | 185 Billion | 153 Billion |
| MXFP4/MXFP6 (Peak) | 10.1 PFLOPS | 9.2 PFLOPS | N/A |
| INT8 (Peak) | 5.0 POPS | 4.6 POPS | 2.6 POPS |
| FP64 Vector (Peak) | 78.6 TFLOPS | 72.1 TFLOPS | 81.7 TFLOPS |
| FP8 (Peak) | 5.0 PFLOPS | 4.6 PFLOPS | 2.6 PFLOPS |
| BF16 (Peak) | 2.5 PFLOPS | 2.3 PFLOPS | 1.3 PFLOPS |
| Dedicated Memory Size | 288 GB HBM3E | 288 GB HBM3E | 256 GB HBM3E |
| Memory Bandwidth | Up to 8.0 TB/s | Up to 8.0 TB/s | 6.0 TB/s |
| Bus Interface | PCIe Gen5 x16 | PCIe Gen5 x16 | PCIe Gen5 x16 |
| Cooling | Passive & Active | Passive | Passive & Active |
| Maximum TDP/TBP | 1400W | 1000W | 1000W |
| Virtualization Support | Up to 8 partitions | Up to 8 partitions | Up to 8 partitions |

What's New in ROCm 7.0?

  • Expanded Hardware & Platform Support: ROCm 7 is fully compatible with AMD Instinct™ MI350 Series GPUs (including the MXFP6/MXFP4 data types) and extends development to select AMD Radeon™ GPUs and Windows environments, ensuring seamless performance across diverse hardware from cloud to edge.
  • Advanced AI Features & Optimizations: ROCm 7 targets large-scale AI and LLM deployments with pre-optimized transformer kernels (OCP FP8/MXFP8/MXFP6/MXFP4), integrated distributed inference via vLLM v1 and SGLang (see the sketch after this list), and enhanced flash-attention and communication libraries for peak multi-GPU utilization.
  • Optimized Performance*: The ROCm 7 preview delivered up to 3.5× faster AI inference and 3× faster training than ROCm 6 by leveraging lower-precision data types and advanced kernel fusion to maximize GPU efficiency and reduce memory and I/O load.
  • Enabling Developer Success: The new ROCm Enterprise AI suite makes it easier to fine-tune models on domain-specific data and deploy AI services in production, streamlines installation with a "pip install rocm" flow, and supports advanced optimization features such as model quantization libraries to boost productivity and performance.
  • Expanded Ecosystem & Community Collaboration: ROCm 7 deepens integration with leading AI and HPC models and frameworks, offering Day 0 support for PyTorch, TensorFlow, JAX, ONNX, and more, while giving organizations flexibility in model selection with over 2 million pre-trained models. Its broad ecosystem and open-source collaboration ensure stability, compatibility, and readiness for future workloads.
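
As a concrete illustration of the integrated distributed inference mentioned above, the following is a minimal, hypothetical sketch of tensor-parallel serving with vLLM on an 8-GPU node; the model ID and tensor_parallel_size are illustrative assumptions, not a validated MI350 configuration:

```python
# Illustrative sketch: multi-GPU LLM inference with vLLM on a ROCm system.
# The model ID and parallel degree below are placeholders, not a tested setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model, assumed available
    tensor_parallel_size=8,  # shard weights across the 8 GPUs on the UBB
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain the benefits of HBM3E memory."], params)
print(outputs[0].outputs[0].text)
```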

*(MI300-080): Testing by AMD as of May 15, 2025, measuring the inference performance in tokens per second (TPS) of AMD ROCm 6.x software, vLLM 0.3.3 vs. AMD ROCm 7.0 preview version SW, vLLM 0.8.5 on a system with (8) AMD Instinct MI300X GPUs running Llama 3.1-70B (TP2), Qwen 72B (TP2), and Deepseek-R1 (FP16) models with batch sizes of 1-256 and sequence lengths of 128-204. Stated performance uplift is expressed as the average TPS over the (3) LLMs tested. Results may vary.

Select GIGABYTE for the AMD Instinct MI350 Series Platform

Compute Density

Offering industry-leading compute density with the 8U air-cooled G893 series and the 4U liquid-cooled G4L3 series, these servers achieve greater performance per rack.
High Performance

The custom 8-GPU UBB-based server ensures stable, peak performance from CPUs and GPUs, with design priority given to signal integrity and cooling.
Scale-out

Multiple expansion slots can be populated with Ethernet or InfiniBand NICs for high-speed communication between interconnected nodes.
Advanced Cooling

With server models that use direct liquid cooling (DLC), heat can be removed from CPUs and GPUs faster and more efficiently than with air cooling.
Energy Efficiency

Real-time power management, automatic fan speed control, redundant Titanium PSUs, and an optional DLC configuration ensure the best cooling and power efficiency.

AMD Instinct™ MI300 Series GPUs

Accelerators for the Exascale Era

  • Frontier reached #1 in the TOP500 as the world's fastest supercomputer and ranks among the greenest systems in the Green500, powered by AMD EPYC™ processors and AMD Instinct™ GPUs. These technologies are now available in GIGABYTE servers for high performance computing (HPC), AI training & inference, and data-intensive workloads.
  • With AMD's data center APU and discrete GPUs, GIGABYTE has created powerful passively cooled and liquid-cooled servers to deliver accelerators for the Exascale era. The AMD Instinct™ MI325X and MI300X GPUs are designed for AI training, fine-tuning, and inference. They are Open Accelerator Modules (OAMs) on a universal baseboard (UBB) housed inside GIGABYTE G-series servers. The AMD Instinct MI300A integrated CPU/GPU accelerated processing unit (APU) targets HPC and AI. It comes in an LGA socketed design with four sockets in GIGABYTE G383 series servers.
  • El Capitan is projected to be the world's most powerful supercomputer, capable of more than 2 exaflops. At the heart of the new machine is the AMD Instinct MI300A APU, designed to overcome performance bottlenecks from the narrow interfaces between CPU and GPU, the programming overhead of managing data, and the need to modify code across GPU generations. The MI300A APU architecture has a chiplet design in which the AMD Zen 4 CPU cores and AMD CDNA™ 3 GPUs share unified memory. The technology therefore supports not only small deployments such as a single server but also scales to large computing clusters. The demand for AI and HPC is here, and GIGABYTE has the technologies you need to win.

Applications for AMD Instinct Series

AI Inference

High memory bandwidth, large memory capacity, and low latency between GPUs make these platforms ideal for AI inference, as they can handle large data sets and process data in batches. This is important for real-time or large-scale inference applications.
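
To illustrate the batching point, here is a generic PyTorch sketch (the layer and batch sizes are illustrative, not an Instinct-specific benchmark):

```python
# Generic sketch: one batched forward pass amortizes kernel-launch and
# memory-access overhead across many requests, which is where large HBM
# capacity and bandwidth pay off. All shapes here are illustrative.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm also maps here
model = torch.nn.Linear(4096, 4096).to(device).eval()

requests = torch.randn(256, 4096, device=device)  # 256 requests in one batch
with torch.inference_mode():
    responses = model(requests)  # single pass instead of 256 one-item passes
print(responses.shape)  # torch.Size([256, 4096])
```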

Generative AI

8-GPU UBB-based servers are ideal for generative AI because of the parallel-processing nature of the GPU. Parallel processing excels at massive training data sets and deep learning models such as neural networks, and it speeds up applications like natural language processing and data augmentation.

HPC

Complex problem solving in HPC involves simulations, modeling, and data analysis to achieve greater insights. GPU parallel processing is essential, but these workloads also rely heavily on the CPU for the sequential portions of mathematical computations.

Featured New Products

G4L3-ZX1-LAT4

HPC/AI Server - AMD EPYC™ 9005/9004 - 4U DP AMD Instinct™ MI355X DLC

G893-ZX1-AAX3

HPC/AI Server - AMD EPYC™ 9005/9004 - 8U DP AMD Instinct™ MI350X

G383-R80-AAP1

HPC/AI Server - AMD Instinct™ MI300A APU - 3U 8-Bay Gen5 NVMe

G893-ZX1-AAX2

HPC/AI Server - AMD EPYC™ 9005/9004 - 8U DP AMD Instinct™ MI325X

G893-ZX1-AAX1

HPC/AI Server - AMD EPYC™ 9005/9004 - 8U DP AMD Instinct™ MI300X

G4L3-ZX1-LAX2

HPC/AI Server - AMD EPYC™ 9005/9004 - 4U DP AMD Instinct™ MI325X DLC

G593-SX1-AAX1

HPC/AI Server - 5th/4th Gen Intel® Xeon® - 5U DP AMD Instinct™ MI300X 8-GPU

G593-SX1-LAX1

HPC/AI Server - 5th/4th Gen Intel® Xeon® - 5U DP AMD Instinct™ MI300X 8-GPU DLC

Resources

GIGAPOD - AI Supercomputing Solution

GIGABYTE Releases Servers to Accelerate AI and LLMs with AMD EPYC™ 9005 Series Processors and AMD Instinct™ MI325X GPUs

AMD EPYC™ 9005 Series Solutions

GIGABYTE Unveils Next-gen HPC & AI Servers with AMD Instinct™ MI300 Series Accelerators

AI Server and AI PC Solutions for Every AI Application