AMD Instinct™ MI350 Series GPUs: Leadership Performance, Cost Efficient, Fully Open-Source
Optimize Next Gen Innovation with ROCm Software
| AMD Instinct | MI355X GPU | MI350X GPU | MI325X GPU |
|---|---|---|---|
| Process Technology (XCD/IOD) | TSMC's N3P / TSMC's N6 | TSMC's N3P / TSMC's N6 | TSMC's N5 / TSMC's N6 |
| GPU Architecture | AMD CDNA4 | AMD CDNA4 | AMD CDNA3 |
| GPU Compute Units | 256 | 256 | 304 |
| Stream Processors | 16,384 | 16,384 | 19,456 |
| Transistor Count | 185 Billion | 185 Billion | 153 Billion |
| MXFP4/MXFP6 | 10.1 PFLOPS | 9.2 PFLOPS | N/A |
| INT8 | 5.0 POPS | 4.6 POPS | 2.6 POPS |
| FP64 (Vector) | 78.6 TFLOPS | 72.1 TFLOPS | 81.7 TFLOPS |
| FP8 | 5.0 PFLOPS | 4.6 PFLOPS | 2.6 PFLOPS |
| BF16 | 2.5 PFLOPS | 2.3 PFLOPS | 1.3 PFLOPS |
| Dedicated Memory Size | 288 GB HBM3E | 288 GB HBM3E | 256 GB HBM3E |
| Memory Bandwidth | Up to 8.0 TB/s | Up to 8.0 TB/s | 6 TB/s |
| Bus Interface | PCIe Gen5 x16 | PCIe Gen5 x16 | PCIe Gen5 x16 |
| Cooling | Passive & Active | Passive | Passive & Active |
| Maximum TDP/TBP | 1400W | 1000W | 1000W |
| Virtualization Support | Up to 8 partitions | Up to 8 partitions | Up to 8 partitions |
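The generational gains in the table above are easy to quantify directly. The following sketch simply divides the MI355X peak figures by the MI325X figures from the table; the ratios are arithmetic on published peak specs, not benchmark results.

```python
# Peak figures copied from the spec table above: (MI355X, MI325X).
specs = {
    "FP8 (PFLOPS)":          (5.0, 2.6),
    "BF16 (PFLOPS)":         (2.5, 1.3),
    "INT8 (POPS)":           (5.0, 2.6),
    "Memory bandwidth (TB/s)": (8.0, 6.0),
    "Memory size (GB)":      (288, 256),
}

# Print the MI355X-over-MI325X uplift for each metric.
for metric, (mi355x, mi325x) in specs.items():
    print(f"{metric}: {mi355x / mi325x:.2f}x")
```

Note that MXFP4/MXFP6 is excluded: those data types are new in CDNA4, so there is no MI325X figure to compare against.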
What's New in ROCm 7.0?
- Expanded Hardware & Platform Support : ROCm 7 is fully compatible with AMD Instinct™ MI350 Series GPUs (including MXFP6/MXFP4) and extends development to select AMD Radeon™ GPUs and Windows environments, ensuring seamless performance across diverse hardware from cloud to edge.
- Advanced AI Features & Optimizations : ROCm 7 is targeting large-scale AI and LLM deployments with pre-optimized transformer kernels (OCPFP8/MXFP8/MXFP6/MXFP4), integrated distributed inference via vLLM v1 and SGLang, and enhanced "flash" attention and communication libraries for peak multi-GPU utilization.
- Optimized Performance* : The ROCm 7 preview delivered up to 3.5× faster AI inference and 3× quicker training than ROCm 6 by leveraging lower-precision data types and advanced kernel fusion to maximize GPU efficiency and reduce memory and I/O load.
- Enabling Developer Success : The new ROCm Enterprise AI suite makes it easier to fine-tune models on domain-specific data and deploy AI services in production, streamlines installation with a `pip install rocm` flow, and adds advanced optimization features such as model quantization libraries to boost productivity and performance.
- Expanded Ecosystem & Community Collaboration : ROCm 7 deepens integration with leading AI and HPC models and frameworks, offering day-0 support for PyTorch, TensorFlow, JAX, ONNX, and more, while giving organizations flexibility in model selection with over 2 million pre-trained models. Its broad ecosystem and open-source collaboration ensure stability, compatibility, and readiness for future workloads.
*(MI300-080): Testing by AMD as of May 15, 2025, measuring the inference performance in tokens per second (TPS) of AMD ROCm 6.x software, vLLM 0.3.3 vs. AMD ROCm 7.0 preview version SW, vLLM 0.8.5 on a system with (8) AMD Instinct MI300X GPUs running Llama 3.1-70B (TP2), Qwen 72B (TP2), and Deepseek-R1 (FP16) models with batch sizes of 1-256 and sequence lengths of 128-204. Stated performance uplift is expressed as the average TPS over the (3) LLMs tested. Results may vary.
Select GIGABYTE for the AMD Instinct MI350 Series Platform
Compute Density
High Performance
Scale-out
Advanced Cooling
Energy Efficiency
AMD Instinct™ MI300 Series GPUs
Accelerators for the Exascale Era
- Frontier is the #1 fastest supercomputer in the TOP500 and one of the greenest in the Green500 with AMD EPYC™ processors and AMD Instinct™ GPUs. These technologies are now available in GIGABYTE servers for high performance computing (HPC), AI training & inference, and data-intensive workloads.
- With AMD's data center APU and discrete GPUs, GIGABYTE has created and tailored powerful, passive and liquid-cooled servers to deliver accelerators for the Exascale era. The AMD Instinct™ MI325X and MI300X GPUs are designed for AI training, fine-tuning, and inference. They are Open Accelerator Modules (OAMs) on a universal baseboard (UBB) housed inside GIGABYTE G-series servers. The AMD Instinct MI300A integrated CPU/GPU accelerated processing unit (APU) targets HPC and AI. It comes in an LGA socketed design with four sockets in GIGABYTE G383 series servers.
- El Capitan is projected to be the world's most powerful supercomputer, capable of more than 2 exaflops of performance. At the heart of the new machine is the AMD Instinct MI300A APU, designed to overcome performance bottlenecks from the narrow interfaces between CPU and GPU, programming overhead for managing data, and the need to modify code for new GPU generations. The MI300A APU architecture has a chiplet design in which the AMD Zen4 CPUs and AMD CDNA™3 GPUs share unified memory. This means the technology not only supports small deployments such as a single server but also scales to large computing clusters. The demand for AI and HPC is here, and GIGABYTE has the technologies you need to win.