AMD Instinct™ MI350 Series Platform

Bringing Exascale-Class Technologies to Mainstream HPC & AI

Leadership Performance, Cost Efficient, Fully Open-Source

AMD Instinct™ MI350 Series

The AMD Instinct™ MI350 Series GPUs, launched in June 2025, represent a significant leap forward in data center computing, designed to accelerate generative AI and high-performance computing (HPC) workloads. Built on the cutting-edge 4th Gen AMD CDNA™ architecture and fabricated using TSMC's 3nm process, these GPUs deliver exceptional performance and energy efficiency for training massive AI models, high-speed inference, and complex HPC tasks like scientific simulations and data processing. Featuring 288GB of HBM3E memory and up to 8TB/s bandwidth, the MI350X and MI355X GPUs offer up to 4x generational AI compute improvement and a remarkable 35x boost in inference performance, positioning them as formidable competitors in the AI and HPC markets. To scale-up AI inference workloads, the MI350P GPU is a dual-slot design that easily integrates into servers for efficient compute performance. Complete GPU-based solutions exist for all workloads using AMD Instinct accelerators.

Overview for the PCIe Card or Module

PCIe CEMOAM

Feature	MI350P
Form Factor	Dual-slot PCIe CEM
GPU Architecture	AMD CDNA™ 4
GPU Compute Units	128
INT8 / INT8 (Sparsity)	Supported (w/ sparsity 2:4)
FP8	Supported (w/ sparsity 2:4)
FP16, BF16	Supported (w/ sparsity 2:4)
MXFP4	Supported
FP64	Supported
Dedicated Memory Size	144 GB HBM3E
Memory Bandwidth	4.0 TB/s
Bus Interface	PCIe Gen5 x16
Cooling	Passive
Power Connector	16-pin 12VHPWR
TBP	600W (450W configurable)
Virtualization Support	Up to 4 partitions

AMD Instinct
TSMC N3P / TSMC N6	Process Technology (XCD / IOD)	TSMC N5 / TSMC N6
AMD CDNA4	GPU Architecture	AMD CDNA3
256	GPU Compute Units	304
16,384	Stream Processors	19,456
185 Billion	Transistor Count	153 Billion
10.1 PFLOPS	9.2 PFLOPS	MXFP4 / MXFP6	N/A
5.0 / 10.1 POPS	4.6 / 9.2 POPS	INT8 / INT8 (Sparsity)	2.6 / 5.2 POPS
78.6 TFLOPS	72.1 TFLOPS	FP64 (Vector)	81.7 TFLOPS
5.0 / 10.1 PFLOPS	4.6 / 9.2 PFLOPS	FP8 / OCP-FP8 (Sparsity)	2.6 / 5.2 PFLOPS
2.5 / 5.0 PFLOPS	2.3 / 4.6 PFLOPS	BF16 / BF16 (Sparsity)	1.3 / 2.6 PFLOPS
288 GB HBM3E	Dedicated Memory Size	256 GB HBM3E
8 TB/s	Memory Bandwidth	6 TB/s
PCIe Gen5 x16	Bus Interface	PCIe Gen5 x16
Passive & Liquid	Passive	Cooling	Passive & Liquid
1400W	1000W	Maximum TDP/TBP	1000W
Up to 8 partitions	Virtualization Support	Up to 8 partitions

AMD Instinct™ MI300 Series

OverviewSpecifications

Accelerators for the Exascale Era

Designed for the most demanding workloads, the AMD Instinct MI325X GPU delivers 256GB of memory and 6 TB/s bandwidth, combining exceptional performance with enhanced power efficiency and support for matrix sparsity to optimize AI training and inference.

The world's first unified data center APU, AMD Instinct MI300A, breaks through performance bottlenecks between CPU and GPU, eliminating programming overhead and simplifying data management.

Powered by AMD EPYC™ processors and AMD Instinct™ GPUs and APUs, the world’s fastest supercomputers, El Capitan and Frontier, demonstrate outstanding performance and energy efficiency on both the TOP500 and GREEN500 lists, proving AMD's leadership in HPC and AI acceleration.

GIGABYTE delivers advanced servers built for the Exascale era, featuring the AMD Instinct™ MI325X and MI300X GPUs as Open Accelerator Modules (OAMs) on a universal baseboard (UBB) inside GIGABYTE G-series servers. The AMD Instinct™ MI300A APU, which integrates CPU and GPU into a single package, is available in the GIGABYTE G383 series with a four-LGA-socket configuration. Together, these systems provide exceptional compute density, scalability, and cooling efficiency, empowering enterprises and research institutions to drive innovation in AI and HPC with confidence.

MI325X GPU	MI300X GPU	Model	MI300A APU
OAM module	Form Factor	APU SH5 socket
-	AMD ‘Zen 4’ CPU cores	24
304	GPU Compute Units	228
19,456	Stream Processors	14,592
163.4 TFLOPS	Peak FP64/FP32 Matrix*	122.6 TFLOPS
81.7/163.4 TFLOPS	Peak FP64/FP32 Vector*	61.3/122.6 TFLOPS
1307.4 TFLOPS	Peak FP16/BF16*	980.6 TFLOPS
2614.9 TFLOPS	Peak FP8*	1961.2 TFLOPS
256 GB HBM3E	192 GB HBM3	Dedicated Memory Size	128 GB HBM3
6.0 GHz	5.2 GHz	Memory Clock	5.2 GHz
6 TB/s	5.3 TB/s	Memory Bandwidth	5.3 TB/s
PCIe Gen5 x16	Bus Interface	PCIe Gen5 x16
8	Infinity Fabric™ Links	8
1000W	750W	Maximum TDP/TBP	550W / 760W (Peak)
Up to 8 partitions	Virtualization Support	Up to 3 partitions

Optimize Next Gen Innovation with AMD ROCm™ 7.0

The AMD ROCm™ 7.0 software stack is a key differentiator that enables high-performance AI and HPC development with minimal code changes. AMD Instinct™ MI350 Series GPUs are fully optimized for leading frameworks such as PyTorch, TensorFlow, JAX, ONNX Runtime, Triton, and vLLM, and offer Day 0 support for popular models through automatic kernel generation and continuous validation.

Expanded Hardware & Platform Support : ROCm 7 is fully compatible with AMD Instinct™ MI350 Series GPUs (including MXFP6/MXFP4) and extends development to select AMD Radeon™ GPUs and Windows environments, ensuring seamless performance across diverse hardware from cloud to edge.
Advanced AI Features & Optimizations : ROCm 7 is targeting large-scale AI and LLM deployments with pre-optimized transformer kernels (OCP-FP8/MXFP8/MXFP6/MXFP4), integrated distributed inference via vLLM v1, llm-d, and SGLang, and enhanced "flash" attention and communication libraries for peak multi-GPU utilization.
Optimized Performance : The ROCm 7 preview delivered up to 3.5x faster AI inference and 3x quicker training than ROCm 6 by leveraging lower-precision data types and advanced kernel fusion to maximize GPU efficiency and reduce memory and I/O load.^[1]

Enabling Developer Success : With the new ROCm Enterprise AI suite, it is now easier to fine-tune models on domain-specific data and deploy AI services in production, streamline install with a simple pip install rocm flow, and support advanced optimization features such as model quantization libraries to boost productivity and performance.
Expanded Ecosystem & Community Collaboration : ROCm 7 deepens integration with leading AI and HPC models and frameworks, offering day-0 support for PyTorch, TensorFlow, JAX, and ONNX and more, while giving organizations flexibility in model selection with over 2 million pre-trained models. Its broad ecosystem and open-source collaboration ensures stability, compatibility, and readiness for future workloads.

^[1](MI300-080): Testing by AMD as of May 15, 2025, measuring the inference performance in tokens per second (TPS) of AMD ROCm 6.x software, vLLM 0.3.3 vs. AMD ROCm 7.0 preview version SW, vLLM 0.8.5 on a system with (8) AMD Instinct MI300X GPUs running Llama 3.1-70B (TP2), Qwen 72B (TP2), and Deepseek-R1 (FP16) models with batch sizes of 1-256 and sequence lengths of 128-204. Stated performance uplift is expressed as the average TPS over the (3) LLMs tested. Results may vary.

Select GIGABYTE for the AMD Instinct™ Platform

Compute Dense

Servers built compute dense: UBB GPUs supported in air-cooled 8U and liquid-cooled 4U servers, and PCIe CEM supported in 2U and 4U servers.

High Performance

Custom server design ensures stable, peak performance from top-tier CPUs and GPUs, to deliver the highest possible results.

Scale-out

Servers have multiple expansion slots to be populated with Ethernet or InfiniBand NICs for high-speed communication between connected nodes.

Advanced Cooling

With the availability of server models using direct liquid cooling (DLC), CPUs and GPUs can dissipate heat faster and more efficiently with liquid cooling than with air.

Energy Efficiency

Real-time power management, automatic fan speed control, redundant Titanium PSUs, and the DLC option ensure optimal cooling and power efficiency.

Applications for AMD Instinct™ Solutions

AI Inference

High memory capacity and bandwidth, along with support for lower precision (INT8 / INT4) make Instinct platforms well-suited for AI inference. These features enable efficient handling of large datasets and high-throughput batch processing, which are critical for real-time and large-scale inference applications.

Generative AI

Generative AI requires fast processing of large models, high throughput, and support for long context windows. Instinct MI350 GPUs deliver high-bandwidth memory and massive parallel compute, enabling efficient training and inference, faster token generation, and scalable, high-quality content creation.

Agentic AI

Agentic AI requires continuous reasoning, rapid decision-making, and coordination across multiple models. Instinct MI350 GPUs provide high memory capacity, fast interconnects, and massive parallel compute, enabling low-latency execution and efficient scaling of complex, multi-step agent workflows.

HPC

Complex problem solving in HPC applications involves simulations, modeling, and data analysis to achieve greater insights. Parallel processing from the GPU is needed, but also there is heavy reliance on the CPU for sequential processing in mathematical computations.

Featured New Products

Resources

News

GIGABYTE Releases Servers to Accelerate AI and LLMs with AMD EPYC™ 9005 Series Processors and AMD Instinct™ MI325X GPUs

Đọc thêm

Solution

AMD EPYC™ 9005 Series Solutions

Đọc thêm

News

GIGABYTE Unveils Next-gen HPC & AI Servers with AMD Instinct™ MI300 Series Accelerators

Đọc thêm

Topic

GIGABYTE AI Solutions for Every AI Application

Đọc thêm

AMD Instinct
MI355X GPU	MI350X GPU	Model	MI325X GPU
TSMC N3P / TSMC N6		Process Technology (XCD / IOD)	TSMC N5 / TSMC N6
AMD CDNA4		GPU Architecture	AMD CDNA3
256		GPU Compute Units	304
16,384		Stream Processors	19,456
185 Billion		Transistor Count	153 Billion
10.1 PFLOPS	9.2 PFLOPS	MXFP4 / MXFP6	N/A
5.0 / 10.1 POPS	4.6 / 9.2 POPS	INT8 / INT8 (Sparsity)	2.6 / 5.2 POPS
78.6 TFLOPS	72.1 TFLOPS	FP64 (Vector)	81.7 TFLOPS
5.0 / 10.1 PFLOPS	4.6 / 9.2 PFLOPS	FP8 / OCP-FP8 (Sparsity)	2.6 / 5.2 PFLOPS
2.5 / 5.0 PFLOPS	2.3 / 4.6 PFLOPS	BF16 / BF16 (Sparsity)	1.3 / 2.6 PFLOPS
288 GB HBM3E		Dedicated Memory Size	256 GB HBM3E
8 TB/s		Memory Bandwidth	6 TB/s
PCIe Gen5 x16		Bus Interface	PCIe Gen5 x16
Passive & Liquid	Passive	Cooling	Passive & Liquid
1400W	1000W	Maximum TDP/TBP	1000W
Up to 8 partitions		Virtualization Support	Up to 8 partitions

AMD Instinct™ MI350 Series Platform

Leadership Performance, Cost Efficient, Fully Open-Source

AMD Instinct™ MI350 Series

Overview for the PCIe Card or Module

AMD Instinct

AMD Instinct™ MI300 Series

Accelerators for the Exascale Era

Optimize Next Gen Innovation with AMD ROCm™ 7.0

Select GIGABYTE for the AMD Instinct™ Platform

Compute Dense

High Performance

Scale-out

Advanced Cooling

Energy Efficiency

Applications for AMD Instinct™ Solutions

AI Inference

Generative AI

Agentic AI

HPC

Featured New Products

G4L3-ZX1-LAT4

G893-ZX1-AAX3

G494-ZB0-AAP2

G893-ZX1-AAX2

G893-ZX1-AAX1

G4L3-ZX1-LAX2

G593-SX1-AAX1

G593-SX1-LAX1

Resources

GIGABYTE Releases Servers to Accelerate AI and LLMs with AMD EPYC™ 9005 Series Processors and AMD Instinct™ MI325X GPUs

AMD EPYC™ 9005 Series Solutions

GIGABYTE Unveils Next-gen HPC & AI Servers with AMD Instinct™ MI300 Series Accelerators

GIGABYTE AI Solutions for Every AI Application

You have the idea, we can help make it happen.