MyelinTek MLSteam Deep Learning Training Solution

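GIGABYTE's deep learning solution combines powerful computing performance with a graphical user interface, giving deep learning engineers an easy-to-use environment for dataset management, training job scheduling, real-time monitoring, and trained model analysis.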

Introduction

GIGABYTE’s DNN Training Appliance is a well-integrated software and hardware package that combines powerful computing performance with a user-friendly GUI. It gives DNN developers an easy-to-use environment for dataset management, training job management, real-time system monitoring, and model analysis. The appliance includes hardware and software optimizations that improve performance and reduce the time required for DNN training.

Use Case Scenarios

Smart Health

Convolutional neural networks (CNNs) can help automate and optimize routine medical image analysis and disease detection tasks, such as eye disease detection and brain MRI segmentation. Deep learning can also be applied to non-image data, for example to predict epileptic seizures.

Automated Traffic Enforcement

Object detection and segmentation techniques can be applied to various traffic enforcement tasks, such as license plate recognition, seat belt usage, and driver cell phone usage.

Image Recognition

The DNN Training Appliance can be used to train image recognition models, for example to recognize people, cars, or other objects, for use in an intelligent video analytics platform.

Intelligent Banking

Banks can apply deep learning and natural language processing (NLP) to operations such as customer service automation (chatbots), contract analysis, intelligent document search, and credit scoring.

Providing Developers and Data Scientists the Following Benefits

Saves Time

Train deep learning models faster with the DNN Training Appliance than with a self-assembled stack of open source community tools.

Saves Money

Achieve maximum utilization of your hardware investment with powerful optimization features, so that downtime is minimal.

Ease of Use

Faster startup of a DNN training environment for developers; spend less time and resources on employee training.

Choices & Customization

The standard version supports image classification and object detection. Talk to us about a customized solution for other model or application types.

Reduces the Complexity of DNN Training Environment Setup and Management

To generate a production-grade DNN model, a developer needs to go through many difficult and time-consuming steps, including dataset collection, dataset cleansing, dataset labeling, dataset augmentation, dataset format conversion, model selection, model design, hyperparameter tuning, model training, model evaluation, and model format conversion. Each step requires different tools and configurations that take time and effort to prepare, and switching between these tools often means writing additional code to convert data from one tool's format to another's.
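As an illustration of the glue code that format conversion can involve, the sketch below converts one bounding box from pixel corner coordinates to a normalized center format of the kind some object detection tools expect. The function name and the choice of formats are assumptions made for this example; the source does not specify which annotation formats are involved.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def corners_to_normalized_center(box: Box, image_width: int, image_height: int) -> Box:
    """Convert (x_min, y_min, x_max, y_max) in pixels to
    (x_center, y_center, width, height) as fractions of the image size.

    Hypothetical helper for illustration only.
    """
    x_min, y_min, x_max, y_max = box
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    if x_max < x_min or y_max < y_min:
        raise ValueError("box corners are out of order")

    # Center point and size, each divided by the matching image dimension.
    x_center = (x_min + x_max) / 2.0 / image_width
    y_center = (y_min + y_max) / 2.0 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return (x_center, y_center, width, height)
```

A box spanning (100, 50) to (300, 150) in a 400 x 200 image becomes (0.5, 0.5, 0.5, 0.5) in the normalized form.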

GIGABYTE’s DNN Training Appliance aims to reduce this complexity by providing a complete training and management platform that brings all of these processes into an easy-to-use, browser-based GUI. Users can import, convert, and manage their datasets; design, train, and evaluate different DNN models; and test inference with their trained models. Based on GIGABYTE’s G481-HA1 server, the platform is optimized to use the available bare-metal resources to deliver improved training performance on cost-efficient hardware.

DNN Training Appliance Hardware and Software Stack

Reduces the Time and Improves the Accuracy for Each DNN Training Job

DNN models need to be trained on a large dataset to achieve an acceptable level of accuracy. Depending on the dataset size, this training can take days or even weeks. To adapt to changing business circumstances (such as new products or new regulations), the DNN model also needs to be retrained periodically on the latest datasets. If a DNN training job takes too long, it can seriously affect an organization’s operations, resource management, and competitiveness.

GIGABYTE’s DNN Training Appliance helps reduce training time by incorporating several optimization features: GPU memory optimization to accommodate a large amount of training input or to fit a large model into GPU memory, automatic hyperparameter tuning (during a training job) to achieve higher accuracy, and dataset cleaning features to reduce the training time wasted on mislabeled or duplicated training data.
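To make the duplicate-data case concrete, here is a minimal sketch of one simple way to find duplicated training files: group files by a hash of their contents. It is not a description of the appliance's own cleaning feature, whose method the source does not state, and the function name is hypothetical. It only catches byte-identical copies, not resized or re-encoded images.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicate_files(dataset_dir: str) -> List[List[Path]]:
    """Return groups of files under dataset_dir whose bytes are identical.

    Hypothetical helper for illustration only.
    """
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file():
            # Identical contents produce the same SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests that were seen more than once.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists files that could be reduced to a single copy before training.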

Project table view

Training jobs view

Cloud IDE & Utilities Interface
Users can easily create a Cloud IDE (based on JupyterLab) for DNN model development or data preprocessing by attaching their dataset. The Cloud IDE also provides utilities that simplify the training process, such as hyperparameter passing, third-party IDE integration (VS Code and PyCharm), TensorBoard, and GPU monitoring.
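The source does not describe how hyperparameters are passed into a training script. One common pattern, sketched here as an assumption, is to expose them as command-line flags so that each training job can supply its own values. The flag names and defaults below are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_hyperparameters(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse training hyperparameters from command-line style arguments.

    Hypothetical entry point; flag names and defaults are illustrative only.
    When argv is None, argparse reads the process's own command line.
    """
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)
```

Calling `parse_hyperparameters(["--learning-rate", "0.01", "--epochs", "5"])` overrides two values and keeps the default batch size, which keeps the script itself unchanged between runs.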

Cloud IDE

Model Templates and Optimization Tutorial
GIGABYTE’s DNN Training Appliance has built-in templates that guide the user through training different types of models (for image classification, object detection, etc.) with various optimization techniques, such as GPU memory optimization and mixed precision training. These templates let the user choose the dataset, DNN model, and hyperparameter settings appropriate to the DNN application type. Templates can also be shared, which makes collaboration easier.

Real-time Monitoring and Quick Result Verification
Once training starts, it is possible to keep track of the progress in real-time via the training monitoring chart. After each training job is completed, you can quickly verify your DNN model with the Cloud IDE workspace.

Effective Dataset Management Tools

User Friendly File Browser

The platform provides a file browser style management interface. The user can preview image files, delete files, and download files by selecting target files. To upload files, simply drag and drop files from your PC to the dataset.

Upload files by drag and drop

Dataset Annotation Visualization
The platform supports multiple dataset annotation formats, such as bounding boxes and segmentation images, so that the user can preview the annotated dataset on the dataset page.

System Monitoring and Administration

System Resource Monitoring
GIGABYTE’s DNN Training Appliance provides real-time monitoring of GPU (utilization, memory usage, and temperature), CPU, disk, and memory usage.
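For readers who want to see where such GPU figures can come from, the sketch below reads the same three metrics from the `nvidia-smi` command-line tool on a machine with NVIDIA GPUs. This is an assumption for illustration; the source does not say how the appliance collects its metrics, and the function names are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional

# Fields requested from nvidia-smi, in output column order.
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def _to_number(field: str) -> Optional[float]:
    """Return the field as a float, or None for values such as "[N/A]"."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_gpu_stats(csv_text: str) -> List[Dict[str, Optional[float]]]:
    """Parse headerless, unitless CSV output into one dict per GPU."""
    stats = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = [_to_number(field) for field in line.split(",")]
        stats.append(dict(zip(QUERY_FIELDS, values)))
    return stats


def read_gpu_stats() -> List[Dict[str, Optional[float]]]:
    """Run nvidia-smi once and parse its output (requires NVIDIA drivers)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Keeping the parser separate from the subprocess call means the parsing logic can be checked on sample text without a GPU present.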

Real-time system resource monitoring

Administration Dashboard
GIGABYTE’s DNN Training Appliance features a dashboard for administration, including an audit log, training tasks overview, dataset overview, and user account management.

Create and Run Training Jobs with a Template

Optimized Hardware Platform

Single-Root GPU Server

GIGABYTE's DNN Training Appliance is built on the G481-HA1, a server optimized for use as a single-cluster DNN training appliance through its single-root GPU system architecture. Because DNN training requires frequent communication between the GPUs in the system, a single-root architecture (all GPUs communicate via the same CPU root) helps reduce GPU-to-GPU latency and shorten DNN training jobs.

Build Your AI Innovations Ever Faster and Simpler

GIGABYTE Servers as the Hardware Base of DNN

G492-ID0 (rev. 100)

HPC/AI Server - 3rd Gen Intel® Xeon® Scalable - 4U DP HGX™ A100 8-GPU

G262-ZO0 (rev. A00)

HPC/AI Server - AMD EPYC™ 7003/7002 - 2U DP AMD Instinct™ MI250 4-GPU | Application: AI Platform, AI Training Server, AI Inference Server & HPC Server

H262-Z6B (rev. 100)

High Density Server - AMD EPYC™ 7002/7001 - 2U 4 Node DP 8-Bay Gen4 NVMe/SATA | Application: Hyper-Converged Server & Private/Hybrid Cloud Server

R282-Z93 (rev. A00)

Rack Server - AMD EPYC™ 7003/7002 - 2U DP 3 x PCIe Gen4 GPUs | Application: AI Platform, AI Training Server, AI Inference Server, Visual Effects Rendering, HPC Server, Networking Server & Private/Hybrid Cloud Server

G492-PD0 (rev. 100)

HPC/AI Arm Server - Ampere® Altra® Max - 4U UP HGX™ A100 8-GPU

DNN Training Appliance 3

G482-Z51 (rev. 100)

AMD EPYC 8 x PCIe Gen 4.0 GPU Server

Related Technologies

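Machine Learning

What is machine learning? Machine learning is the scientific study of the algorithms and statistical models that computer systems use to perform a specific task effectively without explicit instructions, relying on models and inference instead. It is regarded as a subset of artificial intelligence.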

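Inference Engine

What is an inference engine? In artificial intelligence, an inference engine is the component of a system that applies logical rules to existing information in order to deduce new knowledge. The rules it applies to the knowledge base are commonly expressed as IF-THEN rules.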

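Artificial Intelligence

What is artificial intelligence (AI)? Artificial intelligence is a broad branch of computer science. Its goal is to create machines that operate independently with intelligent functions and that can work and react as humans do. To that end, machines, software, and applications acquire intelligence the way humans do: by retaining information and becoming smarter over time. AI is not a new concept; the idea has been discussed since the 1950s. Recent advances in computing have made it practical: we can now collect and store very large volumes of information, which supplies enough data to make machine learning development feasible, and rapid gains in hardware processing speed and computing power make it possible to process that data to train machines and applications and make them "smarter".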

Accelerate Your Technology Innovation

Contact Sales