Introduction
GIGABYTE’s DNN Training Appliance is a well-integrated software and hardware package that combines powerful computing performance with a user-friendly GUI. It provides DNN developers with an easy-to-use environment for dataset management, training job management, real-time system monitoring, and model analysis. The appliance includes hardware and software optimizations that improve performance and reduce the time required for DNN training.
Use Case Scenarios
Providing Developers and Data Scientists the Following Benefits
Saves Time
Deep learning training can be completed faster with the DNN Training Appliance than with an open-source community solution.
Saves Money
Achieve maximum utilization of your hardware investment with powerful optimization features that keep downtime to a minimum.
Ease of Use
Developers can start up a DNN training environment faster, and less time and fewer resources are spent on employee training.
Choices & Customization
The standard version supports image classification and object detection; contact us about a customized solution for other model or application types.
Reduces the Complexity of DNN Training Environment Setup and Management
To generate a production-grade DNN model, a developer needs to go through many difficult and time-consuming steps, including dataset collection, cleansing, labeling, augmentation, and format conversion; model selection and design; hyperparameter tuning; model training and evaluation; and model format conversion. Each step requires different tools and configurations that take time and effort to prepare, and switching between these tools often means writing additional code to convert data into the format each tool expects.
GIGABYTE’s DNN Training Appliance aims to reduce this complexity by providing a complete training and management platform that brings all of these processes together in an easy-to-use, web-browser-based GUI. Users can import, convert, and manage their datasets; design, train, and evaluate different DNN models; and test inference with their trained models. Built on GIGABYTE’s G481-HA1 server, the platform is optimized to make full use of the available bare-metal resources, delivering improved training performance on cost-efficient hardware.
Reduces the Time and Improves the Accuracy for Each DNN Training Job
DNN models need to be trained on large datasets to achieve an acceptable level of accuracy. Depending on the dataset size, training can take days or even weeks. To adapt to changing business circumstances (such as new products or new regulations), a DNN model also needs to be retrained periodically on the latest datasets. If each training job takes too long, it can seriously affect an organization’s operations, resource management, and competitiveness.
GIGABYTE’s DNN Training Appliance helps reduce training time by incorporating several optimization features: GPU memory optimization, to accommodate a large amount of training input or fit a large model into GPU memory; automatic hyperparameter tuning during a training job, to achieve higher accuracy; and dataset cleaning, to reduce the training time wasted on mislabeled or duplicated training data.
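Removing duplicated training data is one of the dataset cleaning steps described above. How the appliance does this is not documented here, so the following is only a minimal Python sketch of one common approach: grouping files by a hash of their contents. The helper name `find_duplicates` is hypothetical. The sketch catches byte-identical files only, not near-duplicates or mislabeled samples.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(dataset_dir: str) -> list[list[Path]]:
    """Group files under dataset_dir whose contents are byte-identical.

    Hypothetical helper for illustration: exact duplicates only,
    detected by comparing SHA-256 digests of each file's bytes.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Because `read_bytes()` loads each file fully into memory, a version meant for very large files would hash them in chunks instead.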
Cloud IDE & Utilities Interface
Users can easily create a Cloud IDE (based on JupyterLab) for DNN model development or data preprocessing by attaching their dataset. The Cloud IDE also provides utilities such as hyperparameter passing, third-party IDE integration (VS Code and PyCharm), TensorBoard, and GPU monitoring to simplify the training process.
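The mechanism the Cloud IDE uses for hyperparameter passing is not described here. A common pattern, shown purely as an assumption, is for a training script to read its hyperparameters as command-line arguments so that they can be changed without editing code. The argument names (`--lr`, `--batch-size`, `--epochs`) and the function `parse_hyperparameters` are illustrative only and are not the appliance's interface.

```python
import argparse


def parse_hyperparameters(argv=None) -> argparse.Namespace:
    """Read hyperparameters from the command line (illustrative names only)."""
    parser = argparse.ArgumentParser(description="Example training script")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    parser.add_argument("--batch-size", type=int, default=32, help="mini-batch size")
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


if __name__ == "__main__":
    hparams = parse_hyperparameters()
    print(f"lr={hparams.lr} batch_size={hparams.batch_size} epochs={hparams.epochs}")
```

Saved as a hypothetical `train.py`, this could be launched as `python train.py --lr 0.001 --batch-size 64`. Any argument that is not supplied keeps its default.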
Model Templates and Optimization Tutorial
GIGABYTE’s DNN Training system has built-in templates that guide the user through training different types of models (for image classification, object detection, etc.) with various optimization techniques, such as GPU memory optimization and mixed precision training. The templates let the user easily choose the dataset, DNN model, and hyperparameter settings appropriate to the DNN application type. Templates can also be leveraged for collaboration.
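The specific GPU memory optimization techniques used by the templates are not listed here. One widely used technique is gradient accumulation. A batch that is too large for GPU memory is split into smaller micro-batches, and their gradients are combined before a single weight update. The pure-Python sketch below illustrates the idea on a hypothetical one-parameter linear model with mean squared error loss. Each micro-batch gradient is weighted by its share of the batch, so the accumulated result equals the full-batch gradient.

```python
def mse_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of the mean squared error for the model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w: float, xs: list[float], ys: list[float], micro_batch_size: int) -> float:
    """Combine micro-batch gradients into the full-batch gradient."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weight each micro-batch mean gradient by its share of the full batch.
        total += mse_grad(w, mx, my) * len(mx) / n
    return total


def train(w: float, xs: list[float], ys: list[float], lr: float = 0.01,
          steps: int = 100, micro_batch_size: int = 2) -> float:
    """Plain gradient descent with one weight update per accumulated batch."""
    for _ in range(steps):
        w -= lr * accumulated_grad(w, xs, ys, micro_batch_size)
    return w
```

On a GPU, only one micro-batch's activations need to be held in memory at a time. The weight update is still computed from the whole batch.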
Real-time Monitoring and Quick Result Verification
Once training starts, progress can be tracked in real time via the training monitoring chart. After each training job completes, you can quickly verify your DNN model in the Cloud IDE workspace.
Effective Dataset Management Tools
User Friendly File Browser
The platform provides a file-browser-style management interface. Users can preview, delete, and download files by selecting them. To upload files, simply drag and drop them from your PC into the dataset.
Dataset Annotation Visualization
The platform supports multiple dataset annotation formats, so users can preview annotations (e.g., bounding boxes and segmentation images) on the dataset page.
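Annotation formats differ in how they encode the same bounding box, which is why format conversion appears among the dataset preparation steps. The exact formats the appliance supports are not listed here. As an illustration, the sketch below converts between two widely used conventions: COCO-style `[x_min, y_min, width, height]` in pixels, and YOLO-style `[x_center, y_center, width, height]` normalized by the image size. The function names are hypothetical.

```python
def coco_to_yolo(box: tuple[float, float, float, float],
                 img_w: float, img_h: float) -> tuple[float, float, float, float]:
    """Pixel [x_min, y_min, w, h] -> normalized [x_center, y_center, w, h]."""
    x_min, y_min, w, h = box
    return ((x_min + w / 2) / img_w, (y_min + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_coco(box: tuple[float, float, float, float],
                 img_w: float, img_h: float) -> tuple[float, float, float, float]:
    """Normalized [x_center, y_center, w, h] -> pixel [x_min, y_min, w, h]."""
    xc, yc, w, h = box
    w_px, h_px = w * img_w, h * img_h
    return (xc * img_w - w_px / 2, yc * img_h - h_px / 2, w_px, h_px)
```

For example, a 200×100 box with its top-left corner at (100, 50) in a 400×200 image is centered in that image. In YOLO form it becomes `(0.5, 0.5, 0.5, 0.5)`.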
System Monitoring and Administration
System Resource Monitoring
GIGABYTE’s DNN Training Appliance provides real-time monitoring of GPU (utilization, memory usage, and temperature), CPU, disk, and memory usage.
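The appliance presents these metrics in its GUI, and its data source is not documented here. For comparison, on any Linux machine with NVIDIA drivers the same GPU metrics can be read with the `nvidia-smi` query mode. The sketch below runs that command and parses its CSV output. It covers only the GPU portion; CPU, disk, and memory figures would come from the operating system. `GpuStats`, `parse_gpu_stats`, and `read_gpu_stats` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi; memory is reported in MiB, temperature in °C.
QUERY = "index,utilization.gpu,memory.used,memory.total,temperature.gpu"


@dataclass
class GpuStats:
    index: int
    utilization_pct: int
    memory_used_mib: int
    memory_total_mib: int
    temperature_c: int


def parse_gpu_stats(csv_text: str) -> list[GpuStats]:
    """Parse `--format=csv,noheader,nounits` output: one line per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        fields = [int(f.strip()) for f in line.split(",")]
        stats.append(GpuStats(*fields))
    return stats


def read_gpu_stats() -> list[GpuStats]:
    """Query local GPUs (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```

Some GPUs report unsupported fields as `[N/A]`, which this minimal parser does not handle. A production version would need to account for that.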
Administration Dashboard
GIGABYTE’s DNN Training Appliance features a dashboard for administration, including an audit log, training tasks overview, dataset overview, and user account management.
Create and Run Training Jobs with a Template
Optimized Hardware Platform
Single-Root GPU Server
GIGABYTE's DNN Training Appliance is built on the G481-HA1, a server optimized for use as a single-cluster DNN training appliance through its single-root GPU system architecture. Because DNN training requires frequent communication between the GPUs in a system, a single-root architecture (in which all GPUs communicate via the same CPU root) helps reduce GPU-to-GPU latency and shorten DNN training jobs.