Mindtech Chameleon Synthetic Data Platform

GIGABYTE has partnered with Mindtech Global to democratise the availability of datasets to train neural networks for the implementation of AI systems through a combined platform for synthetic data creation.
Introduction
Mindtech has brought to market Chameleon, a platform to create and manage synthetic data. Such a platform works most effectively on hardware with significant compute resources, which makes GIGABYTE’s comprehensive platforms for creating and processing AI workloads a natural fit. Together they address the three main building blocks of AI systems: the framework (such as TensorFlow, Caffe or PyTorch), the network itself (effectively the algorithm, such as ResNet or YOLO), and the data used to train the network.

Obtaining datasets at the scale required to train an accurate AI-based solution means that new entrants into the market must utilize all available data sources. The problem of achieving scale is compounded by the fact that many of the “public” datasets available today are either not licensed for commercial use or come from questionable sources. Synthetic data enables the democratisation of AI by allowing anyone to achieve the required scale in their datasets.
Use Case Scenarios
Rapid Prototyping
Synthetic data generation enables datasets, including images and advanced annotations, to be quickly created for initial testing of algorithms and workflows.
Data Drift
Neural networks need to be continuously checked for accuracy, and the training data must be kept up to date and relevant. Only synthetic data can provide images of unreleased products and respond rapidly to changing market requirements.
Bias Reduction
A key issue for any AI system is ensuring that it does not exhibit unintended bias. Mindtech Chameleon AI tools can help identify and overcome unintentional bias by creating data to balance the dataset.
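As a rough illustration of what "balancing the dataset" can mean in practice, the sketch below counts class labels and reports how many synthetic samples each class would need to match the largest class. The class names and counts are hypothetical, and this is not a description of how the Chameleon tools identify bias internally.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Return, per class, how many extra samples are needed to match the largest class."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label distribution from a real-world pedestrian dataset.
real_labels = ["adult"] * 900 + ["child"] * 80 + ["wheelchair_user"] * 20

shortfall = synthetic_samples_needed(real_labels)
for cls, missing in shortfall.items():
    print(f"{cls}: generate {missing} synthetic samples")
```

Matching every class to the largest one is only one possible balancing target; a project may instead aim for a distribution that reflects the deployment environment.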
Challenges of Data Creation
The key challenge for implementation of visual AI systems is in having sufficient, relevant, accurately annotated images. Recent advances in synthetic data creation, with tools like Mindtech’s Chameleon platform, have overcome many of the traditional issues with real data, and offer the capability for data scientists to generate massive quantities of discreet, accurately annotated data for training their networks.

The Chameleon platform creates a virtual 3D world, consisting of a scene, into which the data scientist places the required assets (people, cars, objects, etc.) and defines the series of events (scenarios) they are interested in before “filming” the events to produce the data required for training. Because Chameleon has produced the data, it knows where all the objects are located within the scene and can therefore automatically produce fully accurate, advanced annotations as the images are created, including temporal annotations.
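To see why knowing the 3D position of every object makes annotation automatic, consider the following minimal sketch. It projects the corners of an axis-aligned 3D box through a simple pinhole camera and takes the extremes as the 2D bounding box. The pinhole model, the camera-space coordinates and all parameter names are assumptions made for illustration; the source does not describe Chameleon's actual camera model.

```python
from itertools import product


def project_point(point, focal_px, cx, cy):
    """Project a camera-space point (x, y, z) to pixel coordinates with a pinhole model."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def bbox_2d_from_3d_box(center, size, focal_px, cx, cy):
    """Return (x_min, y_min, x_max, y_max) in pixels for an axis-aligned 3D box."""
    half = [s / 2 for s in size]
    # The eight corners are the centre offset by +/- half the size on each axis.
    corners = [
        tuple(c + sign * h for c, sign, h in zip(center, signs, half))
        for signs in product((-1, 1), repeat=3)
    ]
    pixels = [project_point(corner, focal_px, cx, cy) for corner in corners]
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical asset: a 2 m cube centred 10 m in front of a 1280x720 camera.
box = bbox_2d_from_3d_box(
    center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0),
    focal_px=1000.0, cx=640.0, cy=360.0,
)
print([round(v, 1) for v in box])
```

No human labelling is involved: the box follows directly from the scene geometry, which is the property the paragraph above relies on.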

Creating this data benefits from significant compute resources, even ahead of the training stage. GIGABYTE platforms, which can also host the frameworks for training, are ideal for these tasks. Employing tightly coupled data creation and training increases the overall process efficiency.

The simulator within the Chameleon platform not only generates the photo-realistic graphics, but also models the behaviours of the assets within it. This is required to ensure semi-autonomy for the virtual world, allowing for scalability. The created synthetic data can augment real data and, for many training requirements, may replace real data. This 3D world creation and modelling requires significant compute resources and is optimised to ensure maximum utilisation on systems containing significant CPU and GPU resources such as the GIGABYTE G-series servers.
First stage in the process: the creation of a scenario to prepare for synthetic data generation
Mindtech’s Chameleon Benefits & Solutions
The Chameleon platform has several features that ensure it can efficiently use the resources of the GIGABYTE G-series servers.
Complete Dataset Solution
The simulator produces annotated images on which the AI tools train and then output results.
Ready To Go Datasets
Supply complete data packs, ready for training AI systems that meet training goals.
Scenario Editor
A simple UI-based, drag-and-drop tool for creating the exact scenarios and corner cases needed.
Quick To Adapt
Introduce new data for changes in the environment, such as people wearing facial masks or automated vehicles on roads.
Streamlined Process
Automatic generation of advanced annotations that significantly reduces time and cost compared to real data.
1) Scripting engine
Besides its fully interactive UI mode, the Chameleon simulator is designed primarily to run in a fully scripted (command line/shell) mode that uses the pre-defined scenario created by the user.
These scripts can automatically generate large datasets by varying elements such as the time of day and weather conditions.
Using the Chameleon scripting engine to vary the time of day of a smart-city simulation
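The idea of generating many variations from one scenario can be sketched as a simple parameter sweep. The example below is not Chameleon's scripting format, which the source does not show; the scenario name, field names and value lists are all hypothetical. It only illustrates how a handful of varied elements multiplies into a set of distinct runs.

```python
import json
from itertools import product

# Hypothetical variation axes; real values would come from the user's scenario.
TIMES_OF_DAY = ["06:00", "12:00", "18:00", "23:00"]
WEATHER = ["clear", "rain", "fog"]


def build_run_configs(scenario_name, times, weather_conditions, base_seed=0):
    """Return one run configuration per (time of day, weather) combination."""
    configs = []
    for index, (time_of_day, weather) in enumerate(product(times, weather_conditions)):
        configs.append({
            "scenario": scenario_name,
            "time_of_day": time_of_day,
            "weather": weather,
            "seed": base_seed + index,  # distinct seed so each run differs
        })
    return configs


configs = build_run_configs("smart_city_intersection", TIMES_OF_DAY, WEATHER)
print(f"{len(configs)} runs from one scenario")
print(json.dumps(configs[0], indent=2))
```

Adding a third axis, such as camera position, multiplies the run count again, which is why a scripted mode matters more than the interactive UI for large datasets.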
2) Smart Behavior Modelling
This allows the AI system designer to insert an asset into a scenario, without having to be concerned over what the asset is going to do. The asset will have certain characteristics and exhibit appropriate behavior.
For example, in a smart-city environment, we expect cars and pedestrians to behave in a certain fashion at intersections. In a smart-vision retail setting, a customer in a supermarket would be expected to walk up and down aisles in a semi-orderly fashion, pausing occasionally to select items from shelves. Of course, we will still need to introduce corner case behavior such as jaywalking, children running in a supermarket, and so on. However, we need the basic behavior accurately modelled to ensure that large-scale, realistic data generation is feasible.
Mindtech's Chameleon Platform, automatic behaviour models for Smart-City, Smart-Vision and Smart-Machine
3) Automatic Asset Insertion
New assets can automatically be inserted into a scenario (asset source), and automatically removed (asset sink), when no longer required. This automatic insertion of random assets enables the creation of an unlimited amount of training data, covering far more scenarios than a data scientist could manually create. An example in a smart city application would be the automatic insertion of pedestrians crossing roads.
During simulator run time, new assets can be automatically inserted into the scene creating a large and varied training data set
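The source-and-sink pattern can be illustrated with a toy simulation. In this sketch, which assumes a one-dimensional pedestrian crossing and a fixed spawn probability (neither taken from Chameleon itself), an asset source randomly inserts pedestrians at each step and an asset sink removes them once they reach the far side.

```python
import random


def simulate_crossing(num_steps, spawn_probability, crossing_length, seed=0):
    """Run a toy source/sink simulation; return (spawned, removed, still_active)."""
    rng = random.Random(seed)
    positions = {}  # asset id -> distance travelled across the road
    next_id = 0
    removed = 0
    for _ in range(num_steps):
        # Asset source: randomly insert a new pedestrian at the kerb.
        if rng.random() < spawn_probability:
            positions[next_id] = 0
            next_id += 1
        # Advance every active pedestrian by one unit.
        for asset_id in list(positions):
            positions[asset_id] += 1
            # Asset sink: remove pedestrians that have finished crossing.
            if positions[asset_id] >= crossing_length:
                del positions[asset_id]
                removed += 1
    return next_id, removed, len(positions)


spawned, removed, active = simulate_crossing(
    num_steps=1000, spawn_probability=0.3, crossing_length=5, seed=42
)
print(f"spawned={spawned} removed={removed} still_active={active}")
```

Because the sink keeps the number of live assets bounded, the simulation can run for as long as needed without the scene filling up, which is what allows the volume of generated data to grow without manual intervention.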
4) Multi-camera Implementation
The simulator has been created to enable multiple cameras to be processed simultaneously. This allows full freedom to create multi-camera setups that are typical in most environments including airports, retail locations, and cities. This ability to generate perfectly synchronised data from multiple cameras simultaneously requires significant compute resources for which GIGABYTE G-series servers are ideal.
Multi-camera implementation provides significant benefits for AI system designers, at the expense of compute requirements
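One way to picture "perfectly synchronised" multi-camera data is a single simulation clock that drives every camera, so that a given frame index carries the same timestamp in every stream. The sketch below builds such a schedule; the camera names, frame rate and record layout are hypothetical and do not reflect Chameleon's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRecord:
    camera_id: str
    frame_index: int
    timestamp_s: float


def render_schedule(camera_ids, fps, duration_s):
    """List every frame each camera must render, all driven by one shared clock."""
    num_frames = int(duration_s * fps)
    records = []
    for frame_index in range(num_frames):
        timestamp = frame_index / fps  # one timestamp shared by all cameras
        for camera_id in camera_ids:
            records.append(FrameRecord(camera_id, frame_index, timestamp))
    return records


schedule = render_schedule(["cam_north", "cam_south", "cam_overhead"], fps=10, duration_s=2)
print(f"{len(schedule)} frames to render")
```

The frame count scales linearly with the number of cameras, which is the compute cost the caption above refers to.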
5) Pixel Perfect Fully Annotated Images
A data scientist requires the highest quality of annotation to create the most accurate AI systems. Advanced annotations can allow innovation in the functionality that is achievable by AI systems. Annotations for images can be basic, such as labels or 2D bounding boxes, or advanced, such as range data, 3D bounding boxes or instance semantic segmentation, all of which the Chameleon Platform delivers. Generating the full dataset required for each image (annotation, mask and metadata) is highly compute intensive and requires the performance of systems such as GIGABYTE G-series servers.
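The relationship between a basic and an advanced annotation can be shown with a small example: given an instance segmentation mask, the 2D bounding box of each object follows directly from its pixels. This sketch uses a plain nested list as the mask and treats 0 as background; both choices are assumptions for illustration rather than Chameleon's annotation format.

```python
def bounding_boxes_from_instance_mask(mask):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} with inclusive pixel bounds.

    `mask` is a 2D list of integers where 0 means background and any other
    value identifies one object instance.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue
            if instance_id not in boxes:
                boxes[instance_id] = (x, y, x, y)
            else:
                x_min, y_min, x_max, y_max = boxes[instance_id]
                boxes[instance_id] = (
                    min(x_min, x), min(y_min, y), max(x_max, x), max(y_max, y)
                )
    return boxes


# Tiny hypothetical mask containing two object instances.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2, 2],
]
boxes = bounding_boxes_from_instance_mask(mask)
print(boxes)
```

The reverse is not possible: a bounding box cannot recover the mask, which is why per-pixel annotations are the more valuable (and more expensive) output.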
6) Optimization
The platform’s simulator was designed from the ground up to be optimized for multi-threaded, multi-core systems, allowing for fast creation of high quality annotated AI training data. The simulator itself will use different threads/cores available from the server for tasks such as annotation creation, bounding box implementation, etc.
Mindtech Chameleon Simulator is designed to make maximum use of available CPU threads and cores
7) Containerization
Each instance of the simulator can be run inside a container (typically Docker, though the Chameleon architecture has been designed with any form of generalized containerization in mind).

Each container includes a full instance of the Chameleon Simulator; this allows for multiple simultaneous runs, ensuring the very large quantity of images and data required for training AI neural networks can be rapidly produced. The utilization of such a large dataset can be very important in limiting potential for unintended bias.

This use of containers by the Chameleon Platform maps ideally to the GIGABYTE server platforms, allowing the GIGABYTE platforms to be fully exploited, to produce the AI training data for the neural networks in a fast, efficient manner.
The Mindtech Chameleon Simulator can take advantage of containers and their orchestration systems to parallelise the resource heavy data creation task required of synthetic data generation
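As a sketch of how several containerised simulator instances might be spread across a multi-GPU server, the code below builds one `docker run` command per instance and assigns GPUs round-robin. The image name, the `RUN_SEED` environment variable and the output paths are hypothetical placeholders, not Chameleon's actual interface; only the Docker flags themselves (`--rm`, `--name`, `--gpus`, `-e`, `-v`) are standard. The commands are printed rather than executed.

```python
import shlex


def build_container_commands(image, num_instances, num_gpus, output_root):
    """Return one `docker run` argument list per simulator instance."""
    commands = []
    for instance in range(num_instances):
        gpu = instance % num_gpus  # round-robin GPU assignment
        commands.append([
            "docker", "run", "--rm",
            "--name", f"sim-{instance}",
            "--gpus", f"device={gpu}",
            "-e", f"RUN_SEED={instance}",  # hypothetical variable name
            "-v", f"{output_root}/run_{instance}:/output",
            image,
        ])
    return commands


# Hypothetical image name; eight instances on a four-GPU server.
commands = build_container_commands(
    "example/synthetic-simulator:latest",
    num_instances=8, num_gpus=4, output_root="/data/synthetic",
)
for command in commands:
    print(shlex.join(command))
```

In a real deployment an orchestration system would normally handle placement and restarts; the explicit loop here just makes the one-instance-per-container mapping visible.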
Summary
Mindtech Chameleon AI Training Platform provides a complete solution for synthetic data creation and management. The ability to create photo-realistic, annotated data in a discreet manner is a huge benefit to AI data scientists.

Whilst the Mindtech Chameleon tool is efficiently written, and designed to run on stock hardware, generation of the very large data sets is very compute intensive. GIGABYTE G-series servers are specifically designed to cope with these intensive workloads, offering a balance of CPU and GPU resources that allows the Mindtech tools and their users the freedom to create the required datasets without concerns over system performance.

Achieving scale of datasets is critical to accurate, balanced AI systems. The combination of the Mindtech Chameleon platform and GIGABYTE servers enables anyone to create the required scale of datasets for a successful AI implementation.
GIGABYTE Servers as the Hardware Base of Synthetic Data Platform
G241-G40 (rev. 100): 2U 4 x GPU Server
Related Technologies
Machine Learning
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on models and inference instead. It is seen as a subset of artificial intelligence.
Artificial Intelligence
Artificial Intelligence (AI) is a broad branch of computer science. The goal of AI is to create machines that can function intelligently and independently, and that can work and react the same way as humans. To build these abilities, machines and the software & applications that enable them need to derive their intelligence in the same way that humans do – by retaining information and becoming smarter over time. AI is not a new concept – the idea has been in discussion since the 1950s – but it has only become technically feasible to develop and deploy into the real world relatively recently due to advances in technology – such as our ability to now collect and store huge amounts of data that are required for machine learning, and also the rapid increases in processing speeds and computing capabilities which make it possible to process the data collected to train a machine / application and make it "smarter".
Inference Engine
In the field of Artificial Intelligence, an inference engine is a component of the system that applies logical rules to the knowledge base to deduce new information. These rules are typically represented as IF-THEN rules.