Cluster Computing: An Advanced Form of Distributed Computing

The term “cluster” refers to the connection of computers or servers to each other over a network to form a larger “computer”, which is based on the distributed computing architecture. Such a computing cluster is usually made up of standardized servers, workstations, or even consumer-grade PCs, linked to each other over LAN or WAN. The deployment of such a cluster can improve the performance and availability of individual computers; what’s more, clusters generally offer a better return on investment than a large-scale supercomputer that boasts the same performance. To use the “TOP500” list of the world’s supercomputers as an example, over half of them employ some sort of cluster system—the best evidence of the viability of cluster computing.

Glossary:
《What is Distributed Computing?》
《What is Computing Cluster?》
《What is LAN?》

Strength in Numbers: Why More is Better for Supercomputers

The reason for this is simple. “Divide and conquer” is not just a cliché—modern computing is built on this adage. Earlier computers ran on a single processor. A task, in the form of a “command”, comprises of a series of calculations. The processor would wade through each calculation, one at a time, until it had completed a “command”; and then, it would move on to the next one.

Obviously, this method had its limitations. The processor’s performance per watt of power and clock rate restricted how fast the tasks could be completed. You could upgrade the processing speed and data transfer rates, but that would eventually hit a ceiling. This gave rise to the concept of “parallelism”: the idea that different parts of the same task, or a number of interconnected tasks, could be carried out simultaneously by multiple processors.

Comparison between Cluster Computing and Parallel Computing

When it comes down to it, cluster computing is a form of parallelism. Parallelism is effective when you need to simultaneously carry out multiple calculations that are part of the same task. It has the potential to revolutionize the way we work with data; as such, it is the focal point of both private enterprises and esteemed research institutes.

While the terms are sometimes used interchangeably, both parallel computing and distributed computing are extensions of the concept of parallelism. Some would say the minute difference between these two methods is parallel computing involves multiple processors sharing the same resources within one computer, while distributed computing (including cluster computing) is more about using multiple computers in tandem. This is done because some computing tasks may require each node to operate more independently, while other tasks may require better interconnectivity between the nodes. Regardless, parallelism in computing has already become a deeply integrated part of our lives. We have all benefited from this breakthrough in computer science.

Glossary:
《What is Parallel Computing?》
《What is Node?》

While the two terms are sometimes used interchangeably, some differentiate between distributed computing and parallel computing by the amount of resources that are shared between the processors. A greater or lesser amount of shared resources, such as the computer memory, may be more suitable for some specific tasks.

Since cluster computing is a kind of distributed computing, we can start by citing a classic example of distributed computing. Every time you enter a query into your web browser, the task is actually being distributed to different nodes in different locations. The nodes conduct their search independently; no communication between them is necessary. The results are then aggregated and returned to the user device. This is different from parallel computing, which usually requires a lot of data to be transferred between the nodes throughout the process. Multicore systems and the use of GPUs to support CPUs are common examples of parallel computing.《Glossary: What is GPU?》

Under the umbrella term of distributed computing, there is cluster computing, peer-to-peer computing, grid computing, and other more esoteric methods. Our focus today, however, is cluster computing.《Glossary: What is Grid Computing?》

Distributed computing can be seen as the umbrella term that encompasses other forms of parallelism, including cluster computing, peer-to-peer computing, and grid computing. All of them offer varying levels of availability and reliability. You should select the method that best matches the computing task at hand.

Cluster computing, and other types of distributed computing that allocate tasks to a large number of smaller computer systems, are based on a common principle. In multi-processor systems that draw from the same memory pool, scalability can become an issue as the effective bandwidth of the memory struggles to keep up with the growing number of processors. What’s more, the latency that naturally exists between processors impedes the scalability of the system. In other words, a system that shares a large amount of computing resources between processors runs the risk of adding more and more processors without effectively improving performance. Not only is this not cost-effective, it also offers a substantially poorer return on investments.《Glossary: What is Scalability?》

Besides performance, higher availability and reliability are also some advantages that a cluster computing system has over a single system. It’s like the old adage: “Don’t put all your eggs in one basket.” Even the best enterprise-grade hardware may suffer from faulty software, such as subpar device drivers. Cluster computing offers a substantial edge in terms of risk management.《Glossary: What is High Availability?》

PCs and LAN: The Twin Pillars of Cluster Computing

The concept of cluster computing was first conceived in the sixties. The main impetus for this concept was the fact that no single computer could shoulder all the computing tasks simultaneously, nor back up all the data generated. However, by the time the eighties rolled around, the rapid development of personal computers and LAN, coupled with the advent of powerful and versatile processors, LAN standards, standardized message transfer application interfaces, high density servers, and open-source operating systems—all of these factors made clusters composed of multi-processor computing nodes not only viable, but a stepping stone towards high performance computing (HPC), high availability (HA), and load balancing technologies.

Glossary:
《What is HPC?》
《What is Load Balancing?》

Around this time, another concept that is often mentioned alongside cluster computing was developed: grid computing. The two are different in the sense that although both systems are composed of interconnected but independent computers, cluster computing takes advantage of the “parallelism” of homogeneous computers connected over LAN to solve a common problem, whereas grid computing is more about the large-scale sharing of resources over a network, and the dynamic integration of separate computers or clusters to improve overall efficiency.

High Availability, Load Balancing, and High Performance

By some accounts, the first cluster system designed for business use was the Burroughs B5700, which was introduced in the middle of the 1960s. It was composed of four computers, each housing a single processor or dual processors, clustered around a shared disk system. This helped with load balancing. What’s more, the computers could be powered down or restarted without interrupting the computing process. Future cluster systems for business use went a step further: they supported parallel computing and file sharing systems, pushing cluster computing a step closer to the realm of supercomputers.

Based on the characteristics of cluster computing systems, they can be categorized as High Availability Clusters, Load Balancing Clusters, or High Performance Computing Clusters. Which type you choose depends greatly on the workload you wish to handle.

The cluster environment may vary greatly in its complexity. A simple dual node system may comprise of just two interconnected computers. Clusters may be used for business purposes, or they may take on the data-intensive computing workloads common in scientific research. Based on these characteristics, clusters can be categorized as High Availability Clusters, Load Balancing Clusters, or High Performance Computing Clusters. As the names imply, different types of clusters offer different benefits.

● High Availability Clusters

● Load Balancing Clusters

As with many things in life, the problem is not that there are not enough resources, but that distribution is unfair. The even distribution of workloads within a cluster is important. A device known as a load balancer is used to distribute the workloads to different nodes. For example, when you search for something on your web browser, your query is actually being distributed to different nodes, which considerably accelerates the search. Load balancing techniques differ between applications; for example, High Availability Clusters and Load Balancing Clusters usually employ the same load balancing methods, such as those provided by the famous Linux Virtual Server (LVS).

● High Performance Computing Clusters

In the nineties, a group of consumer-grade personal computers were linked together over LAN to create the first Beowulf cluster—the pioneer of a High Performance Computing (HPC) Cluster built from inexpensive hardware. Such clusters boast superior parallel computing capabilities, making them highly recommended for scientific research. The massive amount of data generated by the nodes are transferred to each other through the highly efficient, blazing-fast Message Passing Interface (MPI). Precisely how the MPI automatically detects the types of nodes within the cluster, how the network topology is paired with the infrastructure of the computing node, and how applications are optimized according to the bandwidth and latency of the overall environment—all these questions must be accounted for before an HPC Cluster can be assembled.

Learn More:
《Dive Deeper into HPC: Read the Insightful Tech Guide by GIGABYTE》

GIGABYTE Servers Recommended for Cluster Computing

It should be noted that a system that employs cluster computing is more than a sum of its parts. Just like the data centers and server farms operated by big enterprises, the cluster computing system is supported by regular repair and maintenance, a comprehensive distributed file system, and storage structure on the back end.

Glossary:
《What is Data Center?》
《What is Server Farm?》

GIGABYTE Technology, an industry leader in high-performance servers, has its finger on the pulse of cutting-edge technology, as well as the latest developments in various vertical markets. GIGABYTE offers a full range of server solutions that can be deployed in different nodes and cluster systems, giving GIGABYTE customers a variety of flexible options to choose from. GIGABYTE can also provide consultation and services for customers who need to manage a massive number of nodes, and who engage in work such as scaling up or scaling down the system, installing new operation systems, or rolling out new applications. GIGABYTE can help IT managers stay on top of their cluster computing systems and keep everything running smoothly.

Glossary:
《What is Scale Up?》
《What is IT?》

GIGABYTE has a full range of server solutions that are highly recommended for cluster computing. H-Series High Density Servers and G-Series GPU Servers are good for control and computing nodes; R-Series Rack Servers are ideal for business-critical workloads; S-Series Storage Servers can safeguard your data; W-Series Tower Servers / Workstations can be conveniently installed outside of server racks.

● Control Nodes

Control nodes help the user manage the entire cluster. As such, it relies heavily on superb processing power. GIGABYTE’s H-Series High Density Servers and G-Series GPU Servers offer industry-leading, highly dense processor configurations powered by the latest Intel® Xeon® Scalable Processors or AMD EPYC™ processors. There is also massive storage capacity and support for different kinds of GPGPU accelerators.

Learn More:
《More information about GIGABYTE’s High Density Server》
《More information about GIGABYTE’s GPU Server》
《Glossary: What is GPGPU?》

● Computing Nodes

● Business-Critical Workloads and Reliable Connectivity

● File-sharing and Storage Nodes

GIGABYTE’s S-Series Storage Servers can support up to 60 bays, which is enough to fully satisfy your business needs. Virtualization techniques, such as Software Defined Storage (SDS), can help you meet a variety of different performance, capacity, and cost-related requirements.

Learn More:
《More information about GIGABYTE’s Rack Server》
《More information about GIGABYTE’s Storage Server》
《Glossary: What is Software Defined Storage?》

GIGABYTE also offers GIGABYTE Server Management (GSM), a proprietary remote management console (RMC) for multiple servers, which you can download for free from the official GIGABYTE website. GSM can be used with all GIGABYTE servers and supports Windows and Linux. GSM includes a complete range of system management functions, such as GSM Server, a software program that provides real-time remote control of a large cluster of servers; GSM CLI, a command-line interface for remote management; GSM Agent, a software program that retrieves information from each node; GSM Mobile, an app on your mobile device that provides managers with real-time status updates; and GSM Plugin, an application program interface that grants users access to VMware vCenter for real-time monitoring and management of server clusters.

GIGABYTE Technology offers a full range of server solutions that can help you construct a viable and cost-effective cluster computing system. Let GIGABYTE empower your digital transformation and help you create value through high tech cluster computing solutions. As always, we encourage you to reach out to our sales representatives at marketing@gigacomputing.com for consultation.

Learn More:
《How to Build Your Data Center with GIGABYTE? Download Our Free Tech Guide》
《GIGABYTE Tech Guide: The Definition of the Server and Its Fascinating History》