Concepts of Distributed Systems (Part 1)
What ARE distributed systems?
What Are Distributed Systems?
There are lots of different definitions you can find for distributed systems. For example, Wikipedia defines distributed systems as:
"A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal."
Similarly, Techopedia defines distributed systems as:
"A distributed system is a network that consists of autonomous computers that are connected using a distribution middleware. They help in sharing different resources and capabilities to provide users with a single and integrated coherent network."
Irrespective of which definition you choose, there are three important things to notice in these definitions. First and foremost, a distributed system consists of components or computers that are autonomous. Second, to any user or program, a distributed system appears to be a single system (coherent, achieving a common goal, etc.). Third, these autonomous components need to communicate and coordinate with each other in some way.
One of the keys to building distributed systems lies in how communication and coordination are established between these autonomous components.
Characteristics of Distributed Systems
Certain characteristics are common to distributed systems. We will discuss them in the following sections.
A distributed system needs to hide most of its details from the user (an end user or another system). That is, the user of a distributed system is unaware of any differences in the components (software stack, libraries, etc.), the computers (hardware details, operating system, etc.), or how they communicate. The user is also unaware of how the different components are organized internally.
A distributed system is generally assumed to be available, even if parts of the system are temporarily unavailable. Users should not be aware that certain parts are unavailable or being fixed or removed, or that other parts are being added to the system.
In general, an important characteristic of a distributed system is the ability to hide the fact that the system consists of physically distributed components and present itself as if it were a single system or computer. A system which accomplishes this is said to provide transparency.
A distributed system can provide different kinds of transparency.
| Transparency Type | Transparency Details |
|---|---|
| Location | Hide where a resource is located |
| Migration | Hide the fact that a resource may be moved or relocated while in use |
| Replication | Hide the fact that a resource may be replicated |
| Concurrency | Hide that a resource may be shared by multiple users |
| Failure | Hide the fact that resources of the system may fail, recover, be removed, or be added |
| Data | Hide differences in data formats and representation |
As with any design choice, there is a tradeoff. Aiming for a high level of transparency can adversely affect performance and make the system harder to understand, among other things. Not every kind of transparency is achievable, or even required, for every system. Certain use cases warrant certain kinds of transparency; however, we will not cover these in the interest of brevity.
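To make the idea concrete, here is a minimal sketch of a client-side facade that hides location, replication, and failure from its caller. The class name `TransparentStore` and the replica functions are hypothetical and exist only for illustration; a real system would also have to deal with stale replicas, timeouts, and writes.

```python
from typing import Callable, List

# A replica is modeled as a function that takes a key and returns a value.
# This is an assumption made for the sketch, not a real client API.
Replica = Callable[[str], str]


class TransparentStore:
    """Hypothetical facade: the caller sees one store, not individual replicas."""

    def __init__(self, replicas: List[Replica]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = replicas

    def get(self, key: str) -> str:
        last_error = None
        for read in self._replicas:
            try:
                # The caller never learns which replica answered.
                return read(key)
            except ConnectionError as exc:
                # A failed replica is skipped without surfacing the failure.
                last_error = exc
        raise RuntimeError("all replicas unavailable") from last_error


def broken_replica(key: str) -> str:
    raise ConnectionError("replica is down")


def healthy_replica(key: str) -> str:
    return f"value-of-{key}"


if __name__ == "__main__":
    store = TransparentStore([broken_replica, healthy_replica])
    print(store.get("user:42"))  # value-of-user:42
```

The tradeoff mentioned above shows up even here: the caller cannot tell that the first replica failed, which keeps the interface simple but also hides the extra latency of the failed attempt.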
Another important characteristic of distributed systems is the ability to scale. Scalability implies that the system can cope with an increased load (number of users, storage, compute, or other resources) without degradation in the quality of service it offers. There are many different facets to scaling a distributed system. However, a common theme is to move away from centralized services, centralized data, and centralized algorithms. Centralized components, whether services, data, or algorithms, not only are single points of failure but also become bottlenecks when the load on the system exceeds the capacity they can handle.
In distributed systems, decentralization is the key. Decentralized systems (services, data, or algorithms) have certain characteristics.
- No machine has complete information about the state of the system. Decisions that need a system-wide view therefore rely on what a majority of the nodes agree on. Such agreement can be established using mechanisms like quorums, total order broadcast, or consensus.
- Machines make local decisions based on local information.
- Faults can occur and some machines can fail, but the system as a whole continues to work. This is often called resilience. Faults can be subdivided into hardware faults, which are random and only weakly correlated with one another; software faults, which manifest under certain conditions and are strongly correlated; and human errors.
- There is no implicit assumption of a global clock (Refer to Time and Order in Distributed Systems for details).
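The first point, deciding by majority, can be sketched as a simple vote tally. This is only an illustration of a majority quorum; the function name `majority_value` is hypothetical, and the sketch is not a consensus protocol, since it ignores ordering, retries, and conflicting concurrent writes.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_value(votes: Iterable[Hashable], cluster_size: int) -> Optional[Hashable]:
    """Return the value reported by a strict majority of the cluster, else None.

    `votes` holds the answers from the nodes that responded; nodes that are
    down or unreachable simply contribute no vote.
    """
    quorum = cluster_size // 2 + 1  # strict majority of all nodes, not of responders
    counts = Counter(votes)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= quorum else None


if __name__ == "__main__":
    # Five-node cluster: one node is unreachable and one holds a stale value.
    print(majority_value(["v2", "v2", "v1", "v2"], cluster_size=5))  # v2
    # Only two nodes answered, so no value can reach the quorum of three.
    print(majority_value(["v2", "v1"], cluster_size=5))  # None
```

Note that the quorum is computed from the cluster size rather than from the number of responses, so a small group of reachable nodes cannot decide on behalf of the whole system.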
There are subtle differences in how you approach scaling a system across geographical locations. Across wide area networks, latencies are typically three orders of magnitude higher than across local area networks. In a local area network, communication is generally reliable and can be based on broadcast. Across wide area networks, communication is generally unreliable and point-to-point.
Typically, systems designed to run in local area networks work on a synchronous model, where a client (some system) sends a request and then blocks until it receives a response from the server (a different system). However, such synchronous mechanisms do not work effectively across geographically distributed systems, where every blocked round trip pays the full wide-area latency.
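The cost of waiting for each response in turn can be sketched with simulated latency. The code below is an assumption-laden illustration: `fetch` is a hypothetical stand-in that sleeps instead of using a network, and the 50 ms latency is an arbitrary figure. It contrasts issuing requests one at a time with keeping them all in flight.

```python
import asyncio
import time

SIMULATED_LATENCY_S = 0.05  # assumed round-trip time, for illustration only


async def fetch(resource: str) -> str:
    """Stand-in for a remote call; sleeps instead of touching a network."""
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return f"response for {resource}"


async def one_at_a_time(resources):
    # Each request waits for the previous response, so latencies add up.
    return [await fetch(r) for r in resources]


async def overlapped(resources):
    # All requests are in flight together; total time is about one round trip.
    return list(await asyncio.gather(*(fetch(r) for r in resources)))


def timed(coro_fn, resources):
    """Run one of the strategies above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(resources))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    _, sequential_s = timed(one_at_a_time, names)
    _, overlapped_s = timed(overlapped, names)
    print(f"one at a time: {sequential_s:.2f}s, overlapped: {overlapped_s:.2f}s")
```

With five requests, the first strategy takes roughly five round trips and the second roughly one. The gap grows with latency, which is why the blocking model that is acceptable on a local network becomes costly across regions.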
Most of the problems in scaling manifest as performance problems caused by the limited capacity of servers or the network. In general, these problems can be addressed using the following techniques:
- Asynchronous communication – Certain applications lend themselves well to asynchronous communication. In such systems, the requester does not block while waiting for the response. Generally, this is accomplished using some kind of callback or event handler that is triggered when a response is received.
- Replication – In distributed systems, it often makes sense to replicate components. Replication not only increases availability, but it helps to balance the load on the system across multiple components, leading to better performance. In the case of geographical scaling, replicas can be set up so that they are closer to the clients they are serving. The challenges with replication lie in the fact that we need to maintain consistency across multiple copies. If there were no changes made to the copies, replication would be very simple to accomplish.
- Partitioning – Partitioning is a form of decentralization. For example, if the data volumes are too large to fit on a single machine, the data may be split into smaller chunks and stored on different machines. Services can be split by function (called Y-axis splitting), and each service can have multiple replicas (sometimes called an m-n topology, where there are m components, each having n replicas). Certain applications partition data by key range or by a hash function applied to some key. Partitioning and replication typically go hand in hand.
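How partitioning and replication combine can be sketched with hash-based partitioning over a small m-n topology. The node names and the `TOPOLOGY` layout below are hypothetical, and the simple modulo scheme is an assumption chosen for brevity rather than a recommendation.

```python
import hashlib
from typing import List

# Hypothetical m-n topology: m = 3 partitions, each with n = 2 replicas.
TOPOLOGY: List[List[str]] = [
    ["node-a1", "node-a2"],
    ["node-b1", "node-b2"],
    ["node-c1", "node-c2"],
]


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would route keys inconsistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(key: str, topology: List[List[str]]) -> List[str]:
    """Return every replica that holds the partition owning this key."""
    return topology[partition_for(key, len(topology))]


if __name__ == "__main__":
    for key in ["user:1", "user:2", "order:77"]:
        print(key, "->", replicas_for(key, TOPOLOGY))
```

A read for a key can go to any replica in the returned list, while a write has to reach all of them (or a quorum) to keep the copies consistent. One known limitation of the modulo approach is that changing the number of partitions remaps most keys; schemes such as consistent hashing are commonly used to reduce that movement.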
Building distributed systems can seem a formidable task. However, there are design principles that can be used to build reliable and robust distributed systems. Often, issues arise when systems are built on false assumptions, known as the fallacies of distributed computing. These fallacies were cataloged by L. Peter Deutsch and others at Sun Microsystems:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
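As an illustration of designing against the first fallacy, the sketch below wraps a remote call in bounded retries with exponential backoff and jitter. The helper name `call_with_retries` and its parameters are hypothetical, and the approach is only appropriate for operations that are safe to repeat (idempotent).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on connection errors and timeouts.

    The delay doubles after each failure, with random jitter so that many
    clients do not retry in lockstep. The last failure is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky() -> str:
        # Simulated remote call that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("simulated dropped connection")
        return "ok"

    print(call_with_retries(flaky, attempts=3, base_delay_s=0.01))  # ok
```

Bounding the number of attempts matters as much as retrying: unbounded retries against a struggling service add load exactly when it can least handle it.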
In this post, we covered what distributed systems are, some of their characteristics, and why building them is a difficult task. I’ve glossed over quite a few things for the sake of brevity. If I were to detail every aspect covered here, the post would become excessively long.
Published at DZone with permission of Maneesh Chaturvedi. See the original article here.