Distributed systems: definition, features and basic principles

2025 Author: Angel Austin | [email protected]. Last modified: 2025-01-23 12:19

A distributed system in its simplest definition is a group of computers working together that appear as one to the end user. Machines share a common state, run concurrently, and can operate independently without impacting the uptime of the entire system. The truth is that managing such systems is a complex topic filled with pitfalls.

Overview of the system

The distributed system allows the sharing of resources (including software) connected to the network at the same time.

Examples of system distribution:

Traditional stack. These databases are stored on the file system of one machine. Whenever the user wants to receive information, he directly communicates with this machine. To distribute this database system, you need to run it on multiple PCs at the same time.
Distributed architecture.

Distributed systemallows you to scale horizontally and vertically. For example, the only way to handle more traffic would be to upgrade the hardware that runs the database. This is called vertical scaling. Vertical scaling is good up to a certain limit, after which even the best equipment cannot cope with providing the required traffic.

Scaling horizontally means adding more computers, not upgrading the hardware on one. Vertical scaling increases performance to the latest hardware capabilities in distributed systems. These opportunities are not enough for technology companies with moderate to heavy workloads. The best thing about horizontal scaling is that there are no size limits. When performance degrades, another machine is simply added, which, in principle, can be done indefinitely.

At the corporate level, a distributed control system often involves various steps. In business processes in the most efficient places of the enterprise computer network. For example, in a typical distribution using a three-tier distributed system model, data processing is done on a PC at the user's location, business processing is done on a remote computer, and database access and data processing is done on a completely different computer that provides centralized access for many businesses. processes. Typically, this type of distributed computinguses the client-server interaction model.

Main Tasks

The main tasks of a distributed control system include:

Transparency - Achieve a single system image without hiding location, access, migration, concurrency, failover, relocation, persistence and resource details to users.
Openness - simplifies network setup and changes.
Reliability - Compared to a single control system, it should be reliable, consistent and have a high probability of masking errors.
Performance - Compared to other models, distributed models provide a performance boost.
Scalable - These distributed control systems must be scalable in terms of territory, administration, or size.

The tasks of distribution systems include:

Security is a big issue in a distributed environment, especially when using public networks.
Fault tolerance - can be tough when the model is built with unreliable components.
Coordination and distribution of resources - can be difficult if there are no proper protocols or required policies.

Distributed computing environment

(DCE) is a widely used industry standard supporting such distributed computing. On the Internet, third-party providers offer some generic services,that fit into this model.

Grid computing is a computing model with a distributed architecture of a large number of computers associated with solving a complex problem. In the grid computing model, servers or personal computers perform independent tasks and are loosely connected to each other by the Internet or low-speed networks.

The largest grid computing project is SETI@home, in which individual computer owners volunteer to perform some of their multitasking processing cycles using their computer for the Search for Extraterrestrial Intelligence (SETI) project. This computer problem uses thousands of computers to download and search radio telescope data.

One of the first uses of grid computing was to break cryptographic code by a group now known as distributed.net. This group also describes their model as distributed computing.

Database scaling

Spreading new information from master to slave does not happen instantly. In fact, there is a time window in which you can get outdated information. If this were not the case, write performance would suffer, as distributed systems would have to wait synchronously for data to propagate. They come with a few compromises.

Using a slave database approach, it is possible to scale out read traffic to some extent. There are many options here. But you just need to divide the write traffic into severalservers because it cannot handle it. One way is to use a multi-master replication strategy. There, instead of slaves, there are several main nodes that support reading and writing.

Another method is called sharding. With it, the server is split into several smaller servers, called shards. These shards have different entries, rules are created about which entries go into which shard. It is very important to create such a rule that the data is distributed evenly. A possible approach to this is to define ranges according to some information about the record.

This shard key should be chosen very carefully, as the load is not always equal to the bases of arbitrary columns. The only shard that gets more requests than the others is called a hotspot, and they try to prevent it from forming. Once split, recalibration data becomes incredibly expensive and can result in significant downtime.

Database consensus algorithms

DBs are difficult to implement in distributed security systems because they require each node to negotiate the correct interrupt or commit action. This quality is known as consensus and is a fundamental problem in building a distribution system. Achieving the type of agreement needed for the "commit" problem is simple if the processes involved and the network are completely reliable. However, real systems are subject to a number ofpossible failures of networking processes, lost, corrupted or duplicated messages.

This poses a problem and it is not possible to guarantee that the correct consensus will be reached within a limited period of time on an unreliable network. In practice, there are algorithms that reach consensus rather quickly in an unreliable network. Cassandra actually provides lightweight transactions through the use of the Paxos algorithm for distributed consensus.

Distributed computing is the key to the influx of big data processing that has been used in recent years. It is a method of breaking down a huge task, such as a cumulative 100 billion records, of which no single computer is capable of doing practically anything on its own, into many smaller tasks that can fit into a single machine. The developer breaks his huge task into many smaller ones, executes them on many machines in parallel, collects the data appropriately, then the original problem will be solved.

This approach allows you to scale horizontally - when there is a big task, just add more nodes to the calculation. These tasks have been performed for many years by the MapReduce programming model associated with the implementation for parallel processing and generating big data sets using a distributed algorithm on a cluster.

Currently, MapReduce is somewhat outdated and brings some problems. Other architectures have emerged that address these issues. Namely, Lambda Architecture for distributedflow processing systems. Advances in this area have brought new tools: Kafka Streams, Apache Spark, Apache Storm, Apache Samza.

File storage and replication systems

Distributed file systems can be thought of as distributed data stores. This is the same as the concept - storing and accessing a large amount of data across a cluster of machines that are a single entity. They usually go hand in hand with Distributed Computing.

For example, Yahoo has been known for running HDFS on over 42,000 nodes to store 600 petabytes of data since 2011. Wikipedia defines the difference in that distributed file systems allow file access using the same interfaces and semantics as local files, rather than through a custom API such as Cassandra Query Language (CQL).

Hadoop Distributed File System (HDFS) is a system used for computing over the Hadoop infrastructure. Widespread, it is used to store and replicate large files (GB or TB size) on many machines. Its architecture consists mainly of NameNodes and DataNodes.

NameNodes is responsible for storing metadata about the cluster, such as which node contains file blocks. They act as network coordinators, figuring out where best to store and copy files, keeping track of system he alth. DataNodes simply store files and perform commands such as file replication, new write, andothers.

Unsurprisingly, HDFS is best used with Hadoop for computing as it provides task information awareness. The specified jobs are then run on the nodes that store the data. This allows you to use the location of the data - optimizes calculations and reduces the amount of traffic over the network.

The Interplanetary File System (IPFS) is an exciting new peer-to-peer protocol/network for distributed file system. Using Blockchain technology, it boasts a fully decentralized architecture with no single owner or point of failure.

IPFS offers a naming system (similar to DNS) called IPNS and allows users to easily retrieve information. It stores the file through historical versioning, much like Git does. This allows access to all previous states of the file. It is still going through heavy development (v0.4 at the time of writing) but has already seen projects interested in building it (FileCoin).

Messaging system

Messaging systems provide a central location for storing and distributing messages within a common system. They allow you to separate application logic from direct communication with other systems.

Known scale - LinkedIn's Kafka cluster processed 1 trillion messages per day with peaks of 4.5 million messages per second.

In simple terms, the messaging platform works like this:

Messagepassed from the application that potentially creates it, called a producer, to the platform, and read from multiple applications, called consumers.
If you need to store a certain event in multiple places, such as creating a user for a database, storage, email sending service, then the messaging platform is the cleanest way to distribute that message.

There are several popular top-notch messaging platforms.

RabbitMQ is a message broker that allows you to fine-tune the control of their trajectories using routing rules and other easily configurable parameters. It can be called a "smart" broker because it has a lot of logic and closely monitors the messages that pass through it. Provides options for APs and CPs from CAP.

Kafka is a message broker that is a little less functional as it doesn't keep track of which messages have been read and doesn't allow complex routing logic. It helps achieve amazing performance and represents the biggest promise in this space with the active development of distributed systems by the open-source community and the support of the Confluent team. Kafka is most popular with high-tech companies.

Machine Interaction Applications

This distribution system is a group of computers working together to appear as a separate computer to the end user. These machines are in general condition, workingsimultaneously and can work independently without affecting the uptime of the entire system.

If you consider the database as distributed, only if the nodes interact with each other to coordinate their actions. It is, in this case, something like an application running its internal code on a peer-to-peer network, and is classified as a distributed application.

Examples of such applications:

Known Scale - BitTorrent swarm 193,000 nodes for Game of Thrones episode.
Basic register technology of distributed Blockchain systems.

Distributed ledgers can be thought of as an immutable, application-only database that is replicated, synchronized, and shared across all nodes in the distribution network.

The well-known scale - the Ethereum network - had 4.3 million transactions per day on January 4, 2018. They use the Event Sourcing pattern, which allows you to restore the state of the database at any time.

Blockchain is the current underlying technology used for distributed ledgers and actually marked their beginning. This newest and biggest innovation in the distributed space created the first truly distributed payment protocol, bitcoin.

Blockchain is a distributed ledger with an ordered list of all transactions that have ever taken place on its network. Deals are grouped and stored in blocks. The entire blockchain is essentially a linked list of blocks. Specified Blocksare expensive to create and are tightly coupled to each other through cryptography. Simply put, each block contains a special hash (which starts with X number of zeros) of the contents of the current block (in the form of a Merkle tree) plus the hash of the previous block. This hash requires a lot of CPU power.

Examples of distributed operating systems

System types appear to the user because they are single user systems. They share their memory, disk, and the user has no trouble navigating through the data. The user stores something in his PC and the file is stored in multiple locations i.e. connected computers so that lost data can be easily recovered.

Examples of distributed operating systems:

Windows Server 2003;
Windows Server 2008;
Windows Server 2012;
UbuntuLinux (Apache server).

If any computer boots higher, that is, if many requests are exchanged between individual PCs, this is how load balancing occurs. In this case, the requests are propagated to the neighboring PC. If the network becomes more loaded, then it can be expanded by adding more systems to the network. The network file and folders are synchronized and naming conventions are used so that no errors occur when data is retrieved.

Caching is also used when manipulating data. All computers use the same namespace to name files. Butthe file system is valid for every computer. If there are updates to the file, it is written to one computer and the changes are propagated to all computers, so the file looks the same.

Files are locked during the read/write process, so there is no deadlock between different computers. Sessions also happen, such as reading, writing files in one session and closing the session, and then another user can do the same and so on.

Benefits of using

An operating system designed to make people's daily lives easier. For user benefits and needs, the operating system can be single user or distributed. In a distributed resource system, many computers are connected to each other and share their resources.

Benefits of doing this:

If one PC in such a system is faulty or damaged, then another node or computer will take care of it.
More resources can easily be added.
Resources such as printers can serve multiple computers.

This is a brief about the distribution system, why it is used. Some important things to remember: They are complex and are chosen for scale and price and are harder to work with. These systems are distributed in several storage categories: computing, file and messaging systems, registers, applications. And all this is only very superficial about a complex information system.